Are AI and Data Science related?
In the previous article, we discussed the different tasks involved in the Data Science pipeline. A discussion of Data Science wouldn’t be complete without mentioning ‘AI’: both terms have caught the imagination of the media, industry, and governments, and many statements are being made about the two fields, so it’s important to understand how they are related, if at all.
The confusion is this: some people think AI and Data Science (DS) are synonymous; some think one is a subset of the other; and some think they are two different things that need not be discussed together. These theories float around because popular articles in the media and on the web use both terms in a broad, non-technical sense. They rarely ask what AI means from a technical point of view, what classical AI involves, what early AI looked like, or how it intersects with Data Science, if at all. Instead, anything to do with data or with learning gets called AI, or Data Science, or both. So, this is the confusion that prevails.
Let’s make an attempt at defining AI:
The above is not a very good definition, as it does not really call out what is involved in AI. Just as we defined Data Science in terms of the tasks involved in it, we will do the same for AI.
The following are tasks involved in a typical AI pipeline:
Traditionally, classical AI has focused on problem-solving, knowledge representation, reasoning on top of that representation, decision making, and the loop of perception, communication, and actuation.
Let’s look at each of these tasks in detail.
Problem Solving: Let’s understand what is involved in problem-solving with the help of a game. In the image below there is a maze. We start somewhere inside the maze (green dot) or outside it (blue dot), and we want to reach some location that could be inside the maze or outside it on the other side. We are interested in building an AI agent that can play this game. Another instance of such a game is Tic-tac-toe, where the agent decides at each move where to place an ‘O’ or an ‘X’, and the game keeps going.
There are several other games of this kind, where the start position and the goal are well defined and we want the agent to reach a winning position or victory state.
In these goal-oriented games, the agent has to make a decision at every step. Starting from the green dot, it can go left or right, and it repeats this choice at every turn and every corner. As it keeps going, at some point it might reach the goal state.
We have shown just one path to the target, but there could be multiple paths from the source to the target. Of course, in some mazes there might be only one such path, in which case the goal is to find that particular path among many possibilities, most of which lead to a dead end. This is essentially a tree traversal problem, and we need ways of traversing these trees efficiently. In some cases, more actions than just left and right may be available.
Here, there is no data involved as such. The agent is not learning from large amounts of data, and we are not showing it multiple instances of the game or anything that happened in the past. There is no modeling either: we are not assuming any underlying data distribution or relationship of that sort. This is very different from what we discussed under statistical modeling and algorithmic modeling. The only thing needed in such simple, well-defined, rule-based games is an efficient search algorithm, which is why in classical AI we study A*, Depth-First Search, and Breadth-First Search. So, this part of AI is not data-driven, relies on efficient algorithms, and does not intersect with the Data Science world.
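To make the search idea concrete, here is a minimal sketch of Breadth-First Search on a small grid maze. The maze layout, its symbols (‘S’, ‘G’, ‘#’, ‘.’), and the function names are our own assumptions for illustration; they are not taken from the maze in the image. Note that nothing is learned from data here: the agent simply explores the tree of possible moves.

```python
from collections import deque

# Hypothetical maze: '#' is a wall, '.' is open, 'S' is the start, 'G' is the goal.
MAZE = [
    "S.#.....",
    ".##.###.",
    "....#...",
    ".##.#.#.",
    "...#..#G",
]


def find(maze, symbol):
    """Return the (row, col) of the first cell containing `symbol`."""
    for r, row in enumerate(maze):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(f"{symbol!r} not found in maze")


def bfs_path(maze):
    """Return a shortest list of cells from 'S' to 'G', or None if unreachable."""
    start, goal = find(maze, "S"), find(maze, "G")
    queue = deque([start])
    parent = {start: None}  # also serves as the set of visited cells
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the parents to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        # At every step the agent may move up, down, left, or right.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < len(maze) and 0 <= nc < len(maze[0])
            if inside and maze[nr][nc] != "#" and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


if __name__ == "__main__":
    print(bfs_path(MAZE))
```

Breadth-First Search explores all positions one move away, then two moves away, and so on, so the first time it reaches the goal it has found a shortest path. A* or Depth-First Search would reuse the same structure with a different order of exploration.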
Knowledge Representation and Reasoning — Let’s explain this also with the help of an example.
Here is a slightly more complex game. Think of it as a maze or a grid in which we start somewhere and the goal is to pick up the gold hidden somewhere inside. The rules of the game are now slightly more complex. For example, say there is a rule that if there is a lion in the current cell, then there is gold in the cell to its right. Clearly, we now need a way of encoding this information: how do we represent ‘if lion, then gold in the adjacent cell’? For this we need knowledge representation ideas, essentially propositional logic or first-order logic, as depicted in the image below.
The first rule can be written as isLion(cell) -> isGold(right(cell)),
which means that if the current cell is the lion cell, then there is gold in the cell to its right. We have written it in a simplified form here; the exact syntax may differ, but this is the idea, and it is one rule in the system.
Now, as the agent traverses this tree, at certain points it needs to reason. Say at some point it encounters wind or feels a breeze (imagine a sensor is providing that information). It now has to run the inference rule that isWind implies there is a pit nearby, and hence that it needs to be careful in navigating the adjacent cells. Representing such complex rules requires propositional logic or first-order logic, and the agent then reasons on top of that representation. This is something we study in great detail in classical AI. Again, it is not data-driven: no data is involved, and knowledge representation and reasoning require a good knowledge of propositional and first-order logic. So, this again is not a part of Data Science.
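As a rough illustration of representing rules and reasoning over them, the sketch below encodes the two rules mentioned above as functions from a percept to derived facts. The encoding of facts as (predicate, cell) pairs, the predicate name possiblePit, and the choice to ignore the grid boundaries are all our own simplifying assumptions; a real system would use a proper logic engine.

```python
def right(cell):
    """Cell immediately to the right of `cell`, given as (row, col)."""
    row, col = cell
    return (row, col + 1)


def adjacent(cell):
    """The four neighbouring cells (grid boundaries are ignored in this sketch)."""
    row, col = cell
    return [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]


# Each rule maps a premise predicate to a function that derives new facts.
RULES = {
    # isLion(cell) -> isGold(right(cell))
    "isLion": lambda cell: [("isGold", right(cell))],
    # isWind(cell) -> a pit may be in any adjacent cell, so be careful there
    "isWind": lambda cell: [("possiblePit", n) for n in adjacent(cell)],
}


def infer(percepts):
    """Return the percepts plus every fact derived by applying the rules once."""
    facts = set(percepts)
    for predicate, cell in percepts:
        rule = RULES.get(predicate)
        if rule is not None:
            facts.update(rule(cell))
    return facts


if __name__ == "__main__":
    for fact in sorted(infer({("isLion", (2, 3)), ("isWind", (0, 0))})):
        print(fact)
```

The rules are written by hand and applied mechanically; no examples of past games are needed, which is why this part of AI stays outside Data Science.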
Decision Making: Traditionally, from around the 1960s through the 1980s, Expert Systems were used for decision making. So, let’s look at what an expert system is and in what context it can be used.
Think of a situation where a doctor or a hospital is interested in automatically diagnosing whether a patient has dengue. The following are two such sets of rules that a doctor typically uses to perform this diagnosis:
Of course, other such combinations are possible. Now, hasHighFever is itself concluded from another rule in our rule base, which says that if the temperature is greater than 102, then the patient has a high fever. So, the input we get is just the temperature. From it we first reason about whether the patient has a high fever, and using that we reason again about whether the patient has dengue (considering other inputs, such as whether the patient has a rash and vomiting).
So, expert systems are just the embodiment of these rules.
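A tiny rule base of this kind can be sketched as follows. The temperature threshold of 102 comes from the text above; the single dengue rule combining high fever, rash, and vomiting is an illustrative assumption of ours, not the exact rule set a doctor would use, and the function and fact names are hypothetical.

```python
# Each rule is (set of premises, conclusion). The dengue rule is an assumption
# made for illustration only.
RULES = [
    ({"hasHighFever", "hasRash", "hasVomiting"}, "hasDengue"),
]

HIGH_FEVER_THRESHOLD = 102  # from the text: temperature > 102 implies high fever


def diagnose(temperature, has_rash, has_vomiting):
    """Return True if the rule base concludes 'hasDengue' from the inputs."""
    # Step 1: turn raw inputs into facts.
    facts = set()
    if temperature > HIGH_FEVER_THRESHOLD:
        facts.add("hasHighFever")
    if has_rash:
        facts.add("hasRash")
    if has_vomiting:
        facts.add("hasVomiting")

    # Step 2: forward chaining. Keep firing rules until nothing new is derived.
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return "hasDengue" in facts


if __name__ == "__main__":
    print(diagnose(103.5, has_rash=True, has_vomiting=True))
    print(diagnose(101.0, has_rash=True, has_vomiting=True))
```

Adding a new diagnosis means adding a new entry to RULES, which is exactly the work an expert has to do by hand.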
Even today, many systems are expert systems, because such systems are very precise. If someone works in a domain where the rules are well defined, say we know the exact symptoms that lead to dengue, then there is no need for ML or DL to learn from data: the rules are already known to an expert and can be clearly represented using knowledge representation techniques. That’s when we stick to Expert Systems.
There are some limitations of Expert systems:
Sometimes the rules are inexpressible. For example, a doctor diagnosing jaundice can look at the yellowishness of the skin and say that the patient is suffering from jaundice, but how do we represent this as a rule? How do we measure the yellowishness of the skin? That’s very hard to do. In some cases the rules are unknown: for dengue the rules are known, but for Ebola they are unknown, and this happens with new diseases as they come along. And sometimes the rules are so complex that humans can’t write them out explicitly.
Say we are interested in knowing whether a person has Ebola, and we don’t know the rules. Instead, we use the data we have: records of past patients, with the relevant attributes (such as height, weight, blood pressure, fever, and so on) and whether each patient was diagnosed with Ebola or not. We give all this data to a machine, pick a suitable model/function from the repository of functions available in machine learning, and aim to find the parameters of this function/model using the data and optimization techniques.
So, if we want an AI agent to make decisions by learning from data, that’s where we get into Machine Learning, and the moment data and machine learning come in, we intersect with the Data Science world. Deep Learning comes in the same way: if we have large amounts of high-dimensional data with very complex relationships, we use Deep Learning. It amounts to having an AI agent learn from large amounts of high-dimensional data with complex relationships, and this again intersects with Data Science.
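Here is a minimal sketch of what ‘finding the parameters of a function using data and optimization’ can look like. We pick logistic regression as the function and plain gradient descent as the optimizer; both are our choice for illustration. The six patient records and their two attributes are entirely made up and have no medical meaning.

```python
import math

# Made-up records: (attribute_1, attribute_2) -> 1 if diagnosed, 0 otherwise.
DATA = [
    ((0.5, 1.0), 0),
    ((1.0, 2.0), 0),
    ((1.5, 1.0), 0),
    ((4.0, 5.0), 1),
    ((5.0, 4.0), 1),
    ((4.5, 6.0), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(params, x):
    """Model: probability of a positive diagnosis given attributes x."""
    weights, bias = params
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit(data, learning_rate=0.1, epochs=5000):
    """Estimate the weights and bias by gradient descent on the log loss."""
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            error = predict_proba((weights, bias), x) - y
            for i in range(n_features):
                grad_w[i] += error * x[i]
            grad_b += error
        # Move the parameters a small step against the average gradient.
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


if __name__ == "__main__":
    params = fit(DATA)
    for x, y in DATA:
        print(x, y, round(predict_proba(params, x), 3))
```

Nobody wrote a rule here: the decision boundary comes out of the data, which is the point where decision making starts to overlap with Data Science.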
Reinforcement Learning: Here the agent actually interacts with the environment. It tries to learn certain rules about the environment and how best to act in it so that it gets good rewards for its actions. The classic example is chess, which is a very deep sequential decision-making process: we make one move and it might have an impact 20 moves down the line, and we don’t really know what that impact will be. The AI agent has to learn from a very dynamic environment, dynamic because the agent is not alone in it. There might be an opponent, whose choices can turn our good move into a bad one or our bad move into a good one. So, we don’t know how dynamic the environment is going to be.
This is a sequential decision-making process unrolled over many steps. A typical chess game lasts for many steps, and we don’t know how what we did at the 10th step is going to affect the 15th. We have only partial information: what we are doing right now.
There is also no explicit supervision at each step. When we are trying to teach an AI agent chess, it’s impossible to say for every move whether it was good or not, simply because even we don’t know. It depends on many other factors: on what the opponent does, and on how we back the move up. For example, if we move a piece to a good position, the obvious next thing is to protect that piece; if we don’t, a good move turns into a bad one. So, all we have is a one-off reward at the end: once the agent has played the entire game, it knows whether it won or lost. Based on that, it can try to judge the importance/goodness of each of its actions and learn how to interact with the environment in the future. Typically, we run a large number of simulations in which the AI agent keeps playing, winning some games and losing others. Every time it wins or loses, it propagates this information back to all the moves it made: this set of actions led to a bad outcome, that set of actions led to a good outcome. That’s how it learns what the best move is for a given state of the board, and that’s what Reinforcement Learning is all about. This is again data-driven, because the agent needs to learn from many examples of previous games to figure out what the best actions are.
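The idea of propagating the final win or loss back to every move can be sketched on a toy game far simpler than chess. In this hypothetical game (our own invention), the agent stands on a line of positions 0 to 4, starts at 2, wins on reaching 4, and loses on reaching 0. The agent plays many games with random moves and credits each (position, move) pair with the final outcome; this is a simple Monte Carlo estimate, not a full Reinforcement Learning algorithm.

```python
import random
from collections import defaultdict

LEFT, RIGHT = -1, +1
LOSS_STATE, WIN_STATE, START_STATE = 0, 4, 2


def play_episode(rng):
    """Play one game with random moves; return the moves made and the final reward."""
    state, moves = START_STATE, []
    while state not in (LOSS_STATE, WIN_STATE):
        action = rng.choice((LEFT, RIGHT))
        moves.append((state, action))
        state += action
    reward = 1.0 if state == WIN_STATE else -1.0  # the only feedback, given at the end
    return moves, reward


def learn(episodes=5000, seed=0):
    """Average the end-of-game reward over every (state, action) that was played."""
    rng = random.Random(seed)
    total, count = defaultdict(float), defaultdict(int)
    for _ in range(episodes):
        moves, reward = play_episode(rng)
        for move in moves:  # propagate the outcome back to all moves of this game
            total[move] += reward
            count[move] += 1
    return {move: total[move] / count[move] for move in count}


def best_action(q_values, state):
    """Pick the move with the highest average outcome in `state`."""
    return max((LEFT, RIGHT), key=lambda action: q_values.get((state, action), 0.0))


if __name__ == "__main__":
    q = learn()
    for s in (1, 2, 3):
        print(s, "->", "right" if best_action(q, s) == RIGHT else "left")
```

No move is ever labelled good or bad on its own; the preference for moving right emerges only from the outcomes of many simulated games.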
Perception, Communication, and Actuation
Traditionally, AI agents were expected to be in the service of humans, which requires them to communicate with humans, and the default choice for communication is of course language.
So, if we want our AI agent to communicate using human language, it needs “Natural Language Understanding” as well as the ability to generate language: it should not only understand what a human is saying but also be able to form its own sentences and reply. This comes under Natural Language Processing (NLP), which has evolved over the years. Back in the 1950s the expectations were limited: all we wanted was for the agent to hold very templated conversations with a human. For example, if someone says they want to book a ticket, it’s clear we are not going to talk about politics, philosophy, movies, or anything outside that domain. All we want to talk about is the source, the destination, the time, seat preferences, mode of travel, and so on. That’s a very limited domain; conversations were very templated and hence could be implemented using Expert Systems.
Then, as Machine Learning became popular from around the 1980s, and through the 1990s and early 2000s when it was the default paradigm, probabilistic graphical models in particular were widely used for Natural Language Processing. These were again data-driven. For example, to build a sentiment analyzer, we show the machine thousands of example sentences with positive and negative sentiment, and it learns to tell whether a given sentence conveys a positive or a negative sentiment. We can think of the input sentence as “x” and the output sentiment, 0 or 1, as “y”, and we want to learn a function that maps the input to the output.
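As a small illustration of learning a function from “x” (a sentence) to “y” (0 or 1), the sketch below trains a naive Bayes classifier, one of the simplest probabilistic models, on six made-up sentences. The choice of naive Bayes, the training sentences, and the function names are our own assumptions; a real sentiment analyzer would need thousands of examples, as noted above.

```python
import math
from collections import Counter

# Made-up training data: (sentence x, sentiment y) with 1 = positive, 0 = negative.
TRAIN = [
    ("amazing movie", 1),
    ("awesome and beautiful story", 1),
    ("beautiful acting amazing music", 1),
    ("boring movie", 0),
    ("horrible and boring story", 0),
    ("horrible acting not watchable", 0),
]


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Count how often each word occurs in each class."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocabulary


def predict(model, text):
    """Return the label y with the highest log-probability for sentence x."""
    word_counts, class_counts, vocabulary = model
    n_examples = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_examples)  # class prior
        for word in tokenize(text):
            if word in vocabulary:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids taking log(0).
                likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
                score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(TRAIN)
    print(predict(model, "an amazing story"))
    print(predict(model, "boring and horrible"))
```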
Since the 2010s, Deep Learning has become the default choice for Natural Language Processing which again depends on data.
So, today NLP is completely data-driven, relying largely on Deep Learning and a bit on Machine Learning. The part of AI that requires an agent to communicate, and hence to have NLP capabilities, therefore intersects with Data Science. This is in contrast to, say, 30–40 years back, when NLP was largely rule-based and limited to very templated conversations or templated activities with language.
Perception is about an agent perceiving things in its environment. One part of perception is the ability to see, because as humans we interact a lot with the environment by seeing or observing things. The other part is speech: we can hear what others are saying or what is happening around us, and use that to act in or interact with the environment. Perception using vision means giving the AI agent the ability to see. This is the area of computer vision, where we focus on things like training an agent to classify different objects, or to track an object, say a human, which is important in applications like surveillance. So, anything to do with visual perception falls under computer vision. Similarly, anything to do with recognizing speech falls under speech technology: the agent should be able to listen to commands and process them (understanding them is the natural language understanding part, but simply recognizing the speech is where speech technology comes in). Again, today Computer Vision and Speech are totally data-driven and intersect with Data Science.
Actuation is in the context of physical robots; a robot is an embodiment of an AI agent. This has largely to do with Robotics and a bit to do with Reinforcement Learning. For example, when a robot wants to navigate a space, it has to do something similar to chess: it makes certain moves, and the reward comes only at the end, depending on whether it reached the goal state or failed somewhere in between. Earlier, rule-based systems enabled actuation in robots, but increasingly this has also become data-driven: robots can now learn to perform complex actuations by learning from simulations or by mimicking human actions. This, of course, intersects with Data Science.
So, in summary, whether we want communication using Natural Language Processing, perception in the form of Computer Vision or Speech Technology, or the part of Robotics dealing with actuation, all of these are data-driven today, and this is where AI intersects with the world of Data Science.
Today, the tasks of decision making, perception, communication, and actuation are largely data-driven, and this part of AI intersects with the world of Data Science.
To see how it intersects with the Data Science world, let’s take the example of Natural Language Processing. Today, if we want to do NLP, we need to collect data: large amounts of it are needed to build sentiment analyzers, translation systems, or summarization systems, all of which are NLP tasks. We need to be able to store this data and to process it. Processing includes cleaning: what do we do about spelling errors, how do we normalize text written in chat lingo, and so on.
We might also want to describe the data. If we are looking at sentiment data, we want to know which words are most frequently associated with each sentiment and how often they occur, for example words like amazing, awesome, and beautiful for positive sentiment, or boring, horrible, and not watchable for negative sentiment. We might also want to know the associations between words, such as how often we see doctor and patient together, or cat and dog together, and draw scatter plots of these relationships.
Finally, we want to build a model. The input is a text document or sentence, and the output is a label, in the simplest case positive or negative sentiment. We want to learn a function that relates the output ‘y’ to the input ‘x’, and the way to go about this is to propose a machine learning or deep learning model and estimate its parameters using large amounts of data. The same pipeline can be used for computer vision and speech technology as well.
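The processing and describing steps of this pipeline can be sketched in a few lines. The four reviews, the tiny chat-lingo dictionary, and the function names below are made up for illustration; real cleaning (spelling correction in particular) is considerably harder than this.

```python
from collections import Counter
from itertools import combinations

# Made-up reviews: (text, label) with 1 = positive and 0 = negative sentiment.
REVIEWS = [
    ("Amazing movie, AMAZING cast!", 1),
    ("gr8 story, luv the cast", 1),
    ("Boring movie... horrible story", 0),
    ("horrible, not watchable", 0),
]

# Hypothetical chat-lingo dictionary used for normalization.
CHAT_LINGO = {"gr8": "great", "luv": "love"}


def normalize(text):
    """Process: lowercase, strip punctuation, and expand chat lingo."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        word = CHAT_LINGO.get(word, word)
        if word:
            tokens.append(word)
    return tokens


def word_frequencies(reviews, label):
    """Describe: how often each word occurs in reviews with the given label."""
    counts = Counter()
    for text, review_label in reviews:
        if review_label == label:
            counts.update(normalize(text))
    return counts


def cooccurrence(reviews):
    """Describe: in how many reviews each (alphabetically ordered) word pair appears together."""
    pairs = Counter()
    for text, _ in reviews:
        pairs.update(combinations(sorted(set(normalize(text))), 2))
    return pairs


if __name__ == "__main__":
    print(word_frequencies(REVIEWS, 1).most_common(3))
    print(word_frequencies(REVIEWS, 0).most_common(3))
    print(cooccurrence(REVIEWS)[("cast", "movie")])
```

The frequency and co-occurrence counts are the raw material for the descriptions and plots mentioned above, and the normalized tokens are what a model would be trained on.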
We cannot say that AI and Data Science are synonyms, as there are tasks like problem solving which come under AI but do not intersect with the Data Science world. Nor can we say that one is a subset of the other. At best, we can say that there is some intersection between AI and Data Science.
That’s how AI is related to Data Science and how it is related to Machine Learning and Deep Learning also.