Foundations of Data Science

Parveen Khurana
19 min read · Mar 1, 2020

Nowadays, all of us have heard that “Data is the new oil and Data Science is the combustion engine that drives it”, or “Data Science is the most sought-after job of the twenty-first century”, or “Data Science is the future”. In this article, we discuss what exactly Data Science is.

What is Data Science?

Before we answer this question, it’s important to understand why there is so much confusion around something so popular, and why there is no clear understanding of what Data Science is.

One reason is that Data Science is an assortment of several tasks. The Data Science pipeline involves different tasks, and their importance changes from application to application, which is why it’s not clear who can call himself or herself a Data Scientist. In some environments or organizations a particular task might be important, while in others some other task might matter more. Because of this uneven distribution of importance, people wonder: if I’m doing only this one task, or say these two tasks, is it still called Data Science? So we try to clear up this confusion, and start by asking: what are the different tasks involved in the Data Science pipeline, that is, the tasks a Data Scientist should know?

The different tasks involved in the Data Science pipeline are:

  1. To collect the data.
