Data Science - A brief story of its evolution

About this article:

This post is intended to be a peek into the past, present, and future of the field of data science and can be a starting point for someone completely new to the field. Curious readers can make use of some resources listed in the article for further learning.

The past
The origin of data science is widely credited to the seminal 1962 paper by the American mathematician John Tukey [1]. This paper acted as a roadmap for the development of data analysis and remains relevant today. In it, Tukey identifies the need to expand the well-established field of statistics into the broader field of data analysis. He also argues that data analysis is a new scientific field, driven by theories of statistics, advances in technology, and large amounts of data. Other scientists, such as John Chambers in his 1993 paper [2], likewise recognised the need to expand into larger domains. In 2001, Cleveland proposed an action plan for universities to expand into data analysis [3]. In the same year, Breiman argued that statisticians must focus more on predictive modelling, which he identifies as the epicentre of machine learning [4].

Many other influential voices exist in the field of data science. Some distinguish it from statistics by the size of the data or the skill sets required to tackle a problem. Others say it is a rebranding of statistics. Many of these views are collected in the seminal paper by Donoho, first circulated in 2015 and published in 2017 [5]. It is an excellent and easy read for anyone interested in the evolution of data science. One can also refer to [6] to learn about other important milestones in this field, such as Hadoop, data visualization libraries, Spark, and data fabrics.

The present
The term ‘Data Scientist’ was coined by D. J. Patil and Jeff Hammerbacher in 2008 to describe skilled professionals who make discoveries in the field of big data [7]. These professionals are typically a hybrid of data hacker, analyst, communicator, and advisor [7]. Good data scientists must apply design thinking, handle workflows, negotiate human relationships, apply statistical methods, and tell stories using data [8]. To achieve this, they must master the skills required at each stage of the data science life cycle shown in Fig. 1.

[Fig. 1: The data science life cycle]

They must also interact with many teams in the company and with their stakeholders to identify the problem, collect data from relevant sources, extract unbiased knowledge from the data, and enable data-driven decision making. Data science is currently one of the most sought-after career paths, with a significant yearly rise in demand and a shortage of skilled professionals [9]. To better understand the profession, the reader can refer to [9] as a starting point.

The future
The future of data science looks promising, with innovations expected in areas such as algorithm development and workflows. Donoho forecasts trends such as growth in open source code and data sharing, and the adoption of and experimentation with new task frameworks [5]. Other major trends are augmented analytics and data management, conversational analytics, data fabric, and continuous intelligence [10]. On a different note, interactive, drag-and-drop analytics platforms such as Northstar, developed at MIT, are also gaining attention and can help non-specialists tackle data science problems.

Useful resources:
To keep track of future trends, readers can follow the websites listed below:

arXiv - Data Analysis, Statistics and Probability

towardsdatascience

paperswithcode

References

[1] Tukey, J. W. (1962), “The Future of Data Analysis,” The Annals of Mathematical Statistics, 33(1), 1–67.

[2] Chambers, J. M. (1993), “Greater or Lesser Statistics: A Choice for Future Research,” Statistics and Computing, 3, 182–184.

[3] Cleveland, W. S. (2001), “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics,” International Statistical Review, 69, 21–26.

[4] Breiman, L. (2001), “Statistical Modeling: The Two Cultures,” Statistical Science, 16, 199–231.

[5] Donoho, D. (2017), “50 Years of Data Science,” Journal of Computational and Graphical Statistics, 26(4), 745–766.

[6] towardsdatascience - Building the Future of Data Science by Favio Vázquez.

[7] Davenport, T. H., & Patil, D. J. (2012), “Data Scientist: The Sexiest Job of the 21st Century,” Harvard Business Review, 90, 70–77.

[8] Simply statistics - The tent poles of data science.

[9] datascience@berkeley - what is data science?

[10] Gartner article - top 10 data analytics trends.