The number of internet users grows sharply every year, and so does the amount of data they produce. Every internet user adds new data in the form of emails, videos, audio, bank transactions and many other forms, all of which is stored on storage devices or in databases.
This huge amount of data can be mined for useful information, and doing that is the job of a Data Scientist.
Data Science, as the name suggests, has something to do with data. To take an everyday example, data can be treated as the ingredients and data scientists as the chefs. Data is the fuel, and it drives data scientists crazy. But the field is not limited to data: a strong mathematical foundation in statistics and probability is also required.
That is enough of an introduction, but the main question remains: what is this newly emerging field of Data Science all about? Through this blog, I will try to give you an introduction to the field of data science.
Data Science is the study of data with the help of statistics, probability and coding skills, with the aim of providing insights that companies can use to improve customer satisfaction and hence increase their revenue. Data Science comprises 5 important steps:
- Collecting Data.
- Storing Data.
- Processing Data.
- Describing Data.
- Modeling Data.
Collecting Data: To answer a data science question, the first thing required is the data, so we need to collect it.
There are two possibilities: first, the data is readily available to us through databases; second, the data has to be collected by venturing out and conducting surveys.
Storing Data: The collected data has to be stored so that it can be used later to answer data science questions. Earlier this step was difficult because storage devices were not available at low cost. But with developments in electronics and computing resources (the cloud), storing data has become much easier.
Over time, several new concepts have arisen. Some of them are:
- Databases- The data is stored in a structured way in the form of tables.
- Data Warehouse- Multiple databases are integrated and act as a common repository.
- Data Lakes- Here the data is poured into a lake of data irrespective of its format. Example- Consider a donation box outside a temple. People come and donate money: some donate coins (Rs. 1, 2, 5, 10), some make their offerings in the form of notes (Rs. 10, 20, 50, 100, 200, 500, 2000), and some also donate gold or silver. Here the donation box can be understood as the lake and the offerings as the data. Together they form the data lake.
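To make the "structured way in the form of tables" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `donations`, its columns and the sample rows are made up for illustration, borrowing the donation box example. Notice that every row must fit the same columns, which is exactly what a data lake does not require.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: every row must have the same columns.
conn.execute(
    "CREATE TABLE donations ("
    "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, amount_rs INTEGER NOT NULL)"
)

# Made-up sample offerings that fit the table's structure.
conn.executemany(
    "INSERT INTO donations (kind, amount_rs) VALUES (?, ?)",
    [("coin", 5), ("note", 100), ("note", 500)],
)

# Because the data is structured, questions can be asked with a query.
total_by_kind = dict(
    conn.execute("SELECT kind, SUM(amount_rs) FROM donations GROUP BY kind")
)
conn.close()

print(total_by_kind)
```

An offering of gold or silver has no rupee amount, so it would not fit this table without redesigning it. In a data lake it would simply be stored as it is.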
Processing Data: Depending on the data science question asked, the data best suited to answering it has to be processed. This step includes cleaning, scaling, normalizing and standardizing the data.
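As a rough illustration of the processing step, the sketch below cleans a small list, then normalizes and standardizes it using only Python's standard library. The sample ages and the function names are assumptions made for this example. Here "normalizing" is taken to mean min-max rescaling to the range 0 to 1, and "standardizing" to mean the z-score, which are common readings of those words but not the only ones.

```python
from statistics import mean, pstdev

def clean(values):
    """Drop missing entries (None) and convert the rest to float."""
    return [float(v) for v in values if v is not None]

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all values equal: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # all values equal: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical survey answers with two missing values.
raw_ages = [22, None, 35, 58, None, 41]

ages = clean(raw_ages)
normalized = min_max_normalize(ages)
standardized = standardize(ages)

print(ages)
print(normalized)
print(standardized)
```

In practice libraries such as pandas or scikit-learn are usually used for this, but the arithmetic underneath is the same.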
Describing Data: The data can be described, and many insights drawn from it, using data visualization and data summarising methods.
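A data summary can be as simple as a handful of numbers. The sketch below computes a few of them with Python's built-in `statistics` module. The order values and the `summarize` helper are made up for this example, and the choice of which statistics to report is only one reasonable option.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }

# Hypothetical order values from a small shop.
order_values = [12.5, 7.0, 30.0, 12.5, 18.0]

summary = summarize(order_values)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

A visualization such as a histogram would show the same information as a picture, which is often easier to read than a table of numbers.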
Modeling Data: After all the hard work, the prepared data has to be sent for modeling. It is an important step that helps in estimating how much insight our data can give about a question.
Modeling data is now considered to be of two kinds:
- Statistical Modeling- Here simple statistical models and hypotheses are used to explain the effectiveness of our model.
- Algorithmic Modeling or ML- Here a complex model is fitted using algorithms rather than classical statistical modeling.
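To give a feel for what modeling means, here is a minimal sketch of one of the simplest models there is: a straight line fitted by ordinary least squares. This is my choice of example and not the only way to model data. The numbers and the names `ad_spend`, `revenue` and `fit_line` are made up, and the data is deliberately built so that revenue is exactly 2 times the ad spend plus 1.

```python
from statistics import mean

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data that follows revenue = 2 * ad_spend + 1 exactly.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, revenue)

# Use the fitted line to estimate revenue for an unseen ad spend of 6.
predicted = slope * 6.0 + intercept

print(slope, intercept, predicted)
```

Real data never falls on a perfect line, so a real model also needs some measure of how far the predictions are from the truth.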
Data Science is a newly emerging field, so there is a lot of confusion about who is and who is not a data scientist. Anyone working in any of the 5 steps of data science stated earlier can be considered a Data Scientist.
In 2012, Harvard Business Review described Data Scientist as the sexiest job of the 21st century. So this is the perfect time to start a career in Data Science.