Data Science and its sub topics

What is data science?

We live in a data-rich world. In the past, data was stored in tables and logs. Today, efficient hardware and software give us far more advanced ways to store and process data. That raises a question: what can we do with all this data? Data science is an answer. Broadly, data science means exploring large volumes of data to obtain interesting and profitable insights, and many companies and organizations make large profits by using it. Data science has many branches, and anyone who works in one or more of them can become a data scientist.

Branches in data science are as follows :

  • Collecting

  • Storing

  • Processing

  • Describing

  • Modelling

Collecting :

Data can be collected from various sources. In the past, one of the major sources was the relational database, which holds structured data. Nowadays we can also collect data from social media by calling the APIs that the platforms provide. This approach is often used at election time to predict which parties will win. Another option is to generate the data ourselves. In agriculture, for example, we can run a large number of experiments on farmland to collect data and understand how to get the most profit from the available resources such as water, sunlight and land.
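As a rough illustration of collecting structured data from a relational database, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and values are made up for the example (loosely following the farm experiment idea above), and an in-memory database stands in for a real one.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yields (plot TEXT, water_litres REAL, yield_kg REAL)")
conn.executemany(
    "INSERT INTO yields VALUES (?, ?, ?)",
    [("A", 100.0, 40.0), ("B", 150.0, 55.0), ("C", 200.0, 63.0)],
)

# Collect only the rows we want to analyse, using a structured query.
rows = conn.execute(
    "SELECT plot, water_litres, yield_kg FROM yields "
    "WHERE water_litres >= ? ORDER BY plot",
    (150.0,),
).fetchall()
conn.close()

print(rows)
```

Because the data is structured, a single query is enough to pick out exactly the rows and columns needed for analysis.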


Storing :

We can store data in relational databases, or in data warehouses, where data from multiple databases is stored together. Much semi-structured data can be stored in formats such as JSON. In the past decade, a new way of storing data was introduced: the data lake, into which we load all sorts of data. A data lake is not always the preferable choice, though, because it takes time to pick out the data that will be useful for analysis and for providing insights.
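To show what semi-structured data in JSON can look like, here is a small sketch using Python's standard json module. The records and field names are invented for illustration; the point is only that two records need not share the same fields.

```python
import json

# Hypothetical records: the two entries do not have the same fields.
records = [
    {"user": "asha", "text": "Great rally today", "tags": ["election"]},
    {"user": "ravi", "text": "Rain at last", "location": {"city": "Pune"}},
]

# Store the records as JSON text, then load them back.
serialized = json.dumps(records, indent=2)
restored = json.loads(serialized)

# A field may be missing, so read it with a default instead of assuming it exists.
cities = [r.get("location", {}).get("city") for r in restored]
print(cities)
```

Unlike a relational table, nothing forces every record to have a "location" field, so the reading code has to cope with its absence.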


Processing :

Data processing makes up a major part of the work. Data is transformed from one format to another, which is known as data wrangling. We have to clean the data to remove inconsistencies and handle missing values, and normalize or standardize it so that it is easier to compare and understand.
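The cleaning, normalizing and standardizing steps can be sketched in a few lines of plain Python. This is only one possible approach: the values are made up, missing values are filled with the mean (other strategies exist), normalization is taken to mean min-max scaling, and standardization is taken to mean z-scores.

```python
from statistics import mean, pstdev

# Hypothetical measurements; None marks a missing value.
raw = [12.0, None, 15.0, 18.0, None, 15.0]

# Cleaning: replace each missing value with the mean of the observed values.
observed = [v for v in raw if v is not None]
fill = mean(observed)
cleaned = [fill if v is None else v for v in raw]

# Normalizing: min-max scaling to the range [0, 1].
# (Assumes the values are not all identical, otherwise hi - lo is zero.)
lo, hi = min(cleaned), max(cleaned)
normalized = [(v - lo) / (hi - lo) for v in cleaned]

# Standardizing: z-scores, so the result has mean 0 and standard deviation 1.
mu, sigma = mean(cleaned), pstdev(cleaned)
standardized = [(v - mu) / sigma for v in cleaned]

print(cleaned)
print(normalized)
print(standardized)
```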


Describing :

Describing data includes data visualization, which is generally used to plot results as bar graphs, scatter plots and so on. Summarizing the data is also part of describing it.
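Here is a small sketch of both ideas, summarizing and plotting, using only the Python standard library. The scores are invented, and the bar graph is drawn as text so the example runs without a plotting library; in practice a library such as matplotlib would usually be used instead.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical survey scores on a 1 to 5 scale.
scores = [3, 5, 4, 5, 2, 5, 4]

# Summarizing: a few numbers that describe the whole data set.
summary = {
    "count": len(scores),
    "min": min(scores),
    "max": max(scores),
    "mean": mean(scores),
    "median": median(scores),
}

# Visualizing: a text bar graph of how often each score occurs.
counts = Counter(scores)
chart_lines = [f"{value}: {'#' * counts[value]}" for value in sorted(counts)]

print(summary)
print("\n".join(chart_lines))
```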


Modelling :

Modelling is divided into two parts: statistical modelling and algorithmic modelling. Data science mainly relies on statistical modelling, while algorithmic modelling calls for machine learning and deep learning. For example, in the equation y = mx + c, x is the independent variable and y depends only on x. Here statistical modelling is enough to predict the value of y from x. If the relationships in the data set are more complex, for example when y depends on a large number of x variables, predicting y becomes a tedious process, and this is where algorithmic modelling comes into the picture. Modelling of data generally refers to data inference.
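For the simple y = mx + c case, one common statistical approach is an ordinary least-squares fit. The sketch below is only an illustration of that idea: the function name fit_line and the data points are made up, and the points are chosen to lie on the line y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Estimate m and c in y = m*x + c by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: chosen so the line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Hypothetical data lying on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, c = fit_line(xs, ys)
prediction = m * 5.0 + c  # predict y for a new value x = 5

print(m, c, prediction)
```

With a single x this calculation is short enough to do by hand. When y depends on many x variables, the same idea needs heavier machinery, which is where the algorithmic approach described above comes in.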