Data Science - Explained

                      **What is Data Science?**

        “Without data you're just another person with an opinion”
                                                   W. Edwards Deming

Data science went from being jargon to something you hear in everyday conversation in what feels like the blink of an eye. So what is data science? Here’s the catch: ask someone this question and their answer is likely to be wrong, or at least not very accurate, and the question still stands. First, let’s get one fact out of the way: there is no single complete definition of data science. Instead, we can answer the question by summarising the process involved and the outcome expected.

Data science is the science of collecting, storing, processing, describing and modelling data with the goal of discovering patterns in the data and making inferences based on it.
It could be said that data science is an assortment of tasks, each requiring different skills, which together give the data scientist insights into whatever she/he is investigating.

Now let’s dive into the individual tasks involved.

  1. Data collection - data collection depends on the question the data scientist wants to answer and the conditions she/he is working in. Once both are clear, she/he can go ahead and collect data by accessing data available in the organisation using code, scraping/crawling data available on the internet and/or designing experiments to collect data manually.

  2. Storing data - the data collected in the first step needs to be stored appropriately so that it can be used in the later steps. Data is typically stored in relational databases, data warehouses or data lakes.

  3. Processing - processing data involves three major steps:
    a. Data munging - this involves transforming data from different sources and formats into a standard format.
    b. Data cleaning - this involves filling in missing values, correcting spelling errors and removing outliers which distort the data.
    c. Scaling, normalising, standardising - scaling brings all data to the same units of measurement, normalising converts all ranges to lie between 0 and 1, and standardising transforms the data so that its mean is 0 and its variance is 1.

  4. Describing data - describing data involves visualising and summarising it. Visuals paint a better picture and make it easier for the layman to identify patterns in the data, while summaries give compact and robust answers to questions about things such as sales and growth.

  5. Modelling - data modelling is mainly done in two ways:
    a. Statistical modelling - this involves coming up with statistical models which fit the data, reveal underlying relations in it, facilitate the formulation and testing of hypotheses and are robust enough to give statistical guarantees.
    b. Algorithmic modelling - this enables the data scientist to fit models to complex, high-dimensional datasets using machine learning, deep learning etc.
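
To make the processing, describing and modelling tasks above a little more concrete, here is a minimal Python sketch that uses only the standard library. Everything in it is made up for illustration: the `ad_spend` and `sales` numbers, the helper names (`fill_missing`, `standardise`, `normalise`, `fit_line`) and the choice of filling a missing value with the mean are my own assumptions, not a prescribed method. Treat it as a toy walk-through rather than how a real project would be built.

```python
import statistics

# Hypothetical raw data: ad spend and sales for five months.
# One ad spend value is missing (None) to illustrate data cleaning.
ad_spend = [10.0, 20.0, None, 40.0, 50.0]
sales = [25.0, 45.0, 65.0, 85.0, 105.0]


def fill_missing(values):
    """Data cleaning: replace missing entries with the mean of the known ones."""
    known = [v for v in values if v is not None]
    mean = statistics.fmean(known)
    return [mean if v is None else v for v in values]


def standardise(values):
    """Standardising: shift and scale so the mean is 0 and the variance is 1.

    Assumes the values are not all identical (otherwise the spread is 0).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def normalise(values):
    """Normalising: rescale so every value lies between 0 and 1.

    Assumes the values are not all identical (otherwise the range is 0).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fit_line(xs, ys):
    """Modelling: least-squares fit of the line y = slope * x + intercept."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Processing
clean_spend = fill_missing(ad_spend)
print("cleaned:", clean_spend)
print("standardised:", standardise(clean_spend))
print("normalised:", normalise(clean_spend))

# Describing: a compact summary of the sales figures
print("mean sales:", statistics.fmean(sales))
print("median sales:", statistics.median(sales))

# Modelling: a very simple statistical model relating spend to sales
slope, intercept = fit_line(clean_spend, sales)
print("sales is roughly", slope, "* ad_spend +", intercept)
```

In practice you would more likely reach for libraries built for these jobs, but the ideas stay the same: clean the data, bring it onto a comparable scale, summarise it and then fit a model to it.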

Alongside all of the above tasks, there is one more thing a data scientist should be adept at: communication. The inferences she/he draws from the data are of no use to anyone else if she/he cannot communicate them effectively.

So this is what data science is according to me, and I hope you have a better idea of what data science is after reading my blog.