Data science is the discipline of extracting knowledge and insights from structured and unstructured data using scientific processes, methods, and algorithms. With so much data around us, it is far easier to spot the trends that can help transform our lives. It is not only for big companies wanting to know their customers' needs or assess their productivity; it can be applied to almost any field we can think of.
The data generated can be Structured (having a well-defined, tabular form), Semi-structured (self-describing formats such as XML or JSON), or Unstructured (free text, images, audio).
Data Science can be divided into 5 basic tasks.
In simple words, Data Science is the science of collecting, storing, processing, describing and modelling data.
COLLECTING DATA :-
What's involved in collecting data?
(i) It depends on the question the data scientist is trying to answer.
(ii) It depends on the environment in which the data scientist is working.
Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.
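The collection step above can be sketched in a few lines of Python. This is a minimal, hypothetical example: the survey data, column names, and values are invented for illustration, and in practice the raw text would come from a file, a database, or an API response rather than a string.

```python
import csv
import io

# Hypothetical CSV export from a survey tool (invented data).
raw = """respondent,age,city
1,34,Delhi
2,29,Mumbai
3,41,Delhi
"""

# Parse the raw text into a list of dictionaries, one per respondent,
# so each variable of interest can be addressed by name.
rows = list(csv.DictReader(io.StringIO(raw)))

print(len(rows))        # number of records collected
print(rows[0]["city"])  # value of one variable for the first record
```

The same pattern (read raw records, map them to named variables) applies whether the source is a file, a web API, or a sensor feed.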
STORING DATA :-
Storing data in a data science process means keeping the useful data from which you will later dig out actionable insights. Storing data is itself an orderly process, with several things to consider before jumping to more advanced or fancy work.
Data can be of these 3 types :-
(i) Structured : e.g. patient records, insurance claims, employee records, telephone bills.
(ii) Semi-structured : e.g. XML documents, JSON logs, email headers.
(iii) Unstructured : e.g. free-form text, images, audio and video files.
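The three types can be contrasted with a small sketch, using only the Python standard library. The table, JSON record, and email text here are invented examples, not data from the article.

```python
import json
import sqlite3

# Structured: tabular rows with a fixed schema, stored relationally
# (an in-memory SQLite database stands in for a real one).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO patients VALUES (1, 'Asha')")
(count,) = con.execute("SELECT COUNT(*) FROM patients").fetchone()

# Semi-structured: self-describing key-value data such as JSON (or XML);
# the structure travels with the record instead of living in a schema.
record = json.loads('{"claim_id": 42, "tags": ["urgent", "reviewed"]}')

# Unstructured: free text (or images, audio) with no fixed schema at all.
email_body = "Hi, please review the attached claim before Friday."

print(count, record["claim_id"], len(email_body.split()))
```

Which storage technology fits best usually follows from which of these three shapes the data has.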
PROCESSING DATA :-
Data processing occurs when collected data is translated into usable information. It is usually performed by a data scientist or a team of data scientists, and it must be done correctly so as not to negatively affect the end product.
Steps used in Processing Data :-
(i) Data Wrangling / Data Munging :-
It is the process of transforming and mapping data from one “raw” data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.
(ii) Data Cleaning :-
This process consists of filling missing values, standardizing keyword tags, correcting spelling errors, and identifying and removing outliers (values that fall outside the expected range).
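The wrangling and cleaning steps above can be sketched together. This is a minimal example on invented data: the ages, the median-fill strategy, and the 0-120 plausibility range are all illustrative assumptions.

```python
import statistics

# Hypothetical raw ages: one missing value (None) and one obvious outlier.
raw_ages = [23, 25, None, 27, 24, 999]

# Data wrangling: map the raw entries into a uniform numeric form,
# filling the missing value with the median of the observed values.
observed = [a for a in raw_ages if a is not None]
median_age = statistics.median(observed)
filled = [median_age if a is None else a for a in raw_ages]

# Data cleaning: drop values outside a plausible range (0-120, assumed).
cleaned = [a for a in filled if 0 <= a <= 120]

print(cleaned)  # [23, 25, 25, 27, 24]
```

Real pipelines make the same moves at scale (e.g. with pandas), but the logic is identical: reshape first, then repair or remove bad values.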
DESCRIBING DATA :-
An important step in any analysis is to describe the data using descriptive and graphical methods.
Visualizing data with charts and graphs helps us understand trends, outliers and patterns. Summarizing data relies on descriptive statistics, which include the mean, median, mode, variance and standard deviation.
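The summary statistics listed above can be computed with Python's standard `statistics` module. The exam scores below are invented for illustration; `pvariance`/`pstdev` are the population versions of variance and standard deviation.

```python
import statistics

scores = [70, 75, 75, 80, 100]  # hypothetical exam scores

print(statistics.mean(scores))       # mean: 80
print(statistics.median(scores))     # median: 75
print(statistics.mode(scores))       # mode: 75 (most frequent value)
print(statistics.pvariance(scores))  # population variance: 110
print(statistics.pstdev(scores))     # population standard deviation
```

Note how the single high score (100) pulls the mean above the median; comparing the two is a quick first check for skew or outliers.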
MODELLING DATA :-
Data modelling is the process of creating a data model for the data to be stored in a database. This data model is a conceptual representation of data objects, the associations between different data objects, and the rules governing them.
A data model emphasizes what data is needed and how it should be organized, rather than what operations will be performed on the data. It is like an architect's building plan: it helps build a conceptual model and set the relationships between data items.
The two common data-modelling techniques are
- Entity Relationship (E-R) Model
- UML (Unified Modelling Language)
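A tiny E-R model can be sketched directly as relational tables. This is an illustrative example only: the `author` and `book` entities, their columns, and the one-to-many relationship (expressed as a foreign key) are assumptions, not taken from the article.

```python
import sqlite3

# Two entities and one relationship, in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    title TEXT,
    author_id INTEGER REFERENCES author(id)  -- the E-R relationship
)""")
con.execute("INSERT INTO author VALUES (1, 'Ada')")
con.execute("INSERT INTO book VALUES (1, 'Notes', 1)")

# The relationship lets us join the entities back together.
title, name = con.execute(
    "SELECT book.title, author.name "
    "FROM book JOIN author ON book.author_id = author.id"
).fetchone()
print(title, name)  # Notes Ada
```

Notice that the schema says nothing about *operations*: it only states which data objects exist and how they relate, which is exactly the point made above.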
So the conclusion to draw from all of this is that having our data collected, stored and well organized can be of great benefit to humanity. It is a prerequisite to solving many of the problems humankind faces.