Today we live in a world of data. What is data? Put simply, known facts are called data. Knowingly or unknowingly, every one of us contributes a lot of data to this world. For example, when you make a transaction, you generate some data. When you click the like button on your favourite post on Facebook, you generate data again, and if you simply read an online document or stream a song, you generate yet more data. So data is everywhere: one widely quoted estimate predicted that by 2020, around 1.7 MB of data would be created every second for every person on earth.
Why is data important? Why are we concerned about this data, and where can it be used? Let us discuss a simple example. Suppose a bank wants to offer negotiated rates to some privileged customers, and in this way attract new high-profile customers. The bank might have 1000+ customers, and out of these it wants to flag some as privileged. We cannot judge a customer simply by looking at his current month's transaction record; many other factors need to be considered, such as his past transaction record, the bank services he uses most, his occupation and his present financial condition. Most of this information will be directly available in the bank's own database, but some of it, like his present financial status, might not be. Such data needs to be collected from the outside world (e.g., by asking different people or by collecting data from social media).
Suppose the bank assigns this work to you, and you want to identify, say, 10 privileged customers out of its 1000+ customers. Which activities will you carry out to achieve this?
- Obviously, you should collect the data of all customers from all possible sources, e.g., the bank's database, social media, etc.
- Next, you have to store this data somewhere for analysis.
- Suppose you have collected each customer's average account balance as one of the measuring parameters. Usually this will be a large figure, so using it directly makes the analysis difficult, and we might need to normalize such data to a smaller range (for example, 0 to 1). So some kind of processing needs to be done before data analysis.
- Now the data is ready for analysis. Suppose you have collected, say, 10 parameters for each customer. Is it possible to arrive at a conclusion by simply looking at this data? Obviously not! You may have to generate a graph or apply some statistical technique (like computing the mean, median, variance, etc.) to describe the data.
- Now you can use this to predict each customer's future activities, such as the number of transactions he is likely to make. Based on this prediction, you can find the most privileged, outstandingly performing customers of the bank. This step is known as modelling.
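The processing and describing steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming min-max scaling as the normalization method (the text does not name one) and a made-up list of average balances for five hypothetical customers; only the standard library `statistics` module is used.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: there is no spread to scale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def describe(values):
    """Summarize a list of numbers with a few common descriptive statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance, needs >= 2 values
    }

# Hypothetical average account balances (illustrative numbers, not real data).
balances = [12000, 45000, 250000, 8000, 98000]

print(min_max_normalize(balances))
print(describe(balances))
```

Min-max scaling keeps the relative ordering of the customers while mapping the large balance figures into [0, 1], so the largest balance becomes 1.0 and the smallest becomes 0.0. Standardizing to z-scores (subtracting the mean and dividing by the standard deviation) is a common alternative when outliers would squeeze everyone else toward 0.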
These activities are the different tasks of data science. In other words, the study of data is known as data science, which deals with data collection, data storage, data processing, describing the data and, finally, modelling the data.
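A deliberately simple sketch of the modelling task, assuming that a customer's performance is measured by the predicted number of transactions next month and that a straight-line trend fitted by ordinary least squares is good enough. The customer ids, the monthly counts and the function names `predict_next` and `top_customers` are hypothetical; a real analysis would combine the other parameters (balance, occupation, services used) and use a more careful model.

```python
def predict_next(counts):
    """Fit a least-squares straight line to monthly counts and extrapolate one month ahead."""
    n = len(counts)
    xs = range(n)  # month index 0, 1, ..., n-1
    x_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        return float(counts[0])  # only one month of history: no trend, repeat it
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, counts))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * n  # value of the fitted line at the next month index

def top_customers(history, k):
    """Rank customers by predicted next-month transactions and return the top k ids."""
    predictions = {cid: predict_next(counts) for cid, counts in history.items()}
    return sorted(predictions, key=predictions.get, reverse=True)[:k]

# Hypothetical monthly transaction counts per customer (illustrative only).
history = {
    "C001": [10, 12, 14, 16],  # steady growth
    "C002": [30, 25, 20, 15],  # declining
    "C003": [20, 20, 20, 20],  # flat
    "C004": [5, 15, 25, 35],   # fast growth
}

print(top_customers(history, 2))
```

Here the fitted lines predict 18, 10, 20 and 45 transactions respectively, so the two customers flagged are `C004` and `C003`. With 1000+ real customers, the same ranking idea would be applied with `k = 10`.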