Tuesday, February 18, 2014

Big Data Analytics

Big Data Boat

I have always been amazed by astrology and future prediction. Big data[1] is similar: it is future prediction, but with a scientific approach. Big data is a collection of data so huge that it is difficult to store and process using conventional methods. Examples include RFID (radio-frequency identification) data, data from social networking sites like Facebook and Twitter, Internet searches, credit card payments, and the transactions of retail giants like Walmart.

So how do we define big data? How do we know whether data can be considered "Big Data"? What are the attributes of Big Data?


4 V's of Big Data.[2]

a) Volume - Volume refers to the size of the data. It's estimated that 2.5 quintillion bytes (about 2.3 billion gigabytes) of data are created every day.

b) Variety - Variety refers to the type of data. This can be server logs, click stream data, audio and video.

c) Velocity - Velocity refers to the speed at which data is generated. We are generating data at a very fast pace. The New York Stock Exchange alone captures one terabyte of trade information during each session.

d) Veracity - Veracity refers to the uncertainty of data: the noise and abnormality in it. Is the data which we are analyzing related to the problem at hand?
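
As a rough check on the volume figure, the short sketch below converts 2.5 quintillion bytes into gigabytes. The function name and the choice to show both decimal (10^9 bytes) and binary (2^30 bytes) gigabytes are my own assumptions for illustration; either way the result is in the billions of gigabytes per day.

```python
# Convert the daily data volume quoted in the text into gigabytes.
BYTES_PER_DAY = 2.5e18  # 2.5 quintillion bytes


def bytes_to_gb(num_bytes, binary=False):
    """Return gigabytes; binary=True uses 2**30 bytes per gigabyte."""
    divisor = 2**30 if binary else 10**9
    return num_bytes / divisor


decimal_gb = bytes_to_gb(BYTES_PER_DAY)
binary_gb = bytes_to_gb(BYTES_PER_DAY, binary=True)

print(f"{decimal_gb:.2e} GB (decimal)")
print(f"{binary_gb:.2e} GB (binary)")
```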

Big data analytics is already being used extensively in retail and other industries. Companies have started moving from identifying customer transactions to understanding customer interactions. They are interested not only in what was bought and when, but also in why an item was rejected. Based on users' online interactions, they try to find out why a particular item was selected or rejected, and whether there is a pattern in those choices.

You may have experienced this while shopping online: you select a pair of shoes and then remove it, and the next time you visit the site, the same shoes are displayed. The site is trying to find a pattern. Reaching a correct conclusion requires analyzing a huge amount of data. Each click that you make online is stored on servers and analyzed. This is not done only online. Some companies now follow this approach in physical stores. Using video cameras, they try to capture the expressions of customers and to analyze which parts of the store are used more, at which time of day, on which day of the week, and so on. This analysis is more powerful than analysis based on a few clicks made by a customer online.
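
To make the shoes example concrete, here is a minimal sketch of how a site might count items that were added to a cart and then removed. The event format, action names and sample data are all hypothetical, invented for illustration; a real click stream would be far larger and processed on a cluster rather than in a single script.

```python
from collections import Counter

# Hypothetical click events: (user, action, item). Invented for illustration.
events = [
    ("u1", "add_to_cart", "shoes"),
    ("u1", "remove_from_cart", "shoes"),
    ("u2", "add_to_cart", "shoes"),
    ("u2", "remove_from_cart", "shoes"),
    ("u2", "add_to_cart", "socks"),
    ("u3", "add_to_cart", "hat"),
    ("u3", "purchase", "hat"),
]


def abandoned_items(events):
    """Count, per item, how many users added it to the cart and later removed it."""
    in_cart = set()    # (user, item) pairs currently in a cart
    abandoned = set()  # (user, item) pairs that were added and then removed
    for user, action, item in events:
        key = (user, item)
        if action == "add_to_cart":
            in_cart.add(key)
        elif action == "remove_from_cart" and key in in_cart:
            in_cart.discard(key)
            abandoned.add(key)
    return Counter(item for _, item in abandoned)


counts = abandoned_items(events)
print(counts.most_common())
```

In this sample data the shoes are removed by two different users, which is the kind of pattern a retailer might look into further.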

This development is going to make a huge impact on all walks of life, not only retail.

1) This can be used by industries to analyze risks and find ways to avoid them in future - With big data we have all the past records, so before taking any major decision, past trends can be observed to reach the least risky conclusion.

2) It becomes easy to get to the root cause of an issue - Since we have all the relevant data from servers, logs and other sources, it is possible to pinpoint the issue exactly.

3) Understand upcoming changes in fashion - Based on data from social media, it will be possible to identify new trends. A company can start manufacturing products based on this data.

4) Identify voting patterns and the issues that matter the most - As per Bosmol dated 8th Feb, big data analysis was one of the important factors in Barack Obama's re-election in 2012.

5) Based on big data, companies can predict when to launch sales and on what scale. It can reveal the shopping pattern of a particular location, so a sales offer can be made at one location and not another.

6) It can be used by telecom companies to identify dead spots in signal coverage and to identify calling patterns. This is already being done by most telecom operators. Based on calling patterns, various schemes are offered to customers.

7) Big data can be used in fields such as astronomy and medical science - With big data, we can identify why a particular medicine affects one person more than another, or what the reason for a particular allergy is: is it related to location, climate or the individual? A lot of unanswered questions could be answered.

8) Understanding the mental state of an individual - People are currently very active on social media. This activity can be used to assess a person's mental state, which may help avoid untoward incidents.

Big Data Challenges




The three key challenges in making data analytics work are:

1) What data you want to use – The challenge is which data to use, how to get it, how to integrate it and how to use it. There’s a lot of data out there: supply chain data, customer data, and performance data. Managing this data is a big challenge, but the real value comes when we integrate internal data with external data like weather data, traffic pattern data and competitive data.

2) Analytics – The second challenge is to get the right people for the job. It is difficult to find people who really know how to use the latest mathematical techniques.

3) Transform the business – The third and most difficult challenge is to use this data to transform the business. There’s no point in getting insights and not using them. We need to change how the business operates and how managers work on a day-to-day basis.
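
The first challenge, combining internal data with external data, can be sketched as a simple join. Everything below (the umbrella sales, the cities, the weather labels and the function names) is hypothetical sample data chosen for illustration, not taken from any real system.

```python
# Hypothetical internal data: daily umbrella sales per store location.
sales = [
    {"date": "2014-02-01", "city": "Pune", "units": 12},
    {"date": "2014-02-02", "city": "Pune", "units": 40},
    {"date": "2014-02-01", "city": "Delhi", "units": 7},
]

# Hypothetical external data: weather keyed by (date, city).
weather = {
    ("2014-02-01", "Pune"): "dry",
    ("2014-02-02", "Pune"): "rain",
    ("2014-02-01", "Delhi"): "dry",
}


def join_sales_with_weather(sales, weather):
    """Attach the weather condition to each sales record (a left join)."""
    joined = []
    for row in sales:
        condition = weather.get((row["date"], row["city"]), "unknown")
        joined.append({**row, "weather": condition})
    return joined


def average_units_by_weather(joined):
    """Average units sold for each weather condition."""
    totals = {}
    for row in joined:
        total, count = totals.get(row["weather"], (0, 0))
        totals[row["weather"]] = (total + row["units"], count + 1)
    return {cond: total / count for cond, (total, count) in totals.items()}


joined = join_sales_with_weather(sales, weather)
averages = average_units_by_weather(joined)
print(averages)
```

Neither data set says much on its own; the comparison of sales on dry and rainy days only becomes possible once the two are integrated.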

Some of the other technological challenges in big data implementation are integrating an existing data warehouse with Hadoop and finding professionals who know how to write MapReduce code for Hadoop. Since these challenges are now being addressed by technologies like Impala (which can be used to query the data directly using SQL, with no need to convert to MapReduce) and Sqoop, let’s hope that big data will be used to its full potential in the near future.
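
To give a feel for why MapReduce skills are harder to find than SQL skills, here is a minimal sketch of the MapReduce model as a word count in plain Python. This is not Hadoop's API (Hadoop jobs are typically written in Java); the function names and sample log lines are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical log lines, invented for illustration.
lines = ["error disk full", "warning disk slow", "error network down"]


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one word."""
    return key, sum(values)


pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(word_counts)

# The same result expressed as SQL, assuming a hypothetical table words(word):
#   SELECT word, COUNT(*) FROM words GROUP BY word;
```

The SQL version is a single line, which is the appeal of querying with SQL instead of writing the map and reduce steps by hand.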

References:

[1] http://en.wikipedia.org/wiki/Big_data
[2] http://dashburst.com/infographic/big-data-volume-variety-velocity/
[3] http://inside-bigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/
[4] http://searchbusinessanalytics.techtarget.com/definition/big-data-analytics
[5] http://www.mckinsey.com/insights/business_technology/making_data_analytics_work
[6] http://www.justinholman.com/2012/12/19/geography-big-data/
[7] http://en.wikipedia.org/wiki/Cloudera_Impala