Monday, February 17, 2014

Beyond Big Data

            A simple Google search on Big Data will list a bunch of articles that one can read to quickly learn what Big Data means. This very Digital Analytics blog has several comprehensive posts on it [1] [2] [3]. We all understand, at least, the definition of Big Data and the different tools (NoSQL database systems, Hadoop, etc.) used to mine it. But where does the use of Big Data start and end? What is the purpose of collecting such huge amounts of data? What is the end goal we are looking at? My intention in this blog post is to look beyond the mere buzz of Big Data and explore its applications in the domain of analysis. I hope it offers everyone some food for thought.

               I am a big fan of TED talks. Weird as it may sound, I watch them to kill time and learn something along the way. There are of course thousands of talks, but I would like to summarize a few that truly inspired me to think about the different kinds of analysis that can be performed using Big Data.

Context is Queen

               The first talk, by Deb Roy, is about the birth of a word. I watched this video two years ago, and for some reason it has stayed with me. At first it sounds like a topic related to Linguistics and Human Language Development, but applying the model to social media completely changes the perspective.

               When people start communicating about an event via different social streams, the creation of new social structures can be mapped from data, and that data is Big Data. Aggregating data from different channels across the web, such as photos, videos, audio recordings, tweets, blogs, podcasts and emails, is a challenge here, though a relatively simple one to solve. Deb and his colleagues have taken this research further and applied it to Social TV programs and commercials. A social conversation that begins expanding on the web is tied to its stimulus: a TV show. Their company, Bluefin Labs [4], focuses on adding context to social commentary and has given birth to a whole new world of Social TV Analytics. When a human being browses through such real-time social commentary, it is easy for the human brain to apply context. But since we rely on machines for Big Data analysis, the biggest challenge Bluefin Labs has been able to solve is teaching machines to link language to context through language grounding techniques [5] [6]. Without this context, mining Big Data and analyzing it would be useless. This is exactly what we have learnt in our Digital Analytics class: Context is Queen.


               The next talk is by Jean-Baptiste Michel and Erez Lieberman Aiden. It is about digitizing the books published through time and turning that data into an understanding of our language, history and culture.

               Again, all this data is Big Data. Transforming it is called information extraction, and adding context to it is called Analytics. Google Labs’ Ngram Viewer is a treat. Try this out: type in the word happy and observe the trend. Why would people be less happy over time? This trend requires some context. Then try typing in synonyms of happy and observe the trend. See what happened after 1980 for one of the synonyms, gay. We can easily add context here and give an explanation. Also try the American slang words dorky and freak. What I would like to think about is how this concept could be used in the future. If we could have charts that add demographic segments to synonym comparisons, we might be able to see word origins, depth of usage and more. One point of debate, though, would be the degree of difference, if any, between spoken and written language, which could give us a measure of how accurately written text reflects the history of a language.
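To make the idea behind such a trend chart concrete, here is a minimal Python sketch of my own. It is not how the Ngram Viewer is implemented; the toy corpus, the function name relative_frequency_by_year and the whitespace tokenization are all assumptions made for illustration. It simply computes, for each year, what share of all words in that year's text is the word we are interested in.

```python
from collections import Counter


def relative_frequency_by_year(corpus, word):
    """Return {year: share of that year's tokens equal to `word`}.

    `corpus` is an iterable of (year, text) pairs. Tokenization is a
    naive lowercase whitespace split, which is an assumption of this
    sketch and not a description of any real system.
    """
    target = word.lower()
    hits = Counter()    # occurrences of the target word per year
    totals = Counter()  # total number of tokens per year
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Skip years with no tokens to avoid dividing by zero.
    return {
        year: hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


# Hypothetical toy corpus, invented purely for this example.
toy_corpus = [
    (1900, "a happy child and a happy home"),
    (1950, "a happy ending to a long story"),
    (2000, "the data was big and the tools were new"),
]

trend = relative_frequency_by_year(toy_corpus, "happy")
for year, share in trend.items():
    print(year, round(share, 3))
```

The numbers alone say nothing about why a word rises or falls. That explanation is the context we have to add ourselves.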


               The last talk is by a data scientist named Dan Berkenstock. His take on Big Data centers on collecting satellite imagery and the analytics that can be applied to it.

                We enjoy the convenience of sitting at home and using Google Maps to see a street view in a country miles away from us. But have you realized that these images from Google are not up-to-date? Try looking at our business building in street view; it is still under construction. Dan’s research effort is now to collect satellite images of the earth in real-time. Moreover, the little bits of information that show real-time data related to the image you are looking at are phenomenal. He says that their team wants to use satellite imagery to “apply scalable analytics to find insights”.

So, what is Big Data?

               By now I hope you have realized that, as potential job seekers, we need to understand that collecting data is only the beginning. Big Data is pointless without context, analysis and insight. Those are our end goals. Remember, Big Data is just large amounts of data; what we should care about is the different kinds of analysis we can apply to it. There is enormous potential for Big Data Analytics in every field you can think of today. From all the above inspirational talks I would like to say: You are dealing with Big Data, so think Big!