Tuesday, February 18, 2014

Hadoop: What Is It?



Hadoop has become a buzzword in today’s tech world. If you are like me, you probably know that Hadoop deals with Big Data analytics, and that is about all you know. That broad definition doesn’t help anyone understand the true nature of Hadoop and what it can do.

A better understanding of Hadoop comes from staying at a high level but going deeper than just saying Big Data. The best way to view Hadoop, in my opinion, is as taking normal analytics and pooling the compute power of many machines in order to crunch vast amounts of data. According to hadoop.apache.org, the official site, Hadoop is a “framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.” [1] In other words, it is a scalable data-crunching monster that can analyze data that was previously too large or too complex for conventional tools.


Hadoop can do this by leveraging massively parallel processing: it takes normal servers and pools their resources. This model is much more cost-effective and easier to implement than buying specialized, extremely expensive high-performance servers.
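To make the pooling idea concrete, here is a toy sketch of the map-and-reduce style of processing that Hadoop's MapReduce model is built around. It is not Hadoop code and does not use any Hadoop API: it runs on one machine, threads stand in for servers, and the function names (split_into_chunks, map_chunk, reduce_counts, word_count) are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, num_workers):
    """Deal the records out round-robin, one chunk per pretend server."""
    return [records[i::num_workers] for i in range(num_workers)]


def map_chunk(chunk):
    """Map step: each worker counts the words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Reduce step: merge the per-worker counts into one total."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def word_count(records, num_workers=4):
    chunks = split_into_chunks(records, num_workers)
    # Threads stand in for separate servers here (an assumption of this
    # sketch); a real cluster spreads the chunks across many machines.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    return reduce_counts(partial_counts)


if __name__ == "__main__":
    log_lines = [
        "big data needs big clusters",
        "clusters pool normal servers",
        "normal servers crunch big data",
    ]
    print(word_count(log_lines, num_workers=3).most_common(3))
```

The point of the sketch is that no single worker ever sees the whole data set. Each one handles a small piece, and only the small partial results are combined at the end, which is why adding more ordinary servers lets the same job handle more data.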


Another great definition that sums up what Hadoop is comes from Mike Olson, CEO of Cloudera. He says the “Hadoop platform was designed to solve problems where you have a lot of data”, perhaps “a mixture of complex and structured data”, that “doesn’t fit nicely into tables.” [2]

Now that we know more about what Hadoop is, it is important to understand where it came from. Hadoop’s underlying technology came from Google, when Google first started to look at Big Data analytics. When Google started, there were no tools in place to analyze such vast quantities of data, so Google built its own platform. As demand and need materialized, an open source project called Nutch came into being. Taking the torch from Nutch, and with massive help from Yahoo, Hadoop was born for enterprise applications.


Now that we know where it came from, it is important to look at the target markets for Hadoop. Hadoop is still in its infancy as far as a cookie-cutter deployment model for different businesses goes. Right now Hadoop is wide open for custom development and application development. This means that companies can take their own data analytics tools and have them talk to the Hadoop cluster. Many Fortune 500 companies are already taking advantage of Hadoop, with names like Adobe and Amazon among them. Big Data will not seem so large once Hadoop is in place, because Hadoop can take very difficult and complex sets of data and help churn out answers, such as accurate portfolio evaluation models and complex risk analyses for the world of finance.
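One way existing tools can talk to a Hadoop cluster is Hadoop Streaming, a utility shipped with Hadoop that runs ordinary scripts as the map and reduce steps. The scripts read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts the map output by key before the reduce step sees it. Below is a minimal sketch of that pattern, assuming a made-up input format of "portfolio,loss" records. The mapper and reducer names, the record layout, and the sample data are all hypothetical, and the demo feeds the functions an in-memory list rather than standard input.

```python
from itertools import groupby


def mapper(lines):
    """Turn 'portfolio,loss' records into 'portfolio<TAB>loss' pairs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        portfolio, loss = line.split(",")
        yield f"{portfolio}\t{loss}"


def reducer(sorted_lines):
    """Report the worst loss per portfolio; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for portfolio, group in groupby(pairs, key=lambda pair: pair[0]):
        worst = max(float(loss) for _, loss in group)
        yield f"{portfolio}\t{worst}"


if __name__ == "__main__":
    # Hypothetical sample data. In a streaming job each function would be
    # fed standard input instead, and the framework would do the sorting.
    sample = ["tech,1.5", "bonds,0.2", "tech,3.0", "bonds,0.7"]
    for result in reducer(sorted(mapper(sample))):
        print(result)
```

Because both steps are plain text in and plain text out, the same logic could be written in whatever language a company's analytics tools already use.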

In the future, I expect Hadoop to become more standardized, with out-of-the-box applications developed for sale so that application architects aren’t needed to build a productive Hadoop deployment for every single company. [3] The amount of unstructured data that Hadoop’s tools can process is colossal, and right now so much data isn’t being leveraged. Analysis of this data could lead to higher conversion rates and boosts in sales. As such, I predict that Hadoop and what it does will become an industry standard in the coming decades. Now is the time to get involved and be an early adopter, not the company lagging behind and looking in from the outside.





[1] What is Apache Hadoop? Retrieved February 17, 2014, from http://hadoop.apache.org/
[2] Hadoop: What it is, how it works, and what it can do. January 12, 2011. Retrieved February 16, 2014, from http://strata.oreilly.com/2011/01/what-is-hadoop.html
[3] Hadoop for Dummies. 2012. Retrieved February 16, 2014, from http://public.dhe.ibm.com/common/ssi/ecm/en/dcm03002usen/DCM03002USEN.PDF