Information technology and software development have always been rapidly evolving fields, with new programming languages, development tools, frameworks, platforms, and methodologies constantly being introduced. While this continues to be the case today, one sector has seen change and innovation at a breakneck pace over the past ten years: Big Data. In that time Hadoop has become the de facto framework for processing Big Data, to the point that "Hadoop" and "Big Data" have become synonymous. Hadoop has yielded tremendous value to large web companies such as Facebook, Twitter, LinkedIn, and Netflix, to name a few, by allowing them to gain insight and value from the large data sets they gather from their users.
Now businesses everywhere are trying to do the same, leveraging Hadoop to gain business insights and value from their own data sets. Hadoop's Apache open-source license, along with the fact that the software runs on affordable commodity servers, has significantly helped the platform's growth. The core components of Hadoop are YARN, HDFS, and MapReduce.
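The MapReduce component named above is both an execution engine and a programming model. As a rough illustration of the model, here is a minimal word-count sketch in plain Python; it only simulates the map, shuffle, and reduce phases in a single process, and the function names are illustrative, not Hadoop's actual API:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word, as a Hadoop Mapper would
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key; Hadoop performs this step
    # between the map and reduce phases, across the cluster
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word, as a Reducer would
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big insights", "hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # e.g. {'big': 3, 'data': 2, 'insights': 1, ...}
```

In a real Hadoop job the input lines would be read from HDFS, the map and reduce tasks would run in parallel on many nodes, and YARN would schedule and manage those tasks.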