Hadoop is a big buzz in the IT world these days. Apache™ Hadoop® is an open source

software framework for processing huge volumes (BIG DATA) of distributed data (HADOOP DISTRIBUTED FILE SYSTEM) using distributed processing capabilities (MAP REDUCE). 



The framework supports distributed processing of large datasets distributed across clusters

of computers (COMMODITY HARDWARE) using simple programming models.


Following are the key features which stands apart HADOOP in the field of large scale

computing with respect to other frameworks


SCALABILITY is a great feature the framework offers, it is designed to scale from few

servers to 1000 of servers. The nodes can be added as needed without impacting the

underlying applications. 

RELIABILITY The framework offers high degree of fault tolerance. It does not rely on the hardware to provide high availability, instead it has mechanism to detect and handle the failures.

COST EFFICIENT - The framework uses commodity hardware to provide distributed processing thus it results in huge cost saving in terms of dollars spent for processing per terabyte.

FLEXIBILITY - The Hadoop framework does not work on the principles of structured data as done in the relational database and thus can process any type of data whether structured or unstructured




HADOOP framework was developed by Dough Cutting and Mike Cafarella in the year 2005. Dough named HADOOP after his son’s toy elephant.




The base framework of APACHE HADOOP is comprised of the following four components


HADOOP COMMON- The common utilities and libraries required by other HADOOP components


HADOOP FILE DISTRIBUTED SYSTEM- HDFS is a distributed file system for storing large datasets on clusters of commodity hardware


YARN - YET ANOTHER RESOURCE NEGOTIATOR as the name suggests it is the resource manager for HADOOP. It basically managers the resources in clusters and takes care of scheduling of application processing on the clusters


MAP REDUCE - A simple programming model for processing large volumes of data



More about HADOOP

Hadoop provides a solution for storing and analysing the enormous amount of data generated these days by the digital world. So basically it is an answer to the problem BIG DATA poses to the digital world. If you want to know what big data is then read the article on BIG DATA here.

To get a hands on to the hadoop environment you can go through the article where we talk about setting up single node hadoop cluster.

Stay Tuned for some more Hadoop related tutorials ....

Click the +1 button  below to share it with your friends and colleagues


Share this if you liked it!