Hadoop Streaming

Hadoop Streaming is a generic API (Application Programming Interface) that lets programmers write and run MapReduce jobs using any executable or script as the mapper or reducer. Map and reduce programs read their input from STDIN and write their output to STDOUT. As in MapReduce generally, data is consumed and produced as key-value pairs; in Hadoop Streaming each pair is a line of text in which, by default, a tab character separates the key from the value. The MapReduce programming model in the Apache Hadoop framework is normally used through the standard Java MapReduce API, and the Streaming API gives programmers the flexibility to write MapReduce programs in languages other than Java. In short, Hadoop Streaming allows MapReduce programs to be written in any language that can read from STDIN and write to STDOUT.
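To make the STDIN/STDOUT contract concrete, here is a minimal word-count sketch in Python. It is an illustration only, not taken from the Hadoop distribution: the function names map_lines and reduce_lines, and the convention of passing "map" or "reduce" as the first command-line argument, are assumptions made for this example. It also assumes the default tab delimiter and that the reducer receives its input sorted by key, which is what the framework's shuffle-and-sort step normally provides.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per key. Input must already be sorted by key."""
    # Split each non-blank record into [key, value] at the first tab.
    records = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # groupby only merges adjacent records, hence the sorted-input requirement.
    for word, group in groupby(records, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention for this sketch: the first argument selects the step.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Because the script only talks to STDIN and STDOUT, it can be checked without a cluster by piping a text file through the map step, then through sort, then through the reduce step. In a real job the same script would be handed to Hadoop Streaming through its -mapper and -reducer options.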