Hadoop Reference, January 2011

This page is a reference for using Hadoop on the Mist and Helios clusters as of January 2011.

Path Information
Users can connect to either Mist or Helios via SSH:

    # Connect to the Admin nodes with X-forwarding
    $ ssh -X mist.public.stolaf.edu
    $ ssh -X helios.public.stolaf.edu

It is useful to know where Hadoop is installed: on both machines, it is located in /usr/lib/hadoop.

Hadoop File System Commands
Hadoop 0.20.1's file system command is hadoop fs. Running it with no arguments prints the usage, which lists the commands it supports.

Please note that while many HDFS commands correspond directly to shell commands, they do not behave in exactly the same way. For instance, to remove a directory recursively you must use rmr instead of rm -r, as the HDFS rm does not support the -r flag.

Because HDFS commands send their output to the local shell, output redirection works as usual. For example:

    hadoop fs -cat /home/user/garrity/file.txt > ~/file.txt

Hadoop in Java
Hadoop is written in Java, and as such Hadoop jobs are normally written in Java using the Hadoop API.

Word Count in Java
See MapReduceIntroLab

Hadoop in C++ (Helios only)
Hadoop has an extension called Pipes which allows users to write jobs using C++ (without using streaming).

Example Program: Word Count
This example counts the occurrences of each word in the input.
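The logic of a word count job can be sketched in plain C++ without any Hadoop headers. This is only an illustration of the map and reduce steps, not the Pipes program itself: the function names mapLine, reduceCounts, and wordCount are hypothetical, and the wordCount driver merely stands in for the grouping that the framework performs between the two phases.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Map step: split one line of input on whitespace and emit a (word, 1) pair
// for every word found.
std::vector<std::pair<std::string, int>> mapLine(const std::string& line) {
    std::vector<std::pair<std::string, int>> emitted;
    std::istringstream words(line);
    std::string word;
    while (words >> word) {
        emitted.emplace_back(word, 1);
    }
    return emitted;
}

// Reduce step: sum all of the counts that were emitted for a single word.
int reduceCounts(const std::vector<int>& counts) {
    int sum = 0;
    for (int count : counts) {
        sum += count;
    }
    return sum;
}

// Hypothetical driver standing in for the framework: map every line, group
// the emitted values by key, then reduce each group to a total.
std::map<std::string, int> wordCount(const std::vector<std::string>& lines) {
    std::map<std::string, std::vector<int>> grouped;
    for (const std::string& line : lines) {
        for (const auto& keyValue : mapLine(line)) {
            grouped[keyValue.first].push_back(keyValue.second);
        }
    }
    std::map<std::string, int> totals;
    for (const auto& entry : grouped) {
        totals[entry.first] = reduceCounts(entry.second);
    }
    return totals;
}
```

For the two lines "the quick brown fox" and "the lazy dog", wordCount returns a count of 2 for "the" and 1 for every other word. In a real Pipes job the map and reduce steps would live in mapper and reducer classes, and Hadoop would do the grouping.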

Building and Running Pipes Jobs
Pipes jobs should be built with libtool; without it, linking always fails with an undefined reference error. After building, the executable must be copied to HDFS (for example, with hadoop fs -put). Finally, the job may be run with the hadoop pipes command.

Configuring Pipes Jobs
Hadoop Pipes uses XML configuration files rather than a JobConf object (see: Java). The configuration file is supplied to the job when it is run.
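A basic configuration file might look like the sketch below. The three property names are standard Pipes settings in Hadoop 0.20, but the executable path bin/wordcount is a hypothetical placeholder, and this is not necessarily the file used on Mist or Helios.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- HDFS path of the compiled Pipes executable (hypothetical path) -->
  <property>
    <name>hadoop.pipes.executable</name>
    <value>bin/wordcount</value>
  </property>
  <!-- Let the Java side read input records and pass them to the C++ mapper -->
  <property>
    <name>hadoop.pipes.java.recordreader</name>
    <value>true</value>
  </property>
  <!-- Let the Java side write the output emitted by the C++ reducer -->
  <property>
    <name>hadoop.pipes.java.recordwriter</name>
    <value>true</value>
  </property>
</configuration>
```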

Collecting Job Output
Job output is placed in 'part' files (part-00000, part-00001, and so on) in the specified output directory. These files can be retrieved using either the hadoop fs -cat or the hadoop fs -get command.

Example Program: Using a Combiner
This example extends the original word count example by combining inside the mapper. Rather than defining a separate combiner class, the map class accumulates counts in a hash table and emits the combined totals itself.
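The in-mapper combining idea can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions rather than the original Pipes code: CombiningMapper is a hypothetical class, and its close method returns the pairs that a real mapper would emit through the framework once its input is exhausted.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mapper that combines its own output.
class CombiningMapper {
public:
    // Called once per input record: update the local table instead of
    // emitting a (word, 1) pair for every word.
    void map(const std::string& line) {
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            ++counts_[word];
        }
    }

    // Called once after the last record: return one (word, partial count)
    // pair per distinct word seen by this mapper, then reset the table.
    std::vector<std::pair<std::string, int>> close() {
        std::vector<std::pair<std::string, int>> emitted(counts_.begin(),
                                                         counts_.end());
        counts_.clear();
        return emitted;
    }

private:
    // The hash table that plays the role of the combiner.
    std::unordered_map<std::string, int> counts_;
};
```

The reducer is unchanged in spirit: it still sums the values it receives for each word, but those values are now partial counts rather than ones. The trade-off is that the mapper holds one table entry per distinct word in memory until it finishes.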