Hadoop dev environment

Local dev environment

It is important to be able to test your code on your own machine before submitting it to a Hadoop cluster. Below are instructions for doing so in Eclipse.

The basic idea is to create a normal Java application but add all the Hadoop JARs to the classpath. Your own code will have the main() function, so you run it as a normal Java application.

The Hadoop libraries have been packaged, for your convenience, into a single ZIP file.
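Once the JARs are on the project's classpath, a quick sanity check can confirm that a plain Java application actually sees them. The sketch below is only an illustration, not part of the course materials: the class name HadoopClasspathCheck and the selection of probe classes are assumptions chosen for this example. It looks up one well-known class from each major Hadoop component by name, so it compiles and runs even when the JARs are missing (it then reports them as MISSING).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): checks whether the Hadoop JARs
// are visible to a normal Java application started from main().
class HadoopClasspathCheck {

    // One well-known class per component; the selection is illustrative.
    static final String[] PROBE_CLASSES = {
        "org.apache.hadoop.conf.Configuration",          // hadoop-common
        "org.apache.hadoop.fs.FileSystem",               // hadoop-common
        "org.apache.hadoop.mapreduce.Job",               // hadoop-mapreduce-client-core
        "org.apache.hadoop.yarn.conf.YarnConfiguration"  // hadoop-yarn-api
    };

    // Returns true if the named class can be located on the classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: find the class without running its static initializers
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Probes every class in PROBE_CLASSES, preserving the listed order.
    static Map<String, Boolean> probeAll() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String name : PROBE_CLASSES) {
            results.put(name, isOnClasspath(name));
        }
        return results;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Boolean> entry : probeAll().entrySet()) {
            String status = entry.getValue() ? "found    " : "MISSING  ";
            System.out.println(status + entry.getKey());
        }
    }
}
```

Run it the same way you would run your own Hadoop code: as a normal Java application in the IDE. If any line reports MISSING, the corresponding JAR has probably not been added to the project's build path.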

Windows installation

Linux/Mac OS X installation

For reference, I copied the JARs from these locations in a fresh Hadoop 2.6.0 distribution (from here, the non-source package), and added them to the ZIP.

Next, you can proceed to the Hadoop workflow notes.

delenn environment

delenn is a moderately sized server (132 GB RAM, 32 CPU cores, 15 TB disk dedicated to Hadoop) running CentOS 6.4. It runs about 10 virtual machines to simulate a Hadoop cluster. A real Hadoop cluster should be made up of physical commodity hardware (normal servers, not supercomputers), but buying many servers is significantly more expensive than simulating them, so we simulate them. The trade-off is that performance suffers in simulated environments. See the Hadoop notes for more details about simulation vs. real hardware.

Network diagram

See the Hadoop notes for definitions and explanations of these nodes. Below is a summary of their roles:

Node             Link  Purpose
resourcemanager  Link  Manages YARN “containers”, i.e., jobs
namenode         Link  Manages HDFS metadata
mrjobhistory     Link  Records finished MapReduce jobs

The three nodes identified above each run one daemon with the same name and purpose as described.

Each slave runs two daemons: a datanode, which stores HDFS blocks, and a nodemanager, which runs YARN containers.

londo environment

londo also has a local Hadoop installation. It is for testing purposes only, for cases where you are unable to test on your own machine. It runs in only one thread, so it is quite slow. To use it, either create a JAR package in your IDE and transfer it to londo, or compile your code on londo. Then execute the application as described in the Hadoop workflow.

CINF 401 material by Joshua Eckroth is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Source code for this website available at GitHub.