2013-08-26

Hadoop 2.1-beta: the elephants have come out to play

After many, many months, Hadoop 2.1 Beta is ready to play with.


From Arun's announcement: "Users are encouraged to immediately move to hadoop-2.1.0-beta."

Some other aspects of the announcement (with my comments indented beneath each item):
  • API & protocol stabilization for both HDFS & YARN:
    Protobuf-format payloads with an API designed to be forward compatible; SASL auth.
  • Binary Compatibility for MapReduce applications built on hadoop-1.x
    This is considered critical: test now, and complain if there are problems.
  • Support for running Hadoop on Microsoft Windows
    Irrespective of your view of which server OS or cloud platform to run on, this means all Windows desktops can now talk HDFS and be used for Hadoop development.
  • HDFS Snapshots
    Lots of work by my colleagues here, especially Nicholas Tsz Wo: you can take a snapshot of directories and recover from your mistakes later. (JIRA, details).
  • Substantial amount of integration testing with rest of projects in the ecosystem
    That includes switching from protobuf 2.4 to 2.5 at the behest of the HBase team. That switch led to seven days of trauma earlier this month, as we had to do a lock-step migration of the entire cross-project codebase. Credit to all here.
What big things of mine are in there?

Primarily YARN-117: hardening the YARN service model for better resilience to failure and reliable subclassing. There's certainly more than a hint of my old HADOOP-3628 work in there, which is itself somewhat related to SmartFrog -but the YARN service model has some interesting aspects that I can't take credit for. I should write it all up. Now that we have a common service API, we could take away the many service entry points and write a single entry point that takes the name of a class, walks it through its lifecycle and runs it. This is precisely what YARN-679 proposes, which in turn is exactly what the Hoya entry point is. I'm using it there both as the entry point and in testing, so that I can evolve it based on my experience of using it in real apps.
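To make the "single entry point" idea concrete, here is a minimal, self-contained Java sketch. It shows a service lifecycle with guarded state transitions and a launcher that loads a service by class name and walks it through init, start and stop. The launcher stops the service even if a step fails.

This is not the YARN `Service` API or the YARN-679 launcher. `SketchService`, `EchoService` and `ServiceLauncherSketch` are hypothetical names. The four states are assumed to resemble YARN's, and real YARN services also take a configuration in `init()`, which is left out here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a service lifecycle plus a launcher that runs a
// service by class name. NOT the YARN API; all names are illustrative.
public class ServiceLauncherSketch {

    public enum State { NOTINITED, INITED, STARTED, STOPPED }

    // Subclasses override the serviceXxx hooks; the final init/start/stop
    // methods enforce legal state transitions.
    public abstract static class SketchService {
        private State state = State.NOTINITED;

        public State getState() { return state; }

        public final void init() {
            require(State.NOTINITED, "init");
            serviceInit();
            state = State.INITED;
        }

        public final void start() {
            require(State.INITED, "start");
            serviceStart();
            state = State.STARTED;
        }

        // Idempotent and callable from any state, so cleanup after a
        // failed init or start cannot itself fail on a state check.
        public final void stop() {
            if (state == State.STOPPED) {
                return;
            }
            state = State.STOPPED;
            serviceStop();
        }

        private void require(State expected, String action) {
            if (state != expected) {
                throw new IllegalStateException(
                    "Cannot " + action + " a service in state " + state);
            }
        }

        protected void serviceInit() {}
        protected void serviceStart() {}
        protected void serviceStop() {}
    }

    // Example service that records the lifecycle events it sees.
    public static class EchoService extends SketchService {
        public static final List<String> EVENTS = new ArrayList<>();
        @Override protected void serviceInit()  { EVENTS.add("init"); }
        @Override protected void serviceStart() { EVENTS.add("start"); }
        @Override protected void serviceStop()  { EVENTS.add("stop"); }
    }

    // The single entry point: load a class by name, instantiate it via its
    // no-arg constructor, and walk it through init -> start -> stop.
    public static SketchService launch(String className) throws Exception {
        Class<? extends SketchService> clazz =
            Class.forName(className).asSubclass(SketchService.class);
        SketchService service = clazz.getDeclaredConstructor().newInstance();
        try {
            service.init();
            service.start();
            // A real launcher would block here until the service finishes.
        } finally {
            service.stop();
        }
        return service;
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : EchoService.class.getName();
        SketchService s = launch(name);
        System.out.println(s.getState() + " " + EchoService.EVENTS);
    }
}
```

Run with no arguments, this launches the built-in `EchoService` and prints `STOPPED [init, start, stop]`. Passing another class name launches that class instead, provided it extends `SketchService` and has a no-argument constructor. Putting `stop()` in a `finally` block is the point of the design: a service that fails partway through startup still gets cleaned up.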


The HADOOP-8545 OpenStack module isn't in there -I'd have liked it, but at the same time I didn't want to cause extra trouble by getting it in. FWIW, the patch applies as is and works properly: anyone can add it to their own build.

Minor tweaks whose implications are profound but nobody has noticed yet
  • HADOOP-9432 Add support for markdown .md files in site documentation. This gives you an enhanced text format for docs that has good editor support and renders directly on GitHub.
  • HADOOP-9833 move slf4j to version 1.7.5. This is initially for downstream apps that share the classpath, but it adds an option to Hadoop itself: Hadoop modules can switch from the commons-logging API to the SLF4J one. SLF4J has varargs, parameterized log methods whose message is only formatted if that log level is enabled. This makes debug statements that much cleaner -and with log statements throughout the codebase, that helps it overall.
  • More FS tests and the patches to S3 and S3n to fix some aspects of their behaviour which could lead to loss of data if you got your rename destination wrong.
  • All those patches I've done to trunk since 0.21. This release incorporates them, as it is the first big ASF release of all the stuff that hasn't been backported. I'd recommend upgrading for my network diagnostics alone.
Even so, these are noise compared to the big pieces of work, of which the key ones are HDFS enhancements and the YARN execution engine, YARN being the most profound.
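To illustrate the SLF4J point from the list above, here is a self-contained Java stand-in. It is not SLF4J itself. `DeferredLogger` is a hypothetical class that mimics SLF4J's `{}` placeholders and only builds the message when debug is enabled. It counts how often it actually formats, so you can see that a disabled debug call does no string building.

With real SLF4J, the call site would look something like `LOG.debug("Renamed {} to {}", src, dest)`.

```java
// Hypothetical stand-in for SLF4J-style parameterized logging; NOT the
// SLF4J API. It mimics "{}" placeholders and defers formatting until the
// level is known to be enabled.
public class DeferredLoggerDemo {

    public static class DeferredLogger {
        private final boolean debugEnabled;
        private final StringBuilder sink = new StringBuilder();
        private int formatCount = 0;   // how many messages were actually built

        public DeferredLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        // Varargs entry point: arguments arrive unformatted, and the string
        // is only assembled if debug output is enabled.
        public void debug(String pattern, Object... args) {
            if (!debugEnabled) {
                return;
            }
            sink.append(format(pattern, args)).append('\n');
        }

        // Replace each "{}" in order with the next argument; any surplus
        // placeholders are left as-is.
        String format(String pattern, Object... args) {
            formatCount++;
            StringBuilder out = new StringBuilder();
            int argIndex = 0;
            int from = 0;
            int at;
            while ((at = pattern.indexOf("{}", from)) >= 0 && argIndex < args.length) {
                out.append(pattern, from, at).append(args[argIndex++]);
                from = at + 2;
            }
            out.append(pattern.substring(from));
            return out.toString();
        }

        public int getFormatCount() { return formatCount; }
        public String getOutput() { return sink.toString(); }
    }

    public static void main(String[] args) {
        DeferredLogger quiet = new DeferredLogger(false);
        DeferredLogger verbose = new DeferredLogger(true);
        for (int i = 0; i < 3; i++) {
            quiet.debug("Processed {} of {} blocks", i + 1, 3);
            verbose.debug("Processed {} of {} blocks", i + 1, 3);
        }
        System.out.println("quiet logger formatted " + quiet.getFormatCount() + " messages");
        System.out.print(verbose.getOutput());
    }
}
```

The contrast is with commons-logging-style code such as `LOG.debug("Processed " + n + " blocks")`. There, the concatenation happens at the call site whether or not debug is on, which is why such calls usually need an `isDebugEnabled()` guard.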

I was one of the people who +1'd this release; here is what I did to validate the build. Notice that my process involved grabbing the binaries by way of the ASF M2 staging repo: I need to validate the downstream build, not just the tarball.

# symlink /usr/local/bin/protoc to the Homebrew-installed protoc 2.5.0

# delete all 2.1.0-beta artifacts in the mvn repo:

  find ~/.m2 -name 2.1.0-beta -print | xargs rm -rf

# check out the HBase source from Apache: branch-0.95 (commit b58d596)

# switch to the ASF staging repo (Arun's private repo is in the POM, with JARs that have the same SHA-1 sums; I'm just being rigorous)
<repository>
 <id>ASF Staging</id>
 <url>https://repository.apache.org/content/groups/staging/</url>
</repository>


# clean build of hbase tar against the beta artifacts

mvn clean install assembly:single -DskipTests -Dmaven.javadoc.skip=true -Dhadoop.profile=2.0 -Dhadoop-two.version=2.1.0-beta

# observe the artifacts being downloaded from the staging repo

[INFO] --- maven-assembly-plugin:2.4:single (default-cli) @ hbase ---
[INFO] Assemblies have been skipped per configuration of the skipAssembly parameter.
[INFO]                                                                        
[INFO] ------------------------------------------------------------------------
[INFO] Building HBase - Common 0.95.3-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: https://repository.apache.org/content/groups/staging/org/apache/hadoop/hadoop-annotations/2.1.0-beta/hadoop-annotations-2.1.0-beta.pom
Downloaded: https://repository.apache.org/content/groups/staging/org/apache/hadoop/hadoop-annotations/2.1.0-beta/hadoop-annotations-2.1.0-beta.pom (2 KB at 3.3 KB/sec)
Downloading: https://repository.apache.org/content/groups/staging/org/apache/hadoop/hadoop-project/2.1.0-beta/hadoop-project-2.1.0-beta.pom

...

# get the SHA-1 sum of the hadoop-common-2.1.0-beta artifact in https://repository.apache.org/content/groups/staging/:
0166f5c94d3699b3a37efc16ebb1ceea3acb3b53
# verify that the SHA-1 of the artifact in the local m2 repo matches
    $ sha1sum ~/.m2/repository/org/apache/hadoop/hadoop-common/2.1.0-beta/hadoop-common-2.1.0-beta.jar
    0166f5c94d3699b3a37efc16ebb1ceea3acb3b53
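The same checksum comparison can be done from Java, which can be handy inside a test harness. This is a minimal sketch using the JDK's standard `java.security.MessageDigest` with the "SHA-1" algorithm. The class name `Sha1Check` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compute a file's SHA-1 as sha1sum does, and compare
// it with the checksum published for the staged artifact.
public class Sha1Check {

    public static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        // Render the 20-byte digest as 40 lowercase hex characters.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java Sha1Check <file> <expected-sha1>");
            System.exit(2);
        }
        String actual = sha1Hex(Paths.get(args[0]));
        boolean ok = actual.equalsIgnoreCase(args[1].trim());
        System.out.println((ok ? "OK " : "MISMATCH ") + actual);
        System.exit(ok ? 0 : 1);
    }
}
```

For example, `java Sha1Check ~/.m2/repository/org/apache/hadoop/hadoop-common/2.1.0-beta/hadoop-common-2.1.0-beta.jar 0166f5c94d3699b3a37efc16ebb1ceea3acb3b53` exits with status 0 only if the local JAR matches the staged one.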


# in hbase/hbase-assembly/target, gunzip then untar the hbase-0.95.3-SNAPSHOT-bin.tar file

# Patch the Hoya POM to use 2.1.0-beta instead of a local 2.1.1-SNAPSHOT

# run some of the hbase cluster deploy & flexing tests

mvn clean test -Pstaging

(all tests pass after 20 min)

Functional tests

# build the Hoya JAR with its classpath pulled in

mvn package -Pstaging


# download the binary .tar.gz file and scp it to an Ubuntu VM whose Hadoop configuration makes the HDFS & YARN services network-accessible and places no memory limits on containers:

https://github.com/hortonworks/hoya/tree/master/src/test/configs/ubuntu
https://github.com/hortonworks/hoya/blob/master/src/test/configs/ubuntu/core-site.xml
https://github.com/hortonworks/hoya/blob/master/src/test/configs/ubuntu/yarn-site.xml


# stop the running hadoop-2.1.1-SNAPSHOT cluster

# start the new cluster services

hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
   
yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager

   
# verify in dfshealth.jsp that the NN is up and that its version is the beta build (2013-08-15T20:48Z by hortonmu from branch-2.1.0-beta). The NN came up in safe mode as a couple of in-flight blocks were missing; it recovered from this happily after 20s.

# copy the hbase-0.95.3 tarball to HDFS

hdfs dfs -copyFromLocal hbase-0.95.3-SNAPSHOT-bin.tar hdfs://ubuntu:9000/hbase.tar

# thaw (unfreeze) an HBase cluster that was previously running on hbase-0.95.2 & Hadoop 2.1.1 (protobuf 2.4), set up to use hdfs://ubuntu:9000/hbase.tar as the HBase image to install

java -jar target/hoya-0.3-SNAPSHOT.jar \
 org.apache.hadoop.hoya.Hoya thaw cl1  --manager ubuntu:8032 \
 --filesystem hdfs://ubuntu:9000

(notice how I have to specify the classname: the service launcher is set up as the main class of the Hoya JAR, and it can run any YARN service)

# verify cluster is running according to Hoya CLI and YARN web GUI

# point a browser at the HBase master: http://ubuntu:8080/master-status, and verify it is happy

# fault injection: ssh in and kill -9 all HRegionServers

# verify that within 60s the number of region servers is back to the desired value: YARN notified Hoya of the container loss, replacement containers were requested, and region servers were deployed in them

# Freeze the cluster:
java -jar target/hoya-0.3-SNAPSHOT.jar org.apache.hadoop.hoya.Hoya \
 freeze cl1 \
 --manager ubuntu:8032 --filesystem hdfs://ubuntu:9000


# verify that the hoya list command and the YARN GUI show the app as finished, and that the HBase cluster is no longer present

This validation shows what YARN can bring to the table. Not only can you run stuff in the same cluster, near the data, but YARN also notifies the Application Master (AM) when things fail, leaving the AM to handle the failure as it chooses. For Hoya, the action is: except when shutting down, ask for the number of containers needed to get the cluster back to its desired size. This is how we can run HBase clusters inside a YARN cluster. And, if you look at the source code, more is on the way...
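The recovery policy just described reduces to a small reconciliation calculation. This Java sketch is an assumption about the shape of that logic, not Hoya's actual code; `ClusterReconciler` and its methods are hypothetical.

The reconciler tracks three numbers: the desired number of region servers, the containers currently live, and the requests still outstanding. From those it works out how many more containers to ask YARN for. It asks for none once a freeze has begun.

```java
// Hypothetical sketch of a "keep the cluster at its desired size" policy,
// in the spirit of what the Hoya AM does; NOT Hoya's actual code.
public class ClusterReconciler {

    private final int desired;     // desired number of region servers
    private int live;              // containers currently running
    private int outstanding;       // container requests not yet satisfied
    private boolean stopping;      // true once a freeze/shutdown has begun

    public ClusterReconciler(int desired, int live) {
        this.desired = desired;
        this.live = live;
    }

    // Called when YARN reports that containers have completed or failed;
    // returns how many replacement containers to request.
    public int onContainersLost(int count) {
        live -= count;
        return reviewRequests();
    }

    // Called when YARN allocates containers against earlier requests.
    public void onContainersAllocated(int count) {
        outstanding = Math.max(0, outstanding - count);
        live += count;
    }

    public void beginShutdown() {
        stopping = true;
    }

    // How many new containers to request now: none while shutting down,
    // otherwise enough to cover the shortfall not already requested.
    public int reviewRequests() {
        if (stopping) {
            return 0;
        }
        int shortfall = desired - live - outstanding;
        if (shortfall <= 0) {
            return 0;
        }
        outstanding += shortfall;
        return shortfall;
    }

    public int getLive() { return live; }
    public int getOutstanding() { return outstanding; }

    public static void main(String[] args) {
        ClusterReconciler r = new ClusterReconciler(4, 4);
        // Fault injection: all four region servers are killed.
        System.out.println("containers to request after loss: " + r.onContainersLost(4));
        // A second review must not double-request what is already pending.
        System.out.println("containers to request on re-review: " + r.reviewRequests());
        r.onContainersAllocated(4);
        r.beginShutdown();
        // During a freeze, lost containers are not replaced.
        System.out.println("containers to request while freezing: " + r.onContainersLost(4));
    }
}
```

Running it prints requests of 4, then 0, then 0. Counting outstanding requests is what stops the AM from over-asking when it re-reviews the cluster before YARN has delivered the replacements.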

Returning to Arun's announcement: download and play with this, especially if you've been using Hadoop 2.0.5 or another pre-2.1 YARN platform. For anyone using 0.23.x in production, it's time to start regression testing during this beta phase and to come up with an upgrade strategy. For anyone on branch-1 based products, the upgrade is probably more significant -the HDFS improvements justify it irrespective of the YARN features that matter most for large clusters and heterogeneous workloads. Again: download it and start making sure that your code works on it, because these are the last few weeks to find critical bugs before everything gets locked down until the successor release.


[photo: elephants in the Tarangire National Park, Tanzania. Probably the wildest night camping of the trip. Children can't run in the campsite in case they get mistaken for prey, and the driver couldn't get at the truck all night due to the 6 lions sleeping by it].