2012-12-05

An Intro to Contributing to Hadoop

Together the ants shall conquer the elephant

Jeff Bean of Cloudera has stuck up a video on contributing to Hadoop, which is a reasonable introduction to JIRA-centric development.

Process-wise, there are a few things I'd add:
  • Search for the issue or feature before you file a new bug. The first line of a stack trace is a great search term, though it's a bit depressing to find that the only other person to hit it was yourself 18 months earlier, and you never fixed it then either.
  • It's harder to get committer rights on Hadoop than on most other projects, because the bar for effort and competence is high. You pretty much have to work full time on the project. Posting four JIRAs and then asking to get committer access is unrealistic. And it doesn't bring much to the table except bragging rights.
  • The bit at 16:20 where Jeff said "email other contributors to get eyes" was in fact an error. He meant to say "email wittenauer to get constructive feedback on your ideas" -nobody else welcomes such emails, and actually talking on the -dev list is better.
  • I'd also emphasise the "watch issue" button. If there is something you care about, hit the watch button to get emails whenever it is updated.
  • When you file a bug, include stack traces, kill -QUIT thread dumps, netstat and lsof details for the process in question, and anything else relevant. NOT: JPG screenshots of your DOS console. That flags up that you are probably out of your depth when it comes to getting JAVA_HOME set, let alone discussing the impact of VM clock drift on consensus protocol-based distributed journalling systems.
  • When you file your bug, your rating (critical, major, etc.) will differ from everyone else's. Mine are normally minor or trivial. If they only affect you: minor. Easy to fix: trivial.
  • Don't file bugs about "I couldn't get Hadoop to install". Those bugs will be closed as invalid; posts on it to the -dev lists silently ignored. Go to the user lists. 
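
The diagnostics point above can be scripted. What follows is only a rough sketch, not anything from the video or the Hadoop project: the function names `diagnostic_commands` and `collect` are made up for illustration, and it assumes a Unix-like host with `kill`, `lsof` and `netstat` on the path. Note that `kill -QUIT` makes the JVM write its thread dump to the JVM's own stdout or log, so that part still has to be copied from there.

```python
import subprocess


def diagnostic_commands(pid):
    """Return argv lists for the diagnostics worth attaching to a bug report.

    Hypothetical helper: the choice of commands follows the bullet above.
    """
    p = str(pid)
    return [
        ["kill", "-QUIT", p],  # asks the JVM for a thread dump (appears in the JVM's stdout/log)
        ["lsof", "-p", p],     # open files and sockets of that process
        ["netstat", "-an"],    # all sockets, numeric addresses (not filtered by process)
    ]


def collect(pid):
    """Run each command and return its text output, keyed by the command line."""
    results = {}
    for argv in diagnostic_commands(pid):
        key = " ".join(argv)
        try:
            done = subprocess.run(argv, capture_output=True, text=True)
            results[key] = done.stdout + done.stderr
        except FileNotFoundError:
            # The tool is not installed on this host; record that instead of failing.
            results[key] = "(command not found)"
    return results
```

Calling `collect(pid)` with the process ID of the misbehaving daemon gives you text you can paste into the JIRA, which is the point: text, not screenshots.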

I was a bit disappointed by the claim that "the apache artifacts aren't stable, you need CDH" and the message that there is "the community" and "cloudera engineers", the latter being the only people who make Hadoop enterprise-ready. As well as Hortonworks, there are companies like IBM, Microsoft and VMware working on making sure their customers' needs are met -and testing the Apache releases to make sure they're up to a state where you can use them in production. (*)

This "we are the engineers" story falls over at 07:00 when, in the walk-through of the (epic) HA NN work, my colleagues Sanjay, Suresh and Jitendra all get a mention. That is because Hadoop is a community project -one that involves multiple companies working together on Hadoop, as well as individuals and small teams. The strength of the Hadoop codebase comes from the combined contributions of everyone. Furthermore, having a no-single-vendor open source project, with public artifacts you can pick up and use, adds a strategic advantage to that codebase. Hadoop is not MySQL or OpenJDK -open source with secret bits that the single vendor can charge for. There's a cost to that -more need to develop a consensus, which is why I encourage people using Hadoop in production systems to get on the -dev lists, regardless of how Hadoop gets to your servers. Participation in those discussions gives you a direct say in the future direction of the project.

Overall though, not a bad intro to how to get started in the development. It makes me think I should do a video of my intro to hadoop-dev slides, which look less at JIRA and more at why the development process is as it is, and how we could improve it. Someone else can do the "why Maven is considered a good tool for releasing Hadoop" talk -all I know is that I have to do a "mvn install -DskipTests" every morning to stop Maven trying to go to the Apache snapshot repo to download other people's artifacts, instead of the ones I built the day before.

(*) Yes, I know that Hadoop 1.1.1 is being replaced with a 1.1.2 to backport a deadlock show-stopper, but that's a very rare case -and shows that we do react to any problem in the stable branch that is considered serious.

[Photo, "together the ants shall conquer the elephant", alongside the M32 in Easton].
