For that reason, I'm going to talk in detail about why my talks will be so excellent that even thinking about leaving them out could be detrimental to the entire conference.
One of my talks is "Taking Hadoop to the Clouds".
There are two competing submissions:
- Deploying Hadoop in the Cloud, which looks at options, details and best practices. I don't see anything particularly compelling in the abstract; I assume it has more votes because it's the one that comes up first. Either that, or they are trying the many-email-address vote-stuffing technique(*).
- How to Deploy Hadoop Applications on Any Cloud & Optimize Price Performance. This could be interesting, as it covers how CliQr deploys Hadoop on different infrastructures. It sounds like a rackable-style orchestration layer above the infrastructures; for Hadoop, it may have similarities with MastodonC's Kixi work.
I should be the one giving this talk.
This isn't because I'm egocentrically smug about the quality of my presentations, but because I'm reasonably confident I know a lot about the area.
- My final work at HP Labs was on the implementation of the "Cells" virtual infrastructure: declarative configuration of the entire cluster design. The details were presented at the 5th IEEE/ACM Conference on Utility and Cloud Computing, and should be in the ACM Digital Library. This means I know about IaaS implementation details: the problems of placement, why networking behaves the way it does, image management, what UIs could look like, what the APIs could be, and so on.
- I've spent a lot of time publicly making Hadoop cloud-friendly. I presume that the MS Azure and AWS Elastic MapReduce teams have put in more hours, but unless they're going to talk about their work, Tom White and I are the next choices. Jun Ping and his VMware colleagues have done a lot too, including big patches to the codebase, but I don't see any submissions from them.
- I have opinions on the matter. They aren't a clear-cut "cloud good/physical bad" or "physical good/cloud bad". There are arguments either way; it depends on what you want to do, what your data volume is, and where that data lives.
- I'm still working in the area, in Hadoop itself and in the code around it:
- HADOOP-8545: a Swift filesystem driver for OpenStack. This is something everyone running Hadoop on Rackspace or other OpenStack clusters will want. This week two different implementations surfaced; merging them is going to be the next activity.
- WHIRR-667: Add whirr support for HDP-1 installation
- Ambari with Whirr. Proof of concept more than anything else.
- jclouds and Rackspace UK throttling. Adrian Cole managed to reduce the impact of issue-549, which is good, as I don't really want to get sucked into another OSS codebase.
- Other things that I'm not going to talk about just yet.
(*) PS: for anyone planning the many-email-accounts approach, remember that we reviewers can see the email addresses, and many sequential accounts each casting three votes for a single talk will show up as "statistically significant". Russ has the data, and he likes his analyses. He may even have the IP addresses.
[Photo: an interview with Page 6 Guy at ApacheCon]