I've left HP. I did that on Monday, enjoying a final beer at lunchtime with my soon-to-be-ex colleagues, then heading home for a few weeks of parental responsibilities during the Easter break.
Later this month, I will start work at Hortonworks, pushing the Hadoop stack forwards. I am really excited about this -I know a lot of people in the company already, and it's going to be great working with them!
Although the phrase "Big Data" is getting overused, it's obvious to me that there is a real coming together of different trends to make the whole Hadoop-based ecosystem as transformational as web servers were.
- There are so many devices in the modern world acting as data sources -physical devices such as mobile phones and jet engines, services such as web applications, people making use of devices and services.
- In the past less data was generated -and it was normally thrown away. Too expensive to store, no perceived value.
- The cost per TB of HDD has fallen such that you can now afford to keep that data for later analysis.
- You can't analyse it on single servers as the bandwidth of HDDs hasn't increased at the same rate as the storage capacity.
- The performance of a single CPU has effectively topped out too. All that is coming is more cores, more operations/joule (hopefully), different forms of parallel computation. The free speedups that the CPU vendors used to dish out are over. It's either single-machine parallelism or multi-machine. Oh, and either way: heterogeneity of some form or other.
- That means everyone is going to have to embrace parallel computing, on the single machine or in the rack -and with the right algorithms, that rack can be made to deliver linear and sometimes superlinear speedup.
- If you want to work with the big datasets that you can collect today, you are going to need a rack of servers and a framework to let you process the data.
- The Hadoop platform provides the framework to store the data across those hard disks, and to distribute the work across them. It is becoming the single open-source alternative to Google's internal platform.
That's why I'm joining Hortonworks -to go full time on building the future platform for server-side computing.
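To put rough numbers on the disk-bandwidth point in the list above, here is a back-of-envelope sketch. The dataset size, the per-disk throughput and the disk counts are illustrative assumptions of mine, not measurements, and the function name `scan_hours` is made up for this example. It also assumes the data is spread evenly and the disks can all be read in parallel with no other bottleneck, which is the ideal case.

```python
def scan_hours(dataset_tb: float, disk_mb_per_s: float, disks: int = 1) -> float:
    """Hours to read dataset_tb terabytes once, spread evenly across `disks` drives.

    Assumes perfectly parallel sequential reads: no network, CPU or
    scheduling overhead. Real clusters will do worse than this.
    """
    if disks < 1:
        raise ValueError("need at least one disk")
    dataset_mb = dataset_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = dataset_mb / (disk_mb_per_s * disks)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed figures: a 100 TB dataset and 100 MB/s sequential reads per disk.
    for disks in (1, 12, 480):
        print(f"{disks:>4} disks: {scan_hours(100, 100, disks):8.2f} hours")
```

Under those assumptions a single disk needs well over a week just to read the data once, while a few hundred disks working together bring that down to under an hour. Capacity per disk doesn't enter into the calculation at all: only aggregate bandwidth does, which is why the work has to go to where the disks are.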
[photo: preparing to descend into Crickhowell, Wales, 2011]
Nice one, Sir. I'm also learning to understand Hadoop from Tom White's Hadoop book. I graduated a year ago as a Software Engineer, worked as an Oracle DBA for over a year and have now decided to go into the Hadoop domain.
Harmeet
twitter.com/oraa1
Are you aware of work around swapping out HDFS for something else, like Swift...
Michael: I'm actually the Hortonworks part of the team doing the Hadoop Swift integration, https://issues.apache.org/jira/browse/HADOOP-8545.
Swift is a blobstore, not a filesystem, and has different semantics from what Hadoop apps expect, MapReduce and HBase for example. Also, it's slower.