I recently attended the Hadoop Summit in San Jose. This is one of two major conferences organized around Hadoop, the other being Hadoop World. Nearly all the companies with Hadoop distributions were present, along with several big users of Hadoop like Netflix, Twitter, and LinkedIn.
Crossing The Chasm
If you’re not deeply involved with Hadoop, attending these conferences a year apart can be shocking. The advancements made in the span of a single year are amazing. The conference seemed notably larger this year, and I noticed more non-tech companies in the audience. I think it’s safe to say that Hadoop has crossed the chasm, at least for enterprise IT users.
Beyond the mix of attendees at the event, the other signal to me was the emergence of Hadoop 2.0. This second version of Hadoop focused on features that are important for users who want to run production-grade software for mission-critical systems. High availability finally arrived for the NameNode (in the open source project, as opposed to the version Cloudera released for its distribution), along with a new version of Hive with more SQL-friendly features and YARN, which allows users to run just about anything on top of the Hadoop Distributed File System (HDFS). These kinds of stability and availability features tend to show up when there is a critical mass of users who want to run the software in production.
Quite A YARN