Archive for the ‘DevOps’ Category

 

Is MapReduce Dead?

Tuesday, July 15th, 2014 by

With the recent announcement by Google of Cloud DataFlow (intended as the successor to MapReduce) and with Cloudera now focusing on Spark for many of its projects, it looks like the days of MapReduce may be numbered. Although the change may seem sudden, it’s been a long time coming. Google wrote the MapReduce white paper 10 years ago, and developers have been using at least one distribution of Hadoop for about 8 years. Users have had ample time to determine the strengths and weaknesses of MapReduce. However, the release of Hadoop 2.0 and YARN clearly indicated that users wanted to live in a more diverse Big Data world.

Earlier versions of Hadoop could be described as MapReduce + HDFS (Hadoop Distributed File System) because that was the paradigm that everything in Hadoop revolved around. Because users clamored for interactive access to Hadoop data, the Hive and Pig projects were started. And even though you could write SQL-like queries with Hive and script in Pig Latin with Pig, under the covers Hadoop was still running MapReduce jobs. That all changed in Hadoop 2.0 with the introduction of YARN. YARN became the resource manager for a Hadoop cluster, breaking the tight coupling between MapReduce and HDFS. Although HDFS remained the file system, MapReduce became just another application that interfaces with Hadoop through YARN. This change made it possible for other applications to run on Hadoop through YARN.
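To make the paradigm concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce phases, using word count as the job. It is only an illustration under stated assumptions: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example and are not Hadoop APIs, and a real cluster would distribute each phase across machines and read its input from HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key (hypothetical helper;
    on a real cluster the framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hive runs MapReduce", "Pig runs MapReduce"])
print(counts)
```

A Hive query or Pig script in the pre-YARN world was ultimately translated into one or more jobs of roughly this shape, which is why every workload inherited MapReduce's batch-oriented behavior.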

Google is not known as a backer of the open-source Hadoop ecosystem in the mold of Hortonworks or Cloudera. After all, Google was running its own versions of MapReduce and HDFS (the Google File System), on which these open-source projects are based. Because they are integral parts of Google’s internal applications, Google has the most experience with using these technologies. And although Cloud DataFlow is specifically for use on the Google cloud and appears more like a competitor to Amazon’s Kinesis product, Google is very influential in Big Data circles, so I can see other developers following Google’s lead and adopting a similar technology in place of MapReduce.

Although Google’s Cloud DataFlow may have a thought leadership-type impact, Cloudera’s decision to leverage Spark as the standard processing engine for its projects (in particular, Hive) will have a greater impact on open-source Big Data developers. Cloudera has one of the most popular Hadoop distributions on the market and has partnered with Databricks, Intel, MapR, and IBM to work on its Spark integration with Hive. This trend is surprising given Cloudera’s investment in Impala (its SQL query engine), but the company clearly feels that Spark is the future. As little as a year ago, Spark was mostly seen as a fast in-memory computing engine for machine learning algorithms. However, with its promotion to an Apache Top-Level Project in February 2014 and its backing company Databricks receiving $33 million in Series B funding, Spark clearly has greater ambitions. The advent of YARN made it much easier to tie Spark to the growing Hadoop ecosystem. Cloudera’s decision to leverage Spark in Hive and other projects makes it even more important to users of the CDH distribution.

(more…) «Is MapReduce Dead?»

Cloud infrastructure supports agile IT endeavors

Monday, August 5th, 2013 by

Companies often seek to use cloud computing technologies in an effort to improve business agility at a lower cost than other technical endeavors. Although hosted environments have an inherent flexibility that lets organizations carry out tasks more efficiently, decision-makers can’t simply deploy one cloud service and expect to reap all the rewards. Instead, enterprises need to ensure the cloud architectures they use have the necessary qualities to support a more agile workplace.

In today’s fast-paced business world, application agility is one of the best characteristics for an organization to have because it ensures employees can access and use mission-critical solutions from virtually anywhere. A recent CIO report highlighted how leveraging an efficient cloud infrastructure service can dramatically improve efficiency as a result of its easy scalability and automated provisioning. When these characteristics are combined with other critical elements, companies can be sure they have the agile qualities they need to thrive.

Embrace agile development
In the past, there was one tried-and-true method for application development used by most of the business world. Today, the diversity of the corporate landscape has encouraged decision-makers to pursue strategies that differ from competitors to create room for possible advantages. This demand, coupled with the proliferation of cloud computing and mobile projects, has led to the emergence of the agile development movement.

CIO noted that this mentality is considered the norm in today’s enterprise, although many firms have yet to deploy these strategies effectively. By incorporating an agile development concept into the cloud infrastructure, employees can gain access to the automated tools they need to circumvent old processes that often resulted in unwanted, unused, or inefficient applications.

A separate First Line Software report echoed the importance of including the cloud in an agile development strategy because the hosted technology supports greater levels of service delivery and encourages users to take advantage of its scalable capacity. When enterprises leverage cloud and agile initiatives simultaneously, they can streamline the creation and deployment process to ensure employees can take full advantage of the tools in a timely manner.

(more…) «Cloud infrastructure supports agile IT endeavors»

Agile Development at GoGrid with Pallet and Jclouds (Presentation)

Tuesday, February 8th, 2011 by

In order to provide a more “rounded” voice on the GoGrid blog, we are going to start having some new authors. To kick off this initiative, I wanted to introduce Paul Lappas, GoGrid’s VP of Engineering and Co-Founder. Paul manages GoGrid’s engineering efforts, technical operations, IT and the technology vision for GoGrid.

Recently, he and some other team members attended a Meetup in San Francisco at the Twitter headquarters to discuss and present Jclouds and Pallet and how those tools are being used at GoGrid. Here is Paul’s synopsis of Jclouds and the presentation:

“GoGrid is doing some really cool stuff using an automated provisioning technology called “Pallet”. Pallet is similar to existing automated configuration technologies like Puppet and Chef, but with the key difference that it was built specifically to solve the problem of quickly spinning up and configuring groups of servers in the cloud. It supports “Jclouds” out of the box and is implemented as a set of libraries for “Clojure”, a LISP-based programming language that is quickly building steam. Jclouds is an open source framework developed by Adrian Cole that helps you get started in the cloud and reuse your Java development skills, with an API that allows you the freedom to use portable abstractions or cloud-specific features.

GoGrid is working with the author of Pallet (Hugo Duncan) and a key contributor, Toni Batchelli, to enable the fast deployment of fully functional GoGrid environments for use by development teams for test & dev. It’s a tough problem for most companies, but especially challenging for us considering how complex (and capital intensive) it is to stage an end-to-end GoGrid environment due to the sheer breadth of technologies that span almost all 7 layers of the OSI stack. With Pallet, we are able to treat our “infrastructure as code” and manage the configuration of systems, networks and applications just like we do our source code, so that the configuration can be quickly applied to spin up new environments. But perhaps the coolest aspect is that we are using GoGrid internally to virtualize the individual components! It’s kind of like Inception, where there is a grid-within-a-grid. Our teams are still getting their heads around logging into a GoGrid account and seeing physical GoGrid components represented as VM icons in the GUI! Very cool stuff.

The following presentation provides more details of the implementation and was given at the recent Jclouds meetup at Twitter’s headquarters in San Francisco. Check out future Jclouds meetups here:
http://www.meetup.com/jclouds/

Below is the presentation given at the Jclouds meetup: (more…) «Agile Development at GoGrid with Pallet and Jclouds (Presentation)»