
Archive for the ‘Open Data Services’ Category

 

Infographic: Big Data or Big Confusion? The Key is Open Data Services

Tuesday, July 22nd, 2014 by

When folks refer to “Big Data” these days, what is everyone really talking about? For several years now, Big Data has been THE buzzword used in conjunction with just about every technology issue imaginable. The reality, however, is that Big Data isn’t an abstract concept. Whether you like it or not, you’re already inundated with Big Data. How you source it, what insights you derive from it, and how quickly you act on it will play a major role in determining the course—and success—of your company. To help you get started understanding the key Big Data trends, take a look at this infographic: “60-Second Guide to Big Data and the Cloud.”

[Infographic: “60-Second Guide to Big Data and the Cloud”]

Handling the increased volume, variety, and velocity of data (the “3 Vs” shown in the center of the infographic) requires a fundamental shift in the makeup of the platform used to capture, store, and analyze that data. A platform capable of handling and capitalizing on Big Data requires a mix of relational databases for structured data, NoSQL databases for unstructured data, caching solutions, and Hadoop-style MapReduce tools.

As the need for new technologies to handle the “3 Vs” of Big Data has grown, open source solutions have become the catalysts for innovation, generating a steady stream of new, relevant products to tackle Big Data challenges. Thanks to the skyrocketing pace of innovation in specialized databases and applications, businesses can now choose from a variety of proprietary and open source solutions, depending on the type of database they need and their specific requirements.

Given the wide variety of new and complex solutions, however, it’s no surprise that a recent survey of IT professionals showed that more than 55% of Big Data projects fail to achieve their goals. The most significant challenge cited was a lack of understanding of the range of technologies on the market and of the ability to pilot them. This challenge systematically pushes companies toward a limited set of proprietary platforms that often reduce the choice to a single technology. Seeking one cure-all technology is no longer a realistic strategy. No single technology, such as a database, can solve every problem, especially when it comes to Big Data. Even if a single solution could serve multiple needs, successful companies are always trialing new solutions in the quest to perpetually innovate and thereby achieve (or maintain) a competitive edge.

Open Data Services and Big Data go hand-in-hand

(more…)

Is MapReduce Dead?

Tuesday, July 15th, 2014 by

With Google’s recent announcement of Cloud Dataflow (intended as the successor to MapReduce) and with Cloudera now focusing on Spark for many of its projects, it looks like the days of MapReduce may be numbered. Although the change may seem sudden, it’s been a long time coming. Google published the MapReduce white paper 10 years ago, and developers have been using at least one distribution of Hadoop for about 8 years. Users have had ample time to determine the strengths and weaknesses of MapReduce, and the release of Hadoop 2.0 and YARN clearly indicated that they wanted to live in a more diverse Big Data world.


Earlier versions of Hadoop could be described as MapReduce + HDFS (Hadoop Distributed File System) because that was the paradigm that everything in Hadoop revolved around. Because users clamored for interactive access to Hadoop data, the Hive and Pig projects were started. And even though you could write SQL queries with Hive and script in Pig Latin with Pig, under the covers Hadoop was still running MapReduce jobs. That all changed in Hadoop 2.0 with the introduction of YARN. YARN became the resource manager for the Hadoop cluster, breaking the dependence between MapReduce and HDFS. Although HDFS remained the file system, MapReduce became just another application that interfaces with Hadoop through YARN. This change made it possible for other applications to run on Hadoop through YARN as well.
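To make the paradigm concrete, here is a minimal sketch of the three MapReduce steps (map, shuffle, reduce) as a word count in plain Python. It runs in a single process and does not use Hadoop at all; the function names are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data big confusion", "open data services"]
    print(word_count(sample))
```

On a real cluster, the map and reduce steps run in parallel across many machines and the shuffle moves data over the network, which is where most of the framework's complexity lives.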

Google is not known as a backer of the open source Hadoop ecosystem in the mold of Hortonworks or Cloudera. After all, Google was running its own MapReduce and the Google File System (the model for HDFS), the internal systems on which these open-source projects are based. Because they are integral parts of Google’s internal applications, Google has the most experience with using these technologies. And although Cloud Dataflow is specifically for use on the Google cloud and appears more like a competitor to Amazon’s Kinesis product, Google is very influential in Big Data circles, so I can see other developers following Google’s lead and adopting a similar technology in place of MapReduce.

Although Google’s Cloud Dataflow may have a thought-leadership impact, Cloudera’s decision to leverage Spark as the standard processing engine for its projects (in particular, Hive) will have a greater impact on open-source Big Data developers. Cloudera has one of the most popular Hadoop distributions on the market and has partnered with Databricks, Intel, MapR, and IBM to work on integrating Spark with Hive. This move is surprising given Cloudera’s investment in Impala (its SQL query engine), but the company clearly feels that Spark is the future. As little as a year ago, Spark was mostly seen as fast in-memory computing for machine learning algorithms. However, with its promotion to an Apache Top-Level Project in February 2014 and its backing company Databricks receiving $33 million in Series B funding, Spark clearly has greater ambitions. The advent of YARN made it much easier to tie Spark to the growing Hadoop ecosystem. Cloudera’s decision to leverage Spark in Hive and other projects makes it even more important to users of the CDH distribution.


(more…)

High RAM Cloud Servers for Distributed Caching

Tuesday, June 10th, 2014 by

GoGrid has just released High RAM Cloud Servers on our high-performance fabric. These servers are designed to provide the large amounts of RAM that caching servers most commonly require. Like our other recent product releases, these servers are all built on our redundant 10-Gbps public and private network.

High RAM Cloud Servers are available in the following configurations:

High RAM     RAM       Cores   SSD Storage
X-Large      16 GB     4       40 GB
2X-Large     32 GB     8       40 GB
4X-Large     64 GB     16      40 GB
8X-Large     128 GB    28      40 GB
16X-Large    256 GB    40      40 GB

(more…)

HBase Made Simple

Wednesday, April 30th, 2014 by

GoGrid has just released its 1-Button Deploy™ of HBase, available to all customers in the US-West-1 data center. This technology makes it easy to deploy either a development or production HBase cluster on GoGrid’s high-performance infrastructure. GoGrid’s 1-Button Deploy™ technology combines the capabilities of one of the leading NoSQL databases with our expertise in building high-performance Cloud Servers.

HBase is a scalable, high-performance, open-source database. It is often called the Hadoop distributed database: it leverages the Hadoop framework but adds capabilities such as real-time queries and the ability to organize data into a table-like structure. GoGrid’s 1-Button Deploy™ of HBase takes advantage of our SSD and Raw Disk Cloud Servers while making it easy to deploy a fully configured cluster. GoGrid deploys the latest Hortonworks distribution of HBase on Hadoop 2.0. If you’ve ever tried to deploy HBase or Hadoop yourself, you know it can be challenging. GoGrid’s 1-Button Deploy™ does all the heavy lifting and applies all the recommended configurations to ensure a smooth path to deployment.
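For readers new to HBase, the “table-like structure” can be pictured roughly as rows sorted by a row key, with each row holding values under `family:qualifier` column names. The sketch below is a toy in-memory model in plain Python, not the HBase client API; the class name `TinyTable`, its methods, and the sample keys are all assumptions made for illustration, and it leaves out real features such as cell timestamps and versions.

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy in-memory model of an HBase-style table (not the HBase API)."""

    def __init__(self, column_families):
        self.column_families = set(column_families)
        self._row_keys = []  # kept sorted, mirroring HBase's row-key ordering
        self._rows = {}      # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        """Store one cell; the column must be 'family:qualifier'."""
        family, _, qualifier = column.partition(":")
        if family not in self.column_families or not qualifier:
            raise KeyError(f"unknown family or missing qualifier: {column!r}")
        if row_key not in self._rows:
            insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Random read of a single row by its key."""
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        """Return (key, row) pairs with start_key <= key < stop_key, in key order."""
        lo = bisect_left(self._row_keys, start_key)
        hi = bisect_left(self._row_keys, stop_key)
        return [(key, dict(self._rows[key])) for key in self._row_keys[lo:hi]]


if __name__ == "__main__":
    table = TinyTable(["info"])
    table.put("user#002", "info:name", "Bo")
    table.put("user#001", "info:name", "Al")
    print(table.get("user#001"))
    print(table.scan("user#001", "user#003"))
```

Fast lookups by row key and ordered range scans are the two access patterns this layout is meant to suggest.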

Why GoGrid Cloud Servers?

SSD Cloud Servers have several high-performance characteristics. They all come with attached SSD storage and large available RAM for the high I/O uses common to HBase. The Name Nodes benefit from the large RAM options available on SSD Cloud Servers and the Data Nodes use our Raw Disk Cloud Servers, which are configured as JBOD (Just a Bunch of Disks). This is the recommended disk configuration for Data Nodes, and GoGrid is one of the first providers to offer this configuration in a Cloud Server. Both SSD and Raw Disk Cloud Servers use a redundant 10-Gbps public and private network to ensure you have the maximum bandwidth to transfer your data. Plus, the cloud makes it easy to add more Data Nodes to your cluster as needed. You can use GoGrid’s 1-Button Deploy™ to provision either a 5-server development cluster or an 11-server production cluster with Firewall Service enabled.

Development Environments

The smallest recommended size for a development cluster is 5 servers. Although it’s possible to run HBase on a single server, you won’t be able to test failover or how data is replicated across nodes. You’ll most likely have a small database so you won’t need as much RAM, but will still benefit from SSD storage and a fast network. The Data Nodes use Raw Disk Cloud Servers and are configured with a replication factor of 3.
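To see why a single server can’t exercise replication or failover, here is a simplified sketch of placing block replicas across Data Nodes. The round-robin placement, the node names, and the choice of 3 Data Nodes are assumptions for illustration only; real HDFS placement is more sophisticated (for example, it is rack-aware).

```python
def place_replicas(num_blocks, data_nodes, replication_factor=3):
    """Assign each block to `replication_factor` distinct nodes, round-robin."""
    if replication_factor > len(data_nodes):
        raise ValueError("need at least as many Data Nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            data_nodes[(block + offset) % len(data_nodes)]
            for offset in range(replication_factor)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return [
        block
        for block, nodes in placement.items()
        if any(node not in failed for node in nodes)
    ]


if __name__ == "__main__":
    # Hypothetical node names; the Data Node count is an assumption.
    nodes = ["dn1", "dn2", "dn3"]
    placement = place_replicas(6, nodes)
    # With a replication factor of 3 on 3 nodes, losing 2 nodes loses no data.
    print(readable_blocks(placement, ["dn1", "dn2"]))
```

In this sketch, calling `place_replicas` with only one node and a replication factor of 3 raises an error, which mirrors the point above: there is nowhere to put the extra copies, so there is nothing to fail over to.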

(more…)

Infographic: 2014 – The Year of Open Source?

Tuesday, April 8th, 2014 by

If you’re a software developer, you’ve probably already used open-source code in some of your projects. Until recently, however, people who aren’t software developers probably thought “open source” referred to a new type of bottled water. But all that’s beginning to change. Now you can find open-source versions of everything from Shakespeare to geospatial tools. In fact, the first laptop built almost entirely on open source hardware just hit the market. In the article announcing the new device, Wired noted that, “Open source hardware is beginning to find its own place in the world, not only among hobbyists but inside big companies such as Facebook.”


Why now?

Open source technology has moved from experiment to mainstream partly because the concept itself has matured. Companies that used to zealously guard their proprietary software or hardware may now be building some or all of it on open-source code and even giving back to the relevant communities. Plus, repositories like GitHub, Bitbucket, and SourceForge make open-source code easy to access.

In its annual “Future of Open Source Survey,” North Bridge Venture Partners summarized 3 reasons support for open source is broadening:

1. Quality: Thanks to strong community support, the quality of open-source offerings has improved dramatically. They now compete with proprietary or commercial equivalents on features, and they can usually be deployed more quickly. Goodbye vendor “lock-in.”

(more…)