Comparing Cloud Infrastructure Options for Running NoSQL Workloads

April 11th, 2014

A walk through in-memory, general compute, and mass storage options for Cassandra, MongoDB, Riak, and HBase workloads

I recently had the pleasure of attending Cassandra Tech Day in San Jose, a developer-focused event where people were learning about various options for deploying Cassandra clusters. As it turns out, there was a lot of buzz surrounding the new in-memory option for Cassandra and the use cases for it. This interest got me thinking about how to map the options customers have for running Big Data across clouds.

For a specific workload, NoSQL customers may want to have the following:

1. Access to mass storage servers for files and objects. This is not block storage; we’re talking about on-demand access to terabytes of raw spinning disk volumes for running a large storage array (think storage hub for Hadoop/HBase, Cassandra, or MongoDB).

2. Access to high-RAM options for running in-memory with the fastest possible response times, the kind you’d need when running the in-memory version of Cassandra or running Riak or Redis in-memory.

3. Access to high-performance SSDs to run balanced workloads. Think about what happens after you run a batch operation. If you’re relating information back to a product schema, you may want to push that data into something like PostgreSQL, SQL, or even MySQL and have access to block storage.

4. Access to general-purpose instances for dev and test or for workloads that don’t have specific performance SLAs. This ability is particularly important when you’re trialing and evaluating a variety of applications. GoGrid’s customers, for example, leverage our 1-Button Deploy™ technology to quickly spin up dev clusters of common NoSQL solutions from MongoDB to Cassandra, Riak, and HBase.
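
As a rough illustration, the four needs above can be read as a simple decision rule. The sketch below is only one way to express that mapping: the `Workload` fields, the `InstanceClass` names, and the order in which the checks run are all assumptions made for this example, not part of any provider's API.

```python
from dataclasses import dataclass
from enum import Enum


class InstanceClass(Enum):
    """The four infrastructure categories listed above (names are illustrative)."""
    MASS_STORAGE = "mass storage (raw spinning disk)"
    HIGH_RAM = "high RAM (in-memory)"
    SSD = "high-performance SSD (balanced)"
    GENERAL = "general purpose (dev/test)"


@dataclass
class Workload:
    # Every field here is a hypothetical flag invented for this sketch.
    name: str
    needs_raw_disk_array: bool = False  # e.g. storage hub for Hadoop/HBase
    runs_in_memory: bool = False        # e.g. in-memory Cassandra, Riak, Redis
    has_performance_sla: bool = False   # False for dev/test or trial clusters


def pick_instance_class(workload: Workload) -> InstanceClass:
    """Map a workload to one category; the priority order is an assumption."""
    if workload.needs_raw_disk_array:
        return InstanceClass.MASS_STORAGE
    if workload.runs_in_memory:
        return InstanceClass.HIGH_RAM
    if workload.has_performance_sla:
        return InstanceClass.SSD
    return InstanceClass.GENERAL


if __name__ == "__main__":
    examples = [
        Workload("hbase-storage-hub", needs_raw_disk_array=True),
        Workload("cassandra-in-memory", runs_in_memory=True),
        Workload("post-batch-sql", has_performance_sla=True),
        Workload("mongodb-dev-cluster"),
    ]
    for w in examples:
        print(f"{w.name}: {pick_instance_class(w).value}")
```

A real workload will often need more than one of these categories at once, so treat the single return value as a starting point for a conversation rather than a sizing tool.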

The Compute Instance Footprint graphic below illustrates the options currently available. It relates the maximum amount of RAM that can be provisioned on a single virtual machine to the maximum number of cores and the total number of compute instance options offered by GoGrid and some of the other major infrastructure providers, including Amazon Web Services (AWS), Google (GCE), Microsoft (Azure), and SoftLayer.


The data that generated this chart was collected from the providers’ websites. Below are the details.


This data helps answer three key questions for customers:

1. Which cloud providers offer an on-demand high memory instance?

GoGrid and AWS offer the largest amount of RAM available on-demand. You’ll note that I used the actual number of vCores rather than ECUs, so you may notice a difference if you compare these numbers with AWS’s website. Because ECUs are proprietary to AWS, I don’t like to use them as a standard measure.

2. How many cloud providers offer on-demand mass storage server options?

Of the five providers included, only two offer on-demand mass storage options. GoGrid offers six Raw Disk cloud server options with up to 45 x 4 TB dedicated spinning disk volumes, so you’re looking at 180 TB total storage. AWS offers one instance with up to 48 TB storage. GCE and Azure expect you to provision block, blob, or other object storage to solve your storage needs; however, this approach isn’t ideal if you want dedicated disk volumes, for example.
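
To make the storage arithmetic concrete, here is a minimal sketch that reproduces the 180 TB figure and estimates how many of each provider's largest storage instances a dataset would need. The 500 TB dataset size and the `servers_needed` helper are hypothetical, and the estimate deliberately ignores replication, RAID, and filesystem overhead.

```python
import math

# Figures quoted in the paragraph above.
GOGRID_MAX_DISKS = 45
GOGRID_TB_PER_DISK = 4
GOGRID_MAX_TB = GOGRID_MAX_DISKS * GOGRID_TB_PER_DISK  # 45 x 4 TB = 180 TB
AWS_MAX_TB = 48


def servers_needed(dataset_tb: float, per_server_tb: float) -> int:
    """Smallest number of servers whose combined raw capacity holds the dataset.

    Hypothetical helper: ignores replication, RAID, and filesystem overhead.
    """
    return math.ceil(dataset_tb / per_server_tb)


if __name__ == "__main__":
    dataset_tb = 500  # assumed dataset size, for illustration only
    print(GOGRID_MAX_TB)                              # 180
    print(servers_needed(dataset_tb, GOGRID_MAX_TB))  # 3
    print(servers_needed(dataset_tb, AWS_MAX_TB))     # 11
```

In practice a NoSQL cluster stores multiple replicas of each record, so the real server count would be a multiple of these raw numbers.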

3. Which providers can I go to for general workloads?

You can run a general workload on any of the providers, and this is where the Compute Instance Footprint becomes important. I was surprised to see that GCE and Azure have such limited footprints. On the other hand, I expected this limitation from SoftLayer because it’s mostly a dedicated server shop rather than a cloud shop.

Once you’ve determined the instance options that best meet the needs of your particular NoSQL workload, the next step is to do a “bake-off” between two or more solutions. At GoGrid we’ve made this process as easy as pushing a button. Check out our solutions page for details on 1-Button Deployments of Cassandra, Hadoop, HBase, and Riak. And of course you can trial a full cluster for 14 days for free and have it up and running in less than 10 minutes.

Beyond trialing a cluster and understanding the instance options for running NoSQL workloads, price and ease of use always come into play when choosing a cloud provider. In my next blog post, I’ll share a pricing analysis so you can see the costs associated with managing specific workloads across clouds and show you how our 1-Button Deploy™ technology saves you time and money.

Kole Hicks

Senior Director of Product Management at GoGrid
Kole Hicks is the Senior Director of Product Management for GoGrid, the leader in Open Data Services (ODS). GoGrid is committed to delivering purpose-built, non-opinionated Big Data solutions and services for the management and integration of open source, commercial, and proprietary technologies across multiple platforms.
