Posts Tagged ‘scalable’


Big Data Cloud Servers for Hadoop

Monday, January 13th, 2014 by

GoGrid just launched Raw Disk Cloud Servers, the perfect choice for your Hadoop data node. These purpose-built Cloud Servers run on the latest Intel Ivy Bridge processors and connect over a redundant 10-Gbps network fabric. What sets these servers apart, however, is the massive amount of raw storage in a JBOD (Just a Bunch of Disks) configuration. You can deploy up to 45 x 4 TB SAS disks on a single Cloud Server.

These servers are designed to serve as Hadoop data nodes, which are typically deployed in a JBOD configuration. This setup maximizes available storage space on the server and also aids in performance. There are roughly 2 cores allocated per spindle, giving these servers additional MapReduce processing power. In addition, these disks aren’t a virtual allocation from a larger device. Each volume is actually a dedicated, physical 4 TB hard drive, so you get the full drive per volume with no initial write penalty.
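In Hadoop 2.x, a JBOD data node is usually configured by listing one directory per physical disk in the `dfs.datanode.data.dir` property of hdfs-site.xml. Here is a minimal sketch of building that value and of the raw-capacity arithmetic. It assumes hypothetical mount points of the form `/data/diskN` and a hypothetical `dfs/dn` subdirectory; the actual layout depends on how you format and mount the raw disks.

```python
def datanode_data_dirs(disk_count, mount_prefix="/data/disk", subdir="dfs/dn"):
    """Build a comma-separated directory list, one entry per physical disk.

    mount_prefix and subdir are hypothetical names used for illustration;
    adjust them to match your own mount layout.
    """
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    return ",".join(
        f"{mount_prefix}{i}/{subdir}" for i in range(1, disk_count + 1)
    )


def raw_capacity_tb(disk_count, disk_size_tb=4):
    """Raw (pre-replication) capacity of one server built from 4 TB disks."""
    return disk_count * disk_size_tb


if __name__ == "__main__":
    # Value to paste into dfs.datanode.data.dir for a 3-disk server.
    print(datanode_data_dirs(3))
    # Raw capacity of the largest configuration mentioned above (45 disks).
    print(raw_capacity_tb(45), "TB")
```

Listing each disk separately, rather than combining them into one volume, lets the data node spread blocks across the spindles and keeps a single disk failure from taking the other volumes with it.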

Hadoop in the cloud

Most Hadoop distributions call for a name node supporting several data nodes. GoGrid offers a variety of SSD Cloud Servers that would be perfect for the Hadoop name node. Because they are on the same 10-Gbps high-performance fabric as the Raw Disk Cloud Servers, SSD servers provide low-latency private connectivity to your data nodes. I recommend using at least the X-Large SSD Cloud Server (16 GB RAM), although you may need a larger server, depending on the size of your Hadoop cluster. Because Hadoop stores metadata in memory, you’ll want more RAM if you have a lot of files to process.

You can use any size Raw Disk Cloud Server, but you’ll want to deploy at least 3. Each Raw Disk Cloud Server size has a different allocation of raw disks, which is illustrated in the table below. The Cloud Server in the illustration is the smallest size that has multiple disks per Cloud Server. Hadoop defaults to a replication factor of three, so to protect your data from failure, you’ll want at least 3 data nodes to distribute data across. Although Hadoop attempts to replicate data to different racks, there’s no guarantee that your Cloud Servers will be on different racks.
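To make the sizing reasoning concrete, here is a rough sketch of the arithmetic behind the replication factor and name node memory. The disk count per node (6) is a hypothetical value, not a GoGrid specification. The figure of roughly 150 bytes of name node memory per file, directory, or block object is a commonly cited rule of thumb, not a measured number, so treat the result as an order-of-magnitude estimate.

```python
def usable_capacity_tb(data_nodes, disks_per_node, disk_size_tb=4, replication=3):
    """Approximate HDFS capacity after replication.

    Ignores filesystem overhead and space reserved for non-HDFS use.
    """
    raw_tb = data_nodes * disks_per_node * disk_size_tb
    return raw_tb / replication


def can_satisfy_replication(data_nodes, replication=3):
    """Each replica of a block lives on a different data node."""
    return data_nodes >= replication


def namenode_metadata_gb(files, blocks_per_file=1, bytes_per_object=150):
    """Rule-of-thumb name node memory for file and block objects.

    bytes_per_object=150 is an assumed rule of thumb, not an exact figure.
    """
    objects = files * (1 + blocks_per_file)
    return objects * bytes_per_object / 1e9


if __name__ == "__main__":
    # Three data nodes with a hypothetical 6 x 4 TB disks each.
    print(usable_capacity_tb(data_nodes=3, disks_per_node=6), "TB usable")
    print(can_satisfy_replication(2))   # too few nodes for 3 replicas
    print(namenode_metadata_gb(10_000_000), "GB of metadata (estimate)")
```

With three replicas, usable space is about one third of raw space, which is why the raw disk count per server matters so much when you size data nodes.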

Note that the example below is for illustrative purposes only and is not representative of a typical Hadoop cluster; for example, most Cloudera and Hortonworks sizing guides start at 8 nodes. These configurations can differ greatly depending on whether you intend to use the cluster for development, production, or production with HBase added. The differences include RAM and disk sizes (less of both for development, most likely more for HBase). Plus, if you’re thinking of using these nodes for production, you should consider adding a second name node.

[Figure: Hadoop cluster]

(more…)

How To Optimize Cloud Server Workloads to Maximize Efficiency

Monday, September 24th, 2012 by

If you’re familiar with cloud infrastructure and infrastructure-as-a-service (IaaS), you probably understand the substantial benefits of deploying infrastructure in the public cloud: utility billing and on-demand availability, elasticity that lets you scale resources up and down based on demand, and the ability to rapidly move and redeploy workloads as needed. This flexibility is why we originally brought GoGrid’s hourly pay-as-you-go Cloud Servers to market. They’re perfect for specific cases like these:

  • Periodic workloads that only run for a few hours, days, or weeks during a given billing cycle
  • Short-term, project-based workloads where term commitments aren’t desirable
  • Short-term spikes in workload where demand is erratic and being able to scale resources up and down quickly is desirable
  • Development and test workloads that require rapid iteration and redeployment of resources
  • Proof of concept workloads where instant access to resources and the ability to quickly change technology are key

Customers with steady-state and long-term workloads don’t always need this hourly flexibility, however. That’s why GoGrid has developed prepaid monthly, semiannual, and annual Cloud Server products. Prepaid Cloud Servers are less flexible, but they offer significant cost savings in exchange for the term commitment. The shortest prepaid term GoGrid offers is one month and the longest is one year.

If you run a constant workload during a given month, a prepaid term server is probably a better solution than an hourly server. Again, the tradeoff here is flexibility. Prepaid servers are ideal for:

  • Steady-state workloads where demand is constant
  • Workloads that tend to grow rather than contract
  • Production applications where you can plan for demand in advance
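The choice between hourly and prepaid comes down to a break-even calculation: how many hours per month a server must run before the prepaid price beats the hourly bill. The sketch below uses made-up prices (a hypothetical $60 monthly prepaid price and $0.125 hourly rate), not GoGrid's actual pricing; substitute real rates before drawing conclusions.

```python
def break_even_hours(prepaid_monthly_price, hourly_rate):
    """Hours of use per month beyond which the prepaid server is cheaper."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return prepaid_monthly_price / hourly_rate


def cheaper_option(hours_used, prepaid_monthly_price, hourly_rate):
    """Return 'prepaid' or 'hourly' for a given number of hours in a month."""
    hourly_cost = hours_used * hourly_rate
    return "prepaid" if prepaid_monthly_price < hourly_cost else "hourly"


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    PREPAID_MONTHLY = 60.0
    HOURLY_RATE = 0.125

    print(break_even_hours(PREPAID_MONTHLY, HOURLY_RATE), "hours to break even")
    # A server that runs all month (about 720 hours) versus one used briefly.
    print(cheaper_option(720, PREPAID_MONTHLY, HOURLY_RATE))
    print(cheaper_option(100, PREPAID_MONTHLY, HOURLY_RATE))
```

A server that runs around the clock clears the break-even point easily under these assumed prices, while a server used for a short project does not, which matches the split between the two lists above.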

For example, imagine you run an eCommerce website. You know you always need three servers to run your operations throughout the year. During the holiday season, however, you know demand is likely to spike. Your deployment of annual servers going into the holidays would look something like this:

(more…)