
Posts Tagged ‘Raw Disk’

 

HBase Made Simple

Wednesday, April 30th, 2014 by

GoGrid has just released its 1-Button Deploy™ of HBase, available to all customers in the US-West-1 data center. This technology makes it easy to deploy either a development or production HBase cluster on GoGrid’s high-performance infrastructure. GoGrid’s 1-Button Deploy™ technology combines the capabilities of one of the leading NoSQL databases with our expertise in building high-performance Cloud Servers.

HBase is a scalable, high-performance, open-source database. It's often called the Hadoop distributed database: it builds on the Hadoop framework but adds capabilities such as real-time queries and the ability to organize data into a table-like structure. GoGrid’s 1-Button Deploy™ of HBase takes advantage of our SSD and Raw Disk Cloud Servers while making it easy to deploy a fully configured cluster. GoGrid deploys the latest Hortonworks distribution of HBase on Hadoop 2.0. If you’ve ever tried to deploy HBase or Hadoop yourself, you know it can be challenging. GoGrid’s 1-Button Deploy™ does all the heavy lifting and applies all the recommended configurations to ensure a smooth path to deployment.

Why GoGrid Cloud Servers?

SSD Cloud Servers are built for the I/O-intensive workloads common to HBase: they all come with attached SSD storage and large amounts of RAM. The Name Nodes benefit from the large RAM options available on SSD Cloud Servers, and the Data Nodes use our Raw Disk Cloud Servers, which are configured as JBOD (Just a Bunch of Disks). This is the recommended disk configuration for Data Nodes, and GoGrid is one of the first providers to offer this configuration in a Cloud Server. Both SSD and Raw Disk Cloud Servers use a redundant 10-Gbps public and private network to ensure you have the maximum bandwidth to transfer your data. Plus, the cloud makes it easy to add more Data Nodes to your cluster as needed. You can use GoGrid’s 1-Button Deploy™ to provision either a 5-server development cluster or an 11-server production cluster with Firewall Service enabled.

Development Environments

The smallest recommended size for a development cluster is 5 servers. Although it’s possible to run HBase on a single server, you won’t be able to test failover or see how data is replicated across nodes. You’ll most likely have a small database, so you won’t need as much RAM, but you’ll still benefit from SSD storage and a fast network. The Data Nodes use Raw Disk Cloud Servers and are configured with a replication factor of 3.
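
To see what a replication factor of 3 means for capacity, here is a rough sketch in Python. The function name and the example disk counts are illustrative assumptions, not GoGrid specifications, and the estimate ignores space reserved for the operating system, temporary job output, and other overhead.

```python
def usable_capacity_tb(data_nodes: int, disks_per_node: int,
                       disk_size_tb: float, replication_factor: int = 3) -> float:
    """Approximate capacity left for unique data once every block is replicated."""
    # HDFS puts each replica of a block on a different data node, so a cluster
    # with fewer nodes than replicas would leave blocks under-replicated.
    if data_nodes < replication_factor:
        raise ValueError("need at least as many data nodes as replicas")
    raw_tb = data_nodes * disks_per_node * disk_size_tb
    return raw_tb / replication_factor


# Hypothetical development cluster: 3 data nodes with 2 x 4 TB disks each
print(usable_capacity_tb(3, 2, 4.0))  # 8.0
```

In other words, with 3 replicas you should plan on roughly one third of the raw disk space being available for your own data.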


Big Data Cloud Servers for Hadoop

Monday, January 13th, 2014 by

GoGrid just launched Raw Disk Cloud Servers, the perfect choice for your Hadoop data node. These purpose-built Cloud Servers run on a redundant 10-Gbps network fabric and use the latest Intel Ivy Bridge processors. What sets these servers apart, however, is the massive amount of raw storage in a JBOD (Just a Bunch of Disks) configuration. You can deploy up to 45 x 4 TB SAS disks on 1 Cloud Server.

These servers are designed to serve as Hadoop data nodes, which are typically deployed in a JBOD configuration. This setup maximizes available storage space on the server and also aids in performance. There are roughly 2 cores allocated per spindle, giving these servers additional MapReduce processing power. In addition, these disks aren’t a virtual allocation from a larger device. Each volume is actually a dedicated, physical 4 TB hard drive, so you get the full drive per volume with no initial write penalty.
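
In a JBOD layout, each physical disk is typically formatted and mounted on its own, and the data node is given one directory per disk through the `dfs.datanode.data.dir` property in `hdfs-site.xml` (the Hadoop 2 property name). The Python sketch below builds that comma-separated value. The mount-point pattern `/data/diskNN`, the `hdfs/data` subdirectory, and the helper name are all hypothetical, so adapt them to however your disks are actually mounted.

```python
def datanode_data_dirs(disk_count: int,
                       mount_pattern: str = "/data/disk{:02d}",
                       subdir: str = "hdfs/data") -> str:
    """Return a comma-separated list with one HDFS data directory per disk."""
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    # One directory per physical disk, numbered from 1 (hypothetical layout)
    dirs = [f"{mount_pattern.format(i)}/{subdir}"
            for i in range(1, disk_count + 1)]
    return ",".join(dirs)


print(datanode_data_dirs(3))
# /data/disk01/hdfs/data,/data/disk02/hdfs/data,/data/disk03/hdfs/data
```

Listing every disk separately lets the data node spread new blocks across all of the spindles instead of funneling I/O through a single volume.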

Hadoop in the cloud

Most Hadoop distributions call for a name node supporting several data nodes. GoGrid offers a variety of SSD Cloud Servers that would be perfect for the Hadoop name node. Because they are also on the same 10-Gbps high-performance fabric as the Raw Disk Cloud Servers, SSD servers provide low-latency private connectivity to your data nodes. I recommend using at least the X-Large SSD Cloud Server (16 GB RAM), although you may need a larger server, depending on the size of your Hadoop cluster. Because Hadoop stores metadata in memory, you’ll want more RAM if you have a lot of files to process. You can use any size Raw Disk Cloud Server, but you’ll want to deploy at least 3. Also, each Raw Disk Cloud Server has a different allocation of raw disks, which is illustrated in the table below. The Cloud Server in the illustration is the smallest size that has multiple disks per Cloud Server. Hadoop defaults to a replication factor of 3, so to protect your data from failure, you’ll want to have at least 3 data nodes to distribute data. Although Hadoop attempts to replicate data to different racks, there’s no guarantee that your Cloud Servers will be on different racks.
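
Because the name node keeps the file system metadata in memory, a back-of-the-envelope estimate can help when picking a server size. The Python sketch below assumes the commonly cited rule of thumb of roughly 150 bytes of heap per namespace object (file, directory, or block). That figure is an approximation rather than a GoGrid or Hadoop guarantee, and the function name and example numbers are purely illustrative.

```python
BYTES_PER_OBJECT = 150  # rule-of-thumb assumption, not a measured value


def namenode_heap_gb(files: int, blocks_per_file: float,
                     directories: int = 0) -> float:
    """Rough name node heap needed to hold the namespace, in GB."""
    # Every file, directory, and block is one in-memory object
    objects = files + directories + files * blocks_per_file
    return objects * BYTES_PER_OBJECT / 1024**3


# Hypothetical cluster: 20 million files averaging 1.5 blocks each
estimate = namenode_heap_gb(files=20_000_000, blocks_per_file=1.5)
print(f"{estimate:.1f} GB")  # 7.0 GB
```

Treat the result as a floor rather than a target: the JVM, the operating system, and future growth all need headroom on top of it.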

Note that the example below is for illustrative purposes only and is not representative of a typical Hadoop cluster; for example, most Cloudera and Hortonworks sizing guides start at 8 nodes. These configurations can differ greatly depending on whether you intend to use the cluster for development, production, or production with HBase added. That includes the RAM and disk sizes (less of both for development, most likely more for HBase). Plus, if you’re thinking of using these nodes for production, you should consider adding a second name node.

[Illustration: Hadoop-cluster]