Posts Tagged ‘Big Data’

 

How to Easily Deploy MongoDB in the Cloud

Monday, February 3rd, 2014 by

GoGrid has just released its 1-Button Deploy™ of MongoDB, available to all customers in the US-West-1 data center. This technology makes it easy to deploy either a development or production MongoDB replica set on GoGrid’s high-performance infrastructure. GoGrid’s 1-Button Deploy™ technology combines the capabilities of one of the leading NoSQL databases with our expertise in building high-performance Cloud Servers.

MongoDB is a scalable, high-performance, open source, structured storage system. MongoDB provides JSON-style document-oriented storage with full index support, sharding, sophisticated replication, and compatibility with the MapReduce paradigm. MongoDB focuses on flexibility, power, speed, and ease of use. GoGrid’s 1-Button Deploy™ of MongoDB takes advantage of our SSD Cloud Servers while making it easy to deploy a fully configured replica set.

Why GoGrid Cloud Servers?

SSD Cloud Servers have several high-performance characteristics. They all come with attached SSD storage and large available RAM for the high I/O uses common to MongoDB. MongoDB will attempt to place its working set in memory, so the ability to deploy servers with large available RAM is important. Plus, whenever MongoDB has to write to disk, SSDs provide for a more graceful transition from memory to disk. SSD Cloud Servers use a redundant 10-Gbps public and private network to ensure you have the maximum bandwidth to transfer your data. You can use GoGrid’s 1-Button Deploy™ to provision either a 3-server development replica set or a 5-server production replica set with Firewall Service enabled.

Development Environments

The smallest recommended size for a development replica set is 3 servers. Although it’s possible to run MongoDB on a single server, you won’t be able to test failover or how a replica set behaves in production. You’ll most likely have a small working set so you won’t need as much RAM, but will still benefit from SSD storage and a fast network.
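The reason 3 servers is the practical minimum comes down to elections: a replica set needs a strict majority of its voting members to elect a primary. Here is a minimal Python sketch of that arithmetic. The function names are illustrative and not part of any MongoDB API, and the sketch assumes every member of the set votes.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one member")
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    # Compare a single server with the 3-server and 5-server replica sets.
    for size in (1, 3, 5):
        print(f"{size} member(s): majority={majority(size)}, "
              f"tolerates {fault_tolerance(size)} failure(s)")
```

Under these assumptions, a single server tolerates no failures, a 3-server set tolerates one, and a 5-server set tolerates two, which is why failover can only be exercised with 3 or more servers.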


Big Data Cloud Servers for Hadoop

Monday, January 13th, 2014 by

GoGrid just launched Raw Disk Cloud Servers, the perfect choice for your Hadoop data node. These purpose-built Cloud Servers run on a redundant 10-Gbps network fabric on the latest Intel Ivy Bridge processors. What sets these servers apart, however, is the massive amount of raw storage in JBOD (Just a Bunch of Disks) configuration. You can deploy up to 45 x 4 TB SAS disks on 1 Cloud Server.

These servers are designed to serve as Hadoop data nodes, which are typically deployed in a JBOD configuration. This setup maximizes available storage space on the server and also aids in performance. There are roughly 2 cores allocated per spindle, giving these servers additional MapReduce processing power. In addition, these disks aren’t a virtual allocation from a larger device. Each volume is actually a dedicated, physical 4 TB hard drive, so you get the full drive per volume with no initial write penalty.
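To put those storage numbers in perspective, the following Python sketch computes raw capacity per server and an approximate usable HDFS capacity. It assumes HDFS’s default replication factor of 3 and ignores filesystem overhead and any space reserved for non-HDFS use. The function names and the 3-node cluster in the usage lines are illustrative only.

```python
DISK_TB = 4  # each volume is a dedicated, physical 4 TB drive


def raw_capacity_tb(disks_per_node: int, nodes: int = 1) -> int:
    """Total raw storage in TB across all data nodes."""
    if disks_per_node < 1 or nodes < 1:
        raise ValueError("disks_per_node and nodes must be positive")
    return disks_per_node * DISK_TB * nodes


def usable_hdfs_tb(disks_per_node: int, nodes: int, replication: int = 3) -> float:
    """Approximate usable capacity: raw storage divided by the replication factor."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_capacity_tb(disks_per_node, nodes) / replication


if __name__ == "__main__":
    # A fully populated server: 45 disks x 4 TB.
    print(raw_capacity_tb(45))       # 180 TB raw on one server
    # Hypothetical cluster of 3 fully populated data nodes.
    print(usable_hdfs_tb(45, 3))     # 180.0 TB usable at replication factor 3
```

Because every block is stored three times by default, usable capacity across the cluster is roughly a third of the raw total, so plan the number of data nodes around the replicated size of your data rather than its raw size.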

Hadoop in the cloud

Most Hadoop distributions call for a name node supporting several data nodes. GoGrid offers a variety of SSD Cloud Servers that would be perfect for the Hadoop name node. Because they are also on the same 10-Gbps high-performance fabric as the Raw Disk Cloud Servers, SSD servers provide low-latency private connectivity to your data nodes. I recommend using at least the X-Large SSD Cloud Server (16 GB RAM), although you may need a larger server, depending on the size of your Hadoop cluster. Because Hadoop stores metadata in memory, you’ll want more RAM if you have a lot of files to process. You can use any size Raw Disk Cloud Server, but you’ll want to deploy at least 3. Also, each Raw Disk Cloud Server has a different allocation of raw disks, which is illustrated in the table below. The Cloud Server in the illustration is the smallest size that has multiple disks per Cloud Server. Hadoop defaults to a replication factor of three, so to protect your data from failure, you’ll want to have at least 3 data nodes to distribute data. Although Hadoop attempts to replicate data to different racks, there’s no guarantee that your Cloud Servers will be on different racks.

Note that the example below is for illustrative purposes only and is not representative of a typical Hadoop cluster; for example, most Cloudera and Hortonworks sizing guides start at 8 nodes. These configurations can differ greatly depending on whether you intend to use the cluster for development, production, or production with HBase added. This includes the RAM and disk sizes (less of both for development, most likely more for HBase). Plus, if you’re thinking of using these nodes for production, you should consider adding a second name node.

[Figure: Hadoop cluster]

How To Successfully Implement a Big Data Project in 8 Steps

Monday, October 28th, 2013 by

There are countless ways to incorporate Big Data to improve your company’s operations. But the hard truth is that there’s no one-size-fits-all approach when it comes to Big Data. Beyond understanding your infrastructure requirements, you still need to create an implementation plan to understand what each Big Data project will mean to your organization. At a minimum, that plan should include the following 8 steps.


Step 1: Gain executive-level sponsorship

Big Data projects need to be proposed and fleshed out. They take time to scope, and without executive sponsorship and a dedicated project team, there’s a good chance they’ll fail.

Step 2: Augment rather than re-build

Start with your existing data warehouse. Your challenge is to identify and prioritize additional data sources and then determine the right hub-and-spoke technology. At this stage, you’ll want to get approval to evaluate a few options until you settle on the appropriate technology for your needs.

The 2013 Hadoop Summit

Monday, July 29th, 2013 by


I recently attended the Hadoop Summit in San Jose. This is one of two major conferences organized around Hadoop, the other being Hadoop World. Nearly all the companies with Hadoop distributions were present along with several big users of Hadoop like Netflix, Twitter, and LinkedIn.

Crossing The Chasm

If you’re not deeply involved with Hadoop, attending one of these conferences a year apart can be shocking. The advancements made in just the span of a year are amazing. The conference seemed notably larger this year, and I noticed more non-tech companies in the audience. I think it’s safe to say that Hadoop has crossed the chasm, at least for enterprise IT users.

Other than the type of attendees at the event, the other signal to me was the emergence of Hadoop 2.0. This second version of Hadoop focused on features that are important for users who want to run production-grade software for mission-critical systems. High availability finally arrived for the name node (for the Open Source project, not the version Cloudera released for its distribution), along with a new version of Hive with more SQL-friendly features and YARN, which allows users to run just about anything on the Hadoop Distributed File System (HDFS). These types of stability and availability features tend to show up when there is a critical mass of users who want to use software for production.


Quite A YARN


Advertising in the Cloud

Thursday, May 2nd, 2013 by

If you’re an online advertising company, you know how important it is to have infrastructure that performs and is resilient, reliable, and available globally. You want to spend your time optimizing your ad delivery across the world and developing your delivery platform, rather than worrying about whether your infrastructure can deliver your content quickly and accurately.

We get advertising. We all click on online ads, read the messaging, and watch the videos. We have customers that are pushing the technology envelope to deliver their advertising to end users. And many of our advertising customers have complex cloud infrastructure powering their platforms.

Advertising in the Cloud - Ad Network architecture

From Big Data architectures to Content Delivery Networks (CDNs) to multi-data-center deployments, our solutions are carefully designed to meet the unique needs of advertising providers. And although you could do it all yourself in our global cloud, we view ourselves as your partner. Our Solutions Architects are available to help you identify the best services we provide for crafting your advertising delivery platform. Remember: You need to design your infrastructure to take advantage of the benefits of cloud computing, and we believe you shouldn’t go it alone.


Three Advertising Leaders

The network architecture charts above are actual representations of 3 of our advertising customers, specifically Brilig (Merkle), Excite Digital Media, and Martini Media. The case studies below discuss the unique challenges each of them faced and how we worked together to develop powerful cloud solutions.
