Archive for the ‘Big Data’ Category


HBase Made Simple

Wednesday, April 30th, 2014 by

GoGrid has just released its 1-Button Deploy™ of HBase, available to all customers in the US-West-1 data center. This technology makes it easy to deploy either a development or production HBase cluster on GoGrid’s high-performance infrastructure. GoGrid’s 1-Button Deploy™ technology combines the capabilities of one of the leading NoSQL databases with our expertise in building high-performance Cloud Servers.

HBase is a scalable, high-performance, open-source database. HBase is often called the Hadoop distributed database – it leverages the Hadoop framework but adds several capabilities such as real-time queries and the ability to organize data into a table-like structure. GoGrid’s 1-Button Deploy™ of HBase takes advantage of our SSD and Raw Disk Cloud Servers while making it easy to deploy a fully configured cluster. GoGrid deploys the latest Hortonworks distribution of HBase on Hadoop 2.0. If you’ve ever tried to deploy HBase or Hadoop yourself, you know it can be challenging. GoGrid’s 1-Button Deploy™ does all the heavy lifting and applies all the recommended configurations to ensure a smooth path to deployment.

Why GoGrid Cloud Servers?

SSD Cloud Servers have several high-performance characteristics. They all come with attached SSD storage and large available RAM for the high I/O uses common to HBase. The Name Nodes benefit from the large RAM options available on SSD Cloud Servers and the Data Nodes use our Raw Disk Cloud Servers, which are configured as JBOD (Just a Bunch of Disks). This is the recommended disk configuration for Data Nodes, and GoGrid is one of the first providers to offer this configuration in a Cloud Server. Both SSD and Raw Disk Cloud Servers use a redundant 10-Gbps public and private network to ensure you have the maximum bandwidth to transfer your data. Plus, the cloud makes it easy to add more Data Nodes to your cluster as needed. You can use GoGrid’s 1-Button Deploy™ to provision either a 5-server development cluster or an 11-server production cluster with Firewall Service enabled.

Development Environments

The smallest recommended size for a development cluster is 5 servers. Although it’s possible to run HBase on a single server, you won’t be able to test failover or how data is replicated across nodes. You’ll most likely have a small database so you won’t need as much RAM, but will still benefit from SSD storage and a fast network. The Data Nodes use Raw Disk Cloud Servers and are configured with a replication factor of 3.
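To make the replication factor concrete, here is a minimal sketch of how it affects usable capacity and failure tolerance. Only the replication factor of 3 comes from the text above; the node count and per-node disk size are hypothetical illustration values, not GoGrid specifications, and the arithmetic ignores filesystem overhead and space reserved for non-HDFS use.

```python
def usable_capacity_tb(data_nodes: int, raw_tb_per_node: float,
                       replication_factor: int = 3) -> float:
    """Approximate usable capacity when every block is stored
    replication_factor times across the Data Nodes."""
    if data_nodes < 1 or replication_factor < 1:
        raise ValueError("data_nodes and replication_factor must be >= 1")
    return data_nodes * raw_tb_per_node / replication_factor


def tolerated_node_failures(data_nodes: int, replication_factor: int = 3) -> int:
    """Simultaneous Data Node losses that still leave at least one copy
    of every block, assuming each replica lives on a different node."""
    if data_nodes < 1 or replication_factor < 1:
        raise ValueError("data_nodes and replication_factor must be >= 1")
    # A block cannot have more distinct copies than there are nodes.
    return min(replication_factor, data_nodes) - 1


# Hypothetical example: 3 Data Nodes with 4 TB of raw disk each.
print(usable_capacity_tb(3, 4.0))
print(tolerated_node_failures(3))
```

The same arithmetic suggests why a single-server setup cannot exercise failover: with one node there is only one copy of each block, so no node loss can be tolerated.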

How can businesses make the most of their data?

Thursday, April 24th, 2014 by

When businesses attempt to harness Big Data, they’re looking to obtain actionable intelligence that can influence key business decisions. A variety of tools to do so are now available, but executives often get lost in the process of selecting which program would best suit their requirements. If a company needs to determine how a specific action will affect a particular industry, predictive analytics is probably the right choice for them. If a merchandiser wants to figure out how a single customer interacts with its brand, then descriptive tools may be the best option.

Organizing a plan to satisfy a customer.

Know what you’re working with
Trying to draw conclusions from raw data aggregated onto cloud servers is both inefficient and ineffective. A company could collect all the data it wants, but if there’s no way of managing and segregating the information, then hastily made conclusions could send the company in the wrong direction. In addition, professionals should not let the way they want to interpret the intelligence shape how they perceive it.

When it comes to understanding data, an open mind is mandatory. If tailored data displays a slightly or entirely different angle on a particular situation, it’s better for management to adjust their plans according to the information than to distort its meaning so that it better coincides with an original business strategy.

Interpreting phenomena
Ultimately, data analytics gives C-suite professionals the ability to navigate through previously undecipherable patterns. ITWeb contributor Goran Dragosavac stated that there are three primary kinds of intelligence scrutiny platforms that draw considerably different conclusions from a single marketplace. Depending on what kind of business a particular company is in, the usefulness of each platform may vary significantly.

1. Predictive analytics examines the events of the past and present to determine which events will most likely transpire in the future. How can the current actions of a company manipulate the outcome? What should the business do to change the end result?

How Public Organizations Should Treat Big Data

Tuesday, April 22nd, 2014 by

Though the “only human” argument certainly doesn’t apply to Big Data, enterprises and public organizations often expect too much out of the technology. Some executives are frustrated by results that don’t necessarily correlate with their predetermined business plans, and others consider one-time predictive conclusions to be final. The problem is, there’s no guarantee that analytical results will be “right.”

A government-themed action key

Public authorities interested in integrating Big Data into their cloud servers need to understand two things. First, digital information possesses no political agenda, lacks emotion, and reflects the world in a completely pragmatic manner. And second, data changes as time progresses. For example, just because a county in Maine experienced a particularly rainy spring doesn’t mean that farming soil will remain moist; future weather conditions may drastically change the environment.

Benefiting from “incorrect” data
If a data analysis program harvests information from one source over the course of 1 hour and then attempts to develop conclusions, the system’s deductions will be correct to the extent that it accurately translated ones and zeroes into actionable intelligence. However, because the place from which the data was aggregated continues to produce new, variable knowledge, it may eventually contradict the original deduction.

Tim Harford, a contributor to the Financial Times, cited Google’s use of predictive analytics tools, which applied algorithms to more than 50 million search terms to chart how many people would be affected by influenza. The problem was that, 4 years into the project, the company’s system was contradicted by the Centers for Disease Control and Prevention’s latest aggregation of data, which showed that Google’s estimates of the spread of flu-like illnesses were overstated by a 2:1 ratio.

Taking the good with the bad
Although Harford cited Google’s failure as evidence that Big Data isn’t what software developers claim it to be, Forbes contributor Adam Ozimek noted that the study displayed one of the advantages of the technology: the ability to reject conclusions in light of consistently updated information. Furthermore, it’s important to note that Google only collected intelligence from one source, whereas the CDC was amassing data from numerous resources.

Comparing Cloud Infrastructure Options for Running NoSQL Workloads

Friday, April 11th, 2014 by

A walk through in-memory, general compute, and mass storage options for Cassandra, MongoDB, Riak, and HBase workloads

I recently had the pleasure of attending Cassandra Tech Day in San Jose, a developer-focused event where people were learning about various options for deploying Cassandra clusters. As it turns out, there was a lot of buzz surrounding the new in-memory option for Cassandra and the use cases for it. This interest got me thinking about how to map the options customers have for running Big Data across clouds.

For a specific workload, NoSQL customers may want to have the following:

1. Access to mass storage servers for files and objects (not to be confused with block storage). Instead, we’re talking on-demand access to terabytes of raw spinning disk volumes for running a large storage array (think storage hub for Hadoop/HBase, Cassandra, or MongoDB).

2. Access to High RAM options for running in-memory with the fastest possible response times—the same times you’d need when running the in-memory version of Cassandra or even running Riak or Redis in-memory.

3. Access to high-performance SSDs to run balanced workloads. Think about what happens after you run a batch operation. If you’re relating information back to a product schema, you may want to push that data into something like PostgreSQL, SQL, or even MySQL and have access to block storage.

4. Access to general-purpose instances for dev and test or for workloads that don’t have specific performance SLAs. This ability is particularly important when you’re trialing and evaluating a variety of applications. GoGrid’s customers, for example, leverage our 1-Button Deploy™ technology to quickly spin up dev clusters of common NoSQL solutions from MongoDB to Cassandra, Riak, and HBase.

Be Prepared with a Solid Cloud Infrastructure

Thursday, April 10th, 2014 by

The more Big Data enterprises continue to amass, the more potential risk is involved. It would be one matter if it were simply raw material without any clearly defined meaning; however, data analytics tools, combined with the professionalism of tech-savvy employees, allow businesses to harvest profit-driving, actionable digital information.

Recovery disks shattering

Compared to on-premise data centers, cloud computing offers multiple disaster recovery models.

Whether the risk is from a cyber-criminal who gains access to a database or a storm that cuts power, it’s essential for enterprises to have a solid disaster recovery plan in place. Because on-premise data centers are prone to outages in the event of a catastrophic natural event, cloud servers provide a more stable option for companies requiring constant access to their data. Numerous deployment models exist for these systems, and most of them are constructed based on how users interact with them.

How the cloud can promote disaster recovery 
According to a report conducted by InformationWeek, only 41 percent of respondents to the magazine’s 2014 State of Enterprise Storage Survey stated they have a disaster recovery (DR) and business continuity protocol and regularly test it. Although this finding expresses a lack of preparedness by the remaining 59 percent, the study showed that business leaders were beginning to see the big picture and placing their confidence in cloud applications.

The source noted that cloud infrastructure and Software-as-a-Service (SaaS) automation software let organizations deploy optimal DR without the hassle associated with a conventional plan. Traditionally, companies backed up their data on physical disks and shipped them to storage facilities. This method is no longer workable because many enterprises are constantly amassing and refining new data points. For example, Netflix collects an incredible amount of specific digital information on its subscribers through its rating system and then uses it to recommend new viewing options.

The news source also acknowledged that the issue isn’t just about recovering data lost during the outage, but about being able to run the programs that process and interact with that information. In fact, due to the complexity of these infrastructures, many cloud hosts offer DR-as-a-Service.
