Author Archive


The Big Data Revolution – Part 1 – The Origins

Tuesday, March 20th, 2012 by


For many years, companies collected data from various sources, and that data often found its way into relational databases like Oracle and MySQL. However, the rise of the internet, Web 2.0 and, more recently, social media brought an enormous increase not only in the amount of data created, but also in the types of data. No longer was data relegated to types that easily fit into standard data fields; it now came in the form of photos, geographic information, chats, Twitter feeds and emails. The age of Big Data is upon us.

A study by IDC titled “The Digital Universe Decade” projects a 45-fold increase in annual data by 2020. In 2010, the amount of digital information was 1.2 zettabytes. 1 zettabyte equals 1 trillion gigabytes. To put that in perspective, the equivalent of 1.2 zettabytes is a full-length episode of “24” running continuously for 125 million years, according to IDC. That’s a lot of data. More importantly, this data has to go somewhere, and this report projects that by 2020, more than 1/3 of all digital information created annually will either live in or pass through the cloud. With all this data being created, the challenge will be to collect, store, and analyze what it all means.
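As a quick sanity check on the units, here is a minimal sketch of the zettabyte-to-gigabyte conversion. It assumes decimal (SI) units, where 1 gigabyte is 10^9 bytes and 1 zettabyte is 10^21 bytes; the function name is only illustrative.

```python
# Decimal (SI) units are assumed: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
BYTES_PER_GIGABYTE = 10**9
BYTES_PER_ZETTABYTE = 10**21
GIGABYTES_PER_ZETTABYTE = BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE  # 10**12, i.e. 1 trillion


def zettabytes_to_gigabytes(zettabytes):
    """Convert a size in zettabytes to gigabytes (illustrative helper)."""
    return zettabytes * GIGABYTES_PER_ZETTABYTE


# The 2010 figure quoted from the IDC study.
print(f"{zettabytes_to_gigabytes(1.2):,.0f} GB")
```

In other words, 1.2 zettabytes is about 1.2 trillion gigabytes.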

Business intelligence (BI) systems have always had to deal with large data sets. Typically the strategy was to pull in “atomic”-level data at the lowest level of granularity, then aggregate the information to a consumable format for end users. In fact, it was preferable to have a lot of data since you could also “drill down” from the aggregation layer to get at the more detailed information, as needed.

Large Data Sets and Sampling

Coming from a data background, I find that dealing with large data sets is both a blessing and a curse. One product that I managed analyzed the share of wireless numbers. According to CTIA, the number of wireless subscribers in 2011 was 322.9 million and growing. While that doesn’t seem like a lot of data at first, each wireless number acts as a unique identifier, and any number of activities can be associated with it. The amount of information generated from each number could therefore be extensive, especially as the key element was seeing changes over time. For example, after 2003, mobile subscribers in the United States were able to port their numbers from one carrier to another. This is of great importance to market research, since a shift from one carrier to another indicates churn and also affects the market share of carriers in that Metropolitan Statistical Area (MSA).

Given that it would take a significant amount of resources to poll every household in the United States, market researchers often employ a technique called sampling. This is a statistical technique in which a panel is used to represent the activity of the overall population that you want to measure. It is a sound scientific technique if done correctly, but it’s not without its perils. For example, it’s often possible to get +/- 1% error at 95% confidence for a large population, but what happens once you start drilling down into more specific demographics and geographies? The risk lies not only in having enough sample (you can’t have one subscriber represent the activity of a large group, for example) but also in ensuring that it is representative (is the subscriber that you are measuring representative of the population that you want to measure?). Sampling error is a classic problem of using panelists. It’s fairly difficult to be completely certain that your sample is representative unless you’ve actually measured the entire population already (using it as a baseline), but if you’ve already done that, why bother sampling?
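To make the drill-down problem concrete, here is a minimal sketch of how the margin of error grows as the sample shrinks. It uses the standard normal-approximation formula for a proportion and assumes a simple random sample, the worst-case proportion of 0.5, and no finite-population correction. The sample sizes are illustrative, not figures from any real panel.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% confidence interval


def margin_of_error(sample_size, proportion=0.5, z=Z_95):
    """Margin of error for a proportion estimated from a simple random sample.

    Uses the normal approximation z * sqrt(p * (1 - p) / n). The default
    proportion of 0.5 gives the widest (most conservative) interval.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Illustrative sample sizes: a large national panel versus progressively
# smaller slices after drilling down by demographic or geography.
for n in (9604, 1000, 100):
    print(f"n={n:>5}: +/- {margin_of_error(n) * 100:.1f}%")
```

Under these assumptions, roughly 9,600 panelists are enough for +/- 1% overall, but a slice with only 100 panelists is good to about +/- 10% at best, and that is before asking whether those 100 are representative at all.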


Spotify Music Apps Hack Weekend – Sponsored by GoGrid

Tuesday, February 28th, 2012 by


To celebrate the release of their API, Spotify sponsored a Hack-a-thon at SPiN Ping Pong Club in New York City from Friday, February 24 until Sunday, February 26. Spotify was joined by big brands like Doritos, CW, McDonald’s, Showtime, State Farm and Mountain Dew. Technology companies sponsoring the event included Facebook, Twilio, FourSquare, The Echo Nest and, of course, GoGrid. GoGrid provided all the cloud servers for the event to support the developers as they created brand new apps using the Spotify API in conjunction with other APIs like Facebook’s Open Graph. GoGrid’s manager of cloud ecosystem, Paul Lancaster, and I were on hand to meet with developers and provide support for the event.


GoGrid provisioned 50 CentOS x64 cloud servers for the hackers to build their applications on, free of charge and with root-level access for maximum flexibility. Hundreds of hackers showed up to build the next great apps and were treated to live performances by Blood Orange and MNDR. While hack-a-thons tend to have attrition over time, hackers stayed throughout the night, and most stayed for the entire weekend.

Museik App

Roughly 30 projects were worked on during the weekend. They ranged from Museik (UI shown above), an app that extracts content from the internet related to the release date of a song on Spotify, to Orbidal, a project by the students of the VCU Brandcenter that gathers the collective feelings of your Facebook feed and creates a Spotify playlist based on that mood.



Riverbed Stingray 8.1 Now in the GoGrid Cloud!

Tuesday, February 7th, 2012 by

As of today, GoGrid has released multiple images of the leading software load balancer, Riverbed Stingray! The following images are available on the GoGrid Partner Exchange in both San Francisco and Amsterdam:

  • Riverbed 7.4 Simple Load Balancer 10 Mbps
  • Riverbed 8.1 Load Balancer 10 Mbps
  • Riverbed 8.1 Load Balancer 200 Mbps
  • Riverbed 8.1 Load Balancer 200 Mbps WAF

How to Configure Static Routes to Traverse Traffic on CloudLink

Wednesday, February 1st, 2012 by

CloudLink is infrastructure, so it can enable many use cases. However, you will be unable to use it until you configure your servers to use static routes. The rest of this post describes how to create a static route from one server in US-West-1 to servers in US-East-1. It assumes that you have not already assigned a private IP to the West server, and that you have a basic knowledge of Linux and/or Windows and of the basic principles of networking.

Find your Private IPs

First, you will need to find your private IPs. You can find your private IP block by going to the GoGrid portal, selecting the List tab and then Network. Under Type: Private you will see your private IP blocks. In this example, this is a listing of private IP blocks for US-West-1. US-East-1 has a DIFFERENT private IP block. The gateway is +1 from the first address in your private IP block (see the example above).
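The "+1" rule can be sketched with Python's standard `ipaddress` module. The two address blocks below are hypothetical placeholders, not real GoGrid assignments, and the `ip route add` string is only an illustration of what a Linux static route from a West server to the East block might look like, not necessarily the exact command this guide goes on to use.

```python
import ipaddress


def gateway_for(block):
    """Return the gateway for a private IP block: its first address plus one."""
    network = ipaddress.ip_network(block)
    return network.network_address + 1


def static_route_command(remote_block, local_block):
    """Build a Linux `ip route add` command that sends traffic for the
    remote block through the local block's gateway (illustrative only)."""
    remote = ipaddress.ip_network(remote_block)
    return f"ip route add {remote} via {gateway_for(local_block)}"


# Hypothetical blocks; substitute the ones listed in your own portal.
WEST_BLOCK = "10.100.0.0/24"  # assumed US-West-1 private block
EAST_BLOCK = "10.200.0.0/24"  # assumed US-East-1 private block

print(gateway_for(WEST_BLOCK))
print(static_route_command(EAST_BLOCK, WEST_BLOCK))
```

With these placeholder blocks, the West gateway comes out as 10.100.0.1, and the generated command routes the East block through it.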


How To Set Up Private IP Segregation with CloudPassage in the GoGrid Cloud

Thursday, September 29th, 2011 by

CloudPassage is a key security partner that has images available on the GoGrid Partner Exchange. The CloudPassage images on GoGrid come pre-installed with the Halo daemon. They are available for CentOS, Debian, Red Hat, and Ubuntu in both 32-bit and 64-bit flavors. Alternatively, you can launch a GoGrid base image and install the Halo daemon on your own. This tutorial assumes that you have a basic understanding of Linux and SSH as well as basic firewall strategies. It also assumes that you know how to configure private IPs, so that will not be covered here.