Posts Tagged ‘NOSQL’

 

How to Easily Deploy MongoDB in the Cloud

Monday, February 3rd, 2014

GoGrid has just released its 1-Button Deploy™ of MongoDB, available to all customers in the US-West-1 data center. This technology makes it easy to deploy either a development or production MongoDB replica set on GoGrid’s high-performance infrastructure. GoGrid’s 1-Button Deploy™ technology combines the capabilities of one of the leading NoSQL databases with our expertise in building high-performance Cloud Servers.

MongoDB is a scalable, high-performance, open source, structured storage system. MongoDB provides JSON-style document-oriented storage with full index support, sharding, sophisticated replication, and compatibility with the MapReduce paradigm. MongoDB focuses on flexibility, power, speed, and ease of use. GoGrid’s 1-Button Deploy™ of MongoDB takes advantage of our SSD Cloud Servers while making it easy to deploy a fully configured replica set.
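To make the JSON-style document model concrete, here is a minimal sketch using the PyMongo driver. The connection string, database, collection, and field names are hypothetical placeholders, not part of GoGrid’s deployment.

```python
# Minimal PyMongo sketch: JSON-style documents, a secondary index, and a query.
# The hostname, database, collection, and field names are illustrative only.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
orders = client["appdb"]["orders"]

# Documents are schemaless, JSON-style structures; fields can be nested.
orders.insert_one({
    "customer": "acme",
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
    "total": 149.90,
})

# A secondary index supports fast lookups by customer.
orders.create_index([("customer", ASCENDING)])

for doc in orders.find({"customer": "acme"}):
    print(doc["total"])
```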

Why GoGrid Cloud Servers?

SSD Cloud Servers have several high-performance characteristics. They all come with attached SSD storage and large available RAM for the high-I/O workloads common to MongoDB. MongoDB will attempt to keep its working set in memory, so the ability to deploy servers with large available RAM is important. Plus, whenever MongoDB has to write to disk, SSDs provide a more graceful transition from memory to disk. SSD Cloud Servers use a redundant 10-Gbps public and private network to ensure you have the maximum bandwidth to transfer your data. You can use GoGrid’s 1-Button Deploy™ to provision either a 3-server development replica set or a 5-server production replica set with Firewall Service enabled.

Development Environments

The smallest recommended size for a development replica set is 3 servers. Although it’s possible to run MongoDB on a single server, you won’t be able to test failover or how a replica set behaves in production. You’ll most likely have a small working set so you won’t need as much RAM, but will still benefit from SSD storage and a fast network.
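For reference, this is roughly what initiating a 3-member replica set looks like when configuring it by hand (the step that 1-Button Deploy™ automates for you). The hostnames and replica set name below are hypothetical, the sketch assumes each mongod was started with the same --replSet name, and it is not GoGrid’s actual provisioning code.

```python
# Sketch: initiating a 3-member replica set by hand via PyMongo.
# Hostnames and the replica set name are hypothetical placeholders.
from pymongo import MongoClient

# Connect directly to one mongod that was started with --replSet rs0.
client = MongoClient("mongodb://db1.example.com:27017", directConnection=True)

config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.com:27017"},
        {"_id": 1, "host": "db2.example.com:27017"},
        {"_id": 2, "host": "db3.example.com:27017"},
    ],
}

# replSetInitiate is the admin command behind the mongo shell's rs.initiate().
client.admin.command("replSetInitiate", config)
```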


Create a Basho Riak Cluster on GoGrid

Monday, July 9th, 2012

Basho is a GoGrid partner and responsible for the open-source Riak project. If you are not familiar with Riak, it is a well-regarded open-source distributed database. It was built on the Dynamo concept, so it is often compared to Cassandra and Amazon DynamoDB.

Riak is used as a fast, fault-tolerant distributed database. Companies like Mozilla use it for storing and analyzing beta-testing results; Mozilla needed a solution that would help improve the user experience and allow them to store large amounts of data very quickly. Another example of a company using Riak is Bump, which uses it to scale and manage the massive amounts of data sent between its millions of users. Riak stores elements of past user conversations so that communication history is readily accessible to users.


Basho Riak version 1.1.4 is now available as a GoGrid Community Server Image (CGSI). You can find it when you launch a virtual machine and search for “Riak”. This image is available in all our data centers. The CGSI contains the open-source version, so support is only available via the community site and it does not include all the features present in the Enterprise version. However, you can use this image either to run a proof of concept (POC) of Riak to see if it will meet your needs or to run a small cluster. These will run on GoGrid’s high-performance VMs, which have been shown to have significant performance advantages over other cloud implementations.
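As a quick proof of concept once a node is up, a store-and-fetch round trip with Basho’s Python client looks roughly like the sketch below. The hostname, bucket, and key are hypothetical placeholders, and the exact client API may differ between client versions.

```python
# Sketch: storing and fetching a value with the Riak Python client.
# The hostname, bucket name, and key are illustrative placeholders.
import riak

client = riak.RiakClient(host="riak1.example.com")
sessions = client.bucket("sessions")

# Store a JSON-serializable value under a key.
obj = sessions.new("user:42", data={"last_seen": "2012-07-09", "messages": 3})
obj.store()

# Fetch it back from the cluster.
fetched = sessions.get("user:42")
print(fetched.data)
```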


Why is GoGrid faster?


The Big Data Revolution – Part 2 – Enter the Cloud

Wednesday, March 21st, 2012

In Part 1 of this Big Data series, I provided a background on the origins of Big Data.

But What is Big Data?


The problem with using the term “Big Data” is that it’s used in a lot of different ways. One definition is that Big Data is any data set that is too large for on-hand data management tools. According to Martin Wattenberg, a scientist at IBM, “The real yardstick … is how it [Big Data] compares with a natural human limit, like the sum total of all the words that you’ll hear in your lifetime.” Collecting that data is a solvable problem, but making sense of it (particularly in real time) is the challenge that technology tries to solve. This new type of technology is often listed under the title of “NoSQL” and includes distributed databases that are a departure from relational databases like Oracle and MySQL. These are systems specifically designed to parallelize computation, distribute data, and tolerate faults across a large cluster of servers. Some examples of NoSQL projects and software are Hadoop, Cassandra, MongoDB, Riak, and Membase.

The techniques vary, but there is a definite distinction between SQL relational databases and their NoSQL brethren. Most notably, NoSQL systems share the following characteristics:

  • Do not use SQL as their primary query language
  • May not require fixed table schemas
  • May not give full ACID guarantees (Atomicity, Consistency, Isolation, Durability)
  • Scale horizontally

Because full ACID guarantees are relaxed, NoSQL is used when performance and real-time results are more important than consistency. For example, if a company wants to update its website in real time based on an analysis of how an individual user is interacting with the site, it will most likely turn to NoSQL to solve this use case.
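As a concrete illustration of that trade-off, most NoSQL systems let you tune how much acknowledgement you wait for on writes. The sketch below uses MongoDB (one of the NoSQL examples above) via PyMongo; the database, collection, and field names are hypothetical.

```python
# Sketch: trading durability guarantees for write throughput in PyMongo.
# Database, collection, and field names are illustrative placeholders.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")
events = client["analytics"]["click_events"]

# w=0 ("fire and forget"): the driver does not wait for acknowledgement,
# maximizing throughput at the cost of durability guarantees.
fast = events.with_options(write_concern=WriteConcern(w=0))
fast.insert_one({"user": "u123", "action": "click", "page": "/pricing"})

# w="majority": wait until a majority of replica set members have the write,
# favoring durability and consistency over raw speed.
safe = events.with_options(write_concern=WriteConcern(w="majority"))
safe.insert_one({"user": "u123", "action": "purchase", "amount": 49.0})
```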

However, this does not mean that relational databases are going away. In fact, it is likely that in larger implementations, NoSQL and SQL will function together. Just as NoSQL was designed to solve a particular set of use cases, so were relational databases designed to solve theirs. Relational databases excel at organizing structured data and are the standard for serving up ad-hoc analytics and business intelligence reporting. In fact, Apache Hadoop even has a separate project called Sqoop that is designed to link Hadoop with structured data stores. Most likely, those who implement NoSQL will maintain their relational databases for legacy systems and for reporting off of their NoSQL clusters.
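Sqoop itself is a command-line tool, but the general pattern it implements, moving records between a relational store and a NoSQL or Hadoop-side store, can be sketched in a few lines. The sketch below is not Sqoop; it uses Python’s built-in sqlite3 module as a stand-in relational database and PyMongo as the NoSQL side, and every table, collection, and field name is hypothetical.

```python
# Sketch of the relational-to-NoSQL transfer pattern (not Sqoop itself).
# sqlite3 stands in for a relational database; all names are illustrative.
import sqlite3
from pymongo import MongoClient

sql = sqlite3.connect("warehouse.db")
sql.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
sql.execute("INSERT INTO orders (customer, total) VALUES ('acme', 149.90)")
sql.commit()

mongo_orders = MongoClient("mongodb://localhost:27017")["analytics"]["orders"]

# Copy each relational row into a document for NoSQL-side processing.
for row_id, customer, total in sql.execute("SELECT id, customer, total FROM orders"):
    mongo_orders.replace_one(
        {"_id": row_id},
        {"_id": row_id, "customer": customer, "total": total},
        upsert=True,
    )
```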


The Big Data Revolution – Part 1 – The Origins

Tuesday, March 20th, 2012


For many years, companies collected data from various sources that often found its way into relational databases like Oracle and MySQL. However, the rise of the Internet, Web 2.0, and more recently social media brought not only an enormous increase in the amount of data created, but also in the types of data. No longer was data relegated to types that fit easily into standard data fields; it now came in the form of photos, geographic information, chats, Twitter feeds, and emails. The age of Big Data is upon us.

A study by IDC titled “The Digital Universe Decade” projects a 45-fold increase in annual data by 2020. In 2010, the amount of digital information was 1.2 zettabytes (1 zettabyte equals 1 trillion gigabytes). To put that in perspective, 1.2 zettabytes is the equivalent of a full-length episode of “24” running continuously for 125 million years, according to IDC. That’s a lot of data. More importantly, this data has to go somewhere, and the report projects that by 2020, more than 1/3 of all digital information created annually will either live in or pass through the cloud. With all this data being created, the challenge will be to collect and store it all, and then analyze what it means.

Business intelligence (BI) systems have always had to deal with large data sets. Typically the strategy was to pull in “atomic”-level data at the lowest level of granularity, then aggregate the information into a consumable format for end users. In fact, it was preferable to have a lot of data, since you could also “drill down” from the aggregation layer to get at the more detailed information as needed.

Large Data Sets and Sampling

Coming from a data background, I find that dealing with large data sets is both a blessing and a curse. One product that I managed analyzed carrier share of wireless numbers. The number of wireless subscribers in 2011, according to CTIA, was 322.9 million and growing. While that doesn’t seem like a lot of data at first, if each wireless number is a unique identifier, there can be any number of activities associated with each number. Therefore the amount of information generated from each number could be extensive, especially because the key element was tracking changes over time. For example, after 2003, mobile subscribers in the United States were able to port their numbers from one carrier to another. This is of great importance to market research, since a shift from one carrier to another indicates churn and also impacts the market share of carriers in that Metropolitan Statistical Area (MSA).

Given that it would take a significant amount of resources to poll every household in the United States, market researchers often employ a technique called sampling: a statistical technique in which a panel that represents the population is used to estimate the activity of the overall population you want to measure. This is a sound scientific technique if done correctly, but it’s not without its perils. For example, it’s often possible to get +/- 1% error at 95% confidence for a large population, but what happens once you start drilling down into more specific demographics and geographies? The risk is not only having enough sample (you can’t just have one subscriber represent the activity of a large group, for example) but also ensuring that the sample is representative (is the subscriber you are measuring typical of the population you want to measure?). Sampling errors are a classic problem of working with panelists. It’s fairly difficult to be completely certain that your sample is representative unless you’ve actually measured the entire population already (using it as a baseline), but if you’ve already done that, why bother sampling?
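For context on the “+/- 1% error at 95% confidence” figure, here is the standard simplified sample-size calculation for a proportion, assuming simple random sampling and the worst-case p = 0.5. It is purely illustrative and is not the methodology of the product described above.

```python
# Simplified sample-size calculation for a proportion, assuming simple random
# sampling and the worst-case p = 0.5. Illustrative only.
z = 1.96        # z-score for 95% confidence
p = 0.5         # worst-case proportion (maximizes the required sample)
margin = 0.01   # +/- 1% margin of error

n = (z**2 * p * (1 - p)) / margin**2
print(round(n))  # ~9604 panelists for the overall population

# Each demographic or geographic subgroup you drill into needs its own sample
# to hit a target margin, which is why required panel sizes balloon quickly.
margin_subgroup = 0.05   # a looser +/- 5% target for a single subgroup
n_subgroup = (z**2 * p * (1 - p)) / margin_subgroup**2
print(round(n_subgroup))  # ~384 panelists per subgroup even at +/- 5%
```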


2012 Cloud Computing Predictions from GoGrid Executives, Customers & Partners (Part 1)

Tuesday, January 17th, 2012

As is customary with the passing of an old year and the exciting entrance into a new one, people try to make their best predictions about what the future holds within their area of expertise. For GoGrid, that area is obviously cloud computing. This year, instead of making my own prediction list (as I have done in the past), I thought it would be valuable to gather some other expert voices from the GoGrid and cloud community for this task.


The important thing to remember here, especially when dealing with the cloud, is that it changes quickly. It’s similar to buying the latest technology: the moment you buy it (or make the prediction, in this case), it’s instantly outdated. Still, the process is fun, if not educational.

Below is a compilation of 2012 cloud computing predictions from a variety of subject matter experts and thought-leaders in the field of cloud infrastructure, security and services. The contributors are:

  • Warren Heffelfinger (CEO – GoGrid)
  • James Urquhart (Cloud Writer – GigaOm/VP of Product Strategies – enStratus/GoGrid Partner)
  • Larry Warnock (CEO – Gazzang/GoGrid Partner)
  • John Keagy (Chairman & Founder – GoGrid)
  • Carson Sweet (CEO – CloudPassage/GoGrid Partner)
  • Antonio Piraino (CTO – ScienceLogic/GoGrid Customer)

Because of the wealth of knowledge coming from this group, I have broken this article into a series of two posts. Without further ado, on to the first set of predictions!
