Create a Basho Riak Cluster on GoGrid

July 9th, 2012 by Rupert Tagnipes

Basho is a GoGrid partner and the company responsible for the open-source Riak project. If you are not familiar with Riak, it is a well-regarded open-source distributed database. It was built on the concepts from Amazon’s Dynamo, so it is often compared to Cassandra and Amazon DynamoDB.

Riak is used as a fast, fault-tolerant distributed database. Companies like Mozilla use it for storing and analyzing beta testing results; Mozilla needed a solution that would help improve the user experience and allow them to store large amounts of data very quickly. Another example is Bump, which uses Riak to scale and manage the massive amounts of data sent between its millions of users. There, Riak stores elements of past user conversations so that communication history is readily accessible to users.

Basho Riak version 1.1.4 is now available as a GoGrid Community Server Image (CGSI). You can find it by searching for “Riak” when you launch a virtual machine. The image is available in all our data centers. This CGSI contains the open-source version, so support is available only via the community site and the image does not include all the features of the Enterprise version. However, you can use it either to run a proof of concept (POC) and see whether Riak meets your needs, or to run a small cluster. Either way, it runs on GoGrid’s high-performance VMs, which have been shown to have significant performance advantages over other cloud implementations.

Why is GoGrid faster?

The secret sauce is the architecture. GoGrid was architected from the beginning for performance. Unlike those of other cloud providers, GoGrid virtual servers use local storage, which creates a low-latency, high-I/O environment. This has proven to be a performance advantage over virtual servers that use attached storage, where network latency and oversubscription of the shared storage hurt performance. Since each GoGrid virtual machine has its own allocated local storage, it doesn’t suffer from the “noisy neighbor” problem. In addition, each virtual machine that is spun up is assigned to a different physical node. This does three things for users who have deployed a Riak cluster:

  1. It distributes the risk – if one physical node should fail, only one VM in the cluster will go down and the other nodes in the cluster will be available.
  2. It distributes the compute, network and storage. Even if one node should become inaccessible, it won’t impact the entire cluster.
  3. All the VMs are still on the same VLAN – this ensures easy network connectivity among the members of the cluster.

How to deploy Riak

One of the things that I like about Riak is that it is easy to deploy and works well in multi-tenant environments like the GoGrid cloud. You will notice that the smallest server you can select has 4GB of RAM. This is the recommended minimum RAM size, and it also ensures that you will have 200GB of local storage to use. You can select more RAM, but make sure to select the same size for all the servers in your cluster.

Once you have the first machine deployed, log in via SSH and change your password. If you just want to experiment with Riak, this is all you need to do: start the Riak process and you can begin testing commands and features. However, to see the true abilities of Riak, you really need to set up a cluster.

The bare minimum setup is 3 nodes. However, if you are considering this cluster for production, I recommend an n+2 strategy, where n is the replication level. Out of the box, Riak is configured with a replication level of 3, which means each piece of data is stored as 3 copies across the cluster. With n+2 nodes (5 nodes when n=3) you gain several benefits. Since Riak scales linearly, you should see improved performance as nodes are added. It also provides additional fault tolerance: the greater number of nodes helps ensure that no single node holds more than one copy of any particular piece of data.

Configuring a cluster

The first step is to modify your private interface to use a static IP. This makes it much easier to configure a cluster. Use your favorite text editor and modify the private interface file.

#nano /etc/sysconfig/network-scripts/ifcfg-eth1

Within the interface file, make the following changes:

DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.100.10.3 [a private IP in your subnet – this IP is just an example]
NETMASK=255.255.255.0

Restart the interface with the following command:

#ifup eth1
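If you are preparing several nodes, the interface file can also be written from a small script. The following is a minimal sketch, assuming the same eth1 device and netmask as in the example above; write_ifcfg is a hypothetical helper name, not part of any GoGrid or Riak tooling.

```shell
#!/bin/sh
# Sketch only: write a static-IP config for the private interface.
# write_ifcfg is a hypothetical helper; the netmask is the example value
# used in this article and may differ in your subnet.
write_ifcfg() {
    ip="$1"       # private IP for this node, e.g. 10.100.10.3
    target="$2"   # file to write, e.g. /etc/sysconfig/network-scripts/ifcfg-eth1
    cat > "$target" <<EOF
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
IPADDR=$ip
NETMASK=255.255.255.0
EOF
}
```

On a node you would call it as root with something like write_ifcfg 10.100.10.3 /etc/sysconfig/network-scripts/ifcfg-eth1, and then restart the interface as described above.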

Launch four additional servers using the Riak CGSI and modify the private interface file (using different private IPs, of course).

For each node, you will need to configure /etc/riak/app.config so that the nodes in the cluster can talk to each other. Do the following on each node in the cluster.

Stop Riak if it is currently running:

#riak stop

Use your favorite text editor to modify /etc/riak/app.config. You will need to modify the default IP address to use the static private IP that you just set in the previous step. The section to modify has a default setting of:

{http, [ {"127.0.0.1", 8098 } ]},

The IP in the example below is not real. Use the private IP that you configured on this machine earlier; it should be an IP in your private subnet that is not used by another machine.

{http, [ {"10.100.10.3", 8098 } ]},

Save your changes. Next modify the /etc/riak/vm.args file. You will want to modify the first option in the file which is set to:

-name riak@127.0.0.1

Change the IP to the static IP that was defined earlier, for example:

-name riak@10.100.10.3
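The two config edits can also be scripted rather than made by hand. This is a minimal sketch, assuming both files still contain the stock 127.0.0.1 entries described above; set_riak_ip is a hypothetical helper name, and the .bak backup files are my own addition.

```shell
#!/bin/sh
# Sketch only: point app.config and vm.args at the node's private IP.
# set_riak_ip is a hypothetical helper; it assumes the default 127.0.0.1
# entries are still present in both files.
set_riak_ip() {
    ip="$1"          # static private IP of this node
    app_config="$2"  # e.g. /etc/riak/app.config
    vm_args="$3"     # e.g. /etc/riak/vm.args

    # Keep a backup of each file before rewriting it.
    cp "$app_config" "$app_config.bak"
    cp "$vm_args" "$vm_args.bak"

    # Only touch the IP on the {http, ...} line of app.config.
    sed "/{http,/ s/\"127\.0\.0\.1\"/\"$ip\"/" "$app_config.bak" > "$app_config"

    # Replace the loopback address in the -name option of vm.args.
    sed "s/^-name riak@127\.0\.0\.1/-name riak@$ip/" "$vm_args.bak" > "$vm_args"
}
```

A call such as set_riak_ip 10.100.10.3 /etc/riak/app.config /etc/riak/vm.args would be run as root while Riak is stopped. It is worth checking both files afterward, since the sketch changes nothing if the default lines have already been edited.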

Configuring the other nodes

Follow the steps above with the other nodes.

  1. Launch the VM using the Riak image
  2. Set a private IP by modifying the ifcfg-eth1 interface file
  3. Make sure to stop Riak (#riak stop)
  4. Modify the /etc/riak/app.config file to use the private IP of the server
  5. Modify the /etc/riak/vm.args file to use the private IP of the server
  6. Start Riak (#riak start)

Adding nodes to the cluster

So far, all you’ve done is the prep work: setting the IP address and editing the config files is required before nodes can be added to the cluster. Now that this has been done on all the nodes, go back to your first node and run the following command (in this example, 10.100.10.3 is the first node that you configured and 10.100.10.4 is the second node).

#riak-admin join riak@10.100.10.4

Since Riak doesn’t have a concept of a master node, all the nodes function equally in the cluster. It doesn’t matter which server you log in to first or which one you choose to join. However, for efficiency, it makes sense to pick one server that you always join to (10.100.10.4 in the example above) so that you can run the same command on all the nodes. In this example, that means logging in to every node other than 10.100.10.4 and running the join command there. After you have run the command on all your nodes (except the node that you are joining), you can verify that the nodes are all part of the cluster.
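With five nodes, it can help to generate the join commands instead of typing them. The sketch below only prints the commands so that you can review them first. The node list, the use of ssh, and the root login are assumptions for illustration, and print_join_commands is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print the join command for every node except the join target.
# print_join_commands is a hypothetical helper; ssh as root is an assumption.
print_join_commands() {
    target="$1"   # the node every other node joins, e.g. 10.100.10.4
    shift
    for node in "$@"; do
        # The join target does not join itself.
        [ "$node" = "$target" ] && continue
        echo "ssh root@$node riak-admin join riak@$target"
    done
}
```

For example, print_join_commands 10.100.10.4 10.100.10.3 10.100.10.4 10.100.10.5 prints one ssh line each for 10.100.10.3 and 10.100.10.5. Once the joins have been run, the membership check looks like this: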

#riak-admin member_status

Attempting to restart script through sudo -u riak

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid      50.0%      --      'riak@10.100.10.3'
valid      50.0%      --      'riak@10.100.10.4'
-------------------------------------------------------------------------------
Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

This command shows all the members in the cluster (typically called a ring). I only ran the join command once, so the output shows just two servers in the ring. As you join more nodes, they will appear in the list. And you’re done! That’s all you need to do to set up your Riak ring.

If you want to maintain your ring using a UI, I recommend enabling Riak Control. It is already included in the binary; you just need to modify the config files to enable it. You only need to configure it on one node of the ring in order to manage the entire ring. You can see details here: http://wiki.basho.com/Riak-Control.html. If you need further information on more advanced features of Riak, check out: http://wiki.basho.com/Riak.html.
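The membership check can also be scripted, for example to confirm that all of your nodes have joined. This is a minimal sketch, assuming the table layout in the member_status output shown earlier; other Riak versions may format the output differently, and count_valid_members is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: count the rows whose Status column is "valid" in the
# output of `riak-admin member_status`, read from standard input.
# count_valid_members is a hypothetical helper name.
count_valid_members() {
    awk '$1 == "valid" { n++ } END { print n + 0 }'
}
```

You would use it as riak-admin member_status | count_valid_members and compare the result with the number of nodes you expect in the ring.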

Riak is a key-value store at its core, so you will most likely use your cluster for use cases that fit that model. It is not necessarily a strong fit for ad-hoc querying or heavy analytics. Good use cases are:

  • Session Storage
  • User Data Storage
  • Scalable, low-latency storage for mobile apps
  • Critical data storage for medical data (a great example is how the Danish government uses Riak for their medical prescription program)
  • Building block for a custom distributed system

If you are looking for a powerful, highly available key-value store on a cost-effective cloud platform, then look no further than running your Basho Riak cluster on GoGrid’s virtual servers!

Rupert Tagnipes

Director, Product Management at GoGrid
Rupert Tagnipes is Director of Product Management at GoGrid who is responsible for managing and expanding the company’s multiple product lines. His focus is on leveraging his technical background and industry knowledge to drive product innovation and increase adoption of the cloud. He has extensive software product experience at technology companies in Silicon Valley solving data analytics and cloud infrastructure problems for customers across multiple industries.
