
How to Set Up a Gluster File System within the GoGrid Cloud (Part 1)

August 19th, 2011

In this blog post series, I want to take a closer look at a storage technology called Gluster File System (GlusterFS): how to set it up (this article), how to connect to it (article #2) and how to expand its storage (article #3). In this first post, I will review what GlusterFS is, why you would consider using it, and how to deploy it using the GoGrid GlusterFS Partner GSI.

image

GoGrid offers a great storage solution called Cloud Storage. But what if you want to deploy your own storage so that you can directly control performance and redundancy? What software would you use to provide this? The simple answer is Gluster. It is a powerful software-based storage solution with centrally managed storage pools, and it is very easy to use.

There are many different ways to take advantage of the GlusterFS storage solution. (Note: in the descriptions below a “brick” is a GoGrid Virtual Server.)

1. Distributed Volumes:

“Distributed volumes distribute files throughout the bricks in the volume. You can use distributed volumes where the requirement is to scale storage and the redundancy is either not important or is provided by other hardware/software layers.” – Gluster.org

2. Replicated Volumes:

“Replicated volumes replicate files throughout the bricks in the volume. You can use replicated volumes in environments where high-availability and high-reliability are critical.” – Gluster.org

3. Striped Volumes:

“Stripes data across bricks in the volume. For best results, you should use striped volumes only in high concurrency environments accessing very large files.”

These storage volume options seem very familiar, don’t they? If you know the different RAID configurations of hard drives in server deployments, you will notice similarities with these options. For example, the “Distributed Volume” for Gluster is loosely comparable to RAID 0 (strictly speaking it is closer to JBOD, because whole files rather than stripes are placed on individual bricks). You sacrifice redundancy to gain performance and ease of capacity scaling.

The Replicated Volume is similar to RAID 1 or RAID 10, where data integrity, redundancy and reliability are very important. However, scaling costs more, because you need to add GoGrid Virtual Servers (bricks) in pairs to maintain the Replicated Volume structure.

The Striped Volume is the closest match to RAID 0 striping: data is striped across the GoGrid Virtual Servers (bricks), and unlike RAID 5 there is no parity, so striping adds no redundancy on its own. This comes in very handy when you are dealing with very large files (multiple GB files). When such a file is accessed, multiple servers stream the data to the web-server needing the file, offering very fast reads.

For my blog post, I am going to configure a 4 server Gluster setup using the GoGrid Gluster Partner Image. I am going to deploy 4 x 8GB Gluster servers. Each Gluster server will have 384GB of storage available. Usable space depends on the replica count: mirroring the bricks in pairs (replica 2, similar to RAID 10) yields 384GB x 2, or approximately 768GB of usable space, while replicating every file across all 4 bricks (replica 4) yields 384GB.
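The capacity arithmetic can be sketched as a small shell function. This is only an illustration: it assumes every brick is the same size and that bricks are grouped into equal replica sets, and the function name usable_gb is made up for this example.

```shell
# usable_gb BRICKS BRICK_SIZE_GB REPLICA
# Usable capacity of a volume whose equally sized bricks are grouped
# into replica sets of REPLICA bricks each (hypothetical helper).
usable_gb() {
    bricks=$1
    size_gb=$2
    replica=$3
    # The brick count must be a multiple of the replica count.
    if [ $((bricks % replica)) -ne 0 ]; then
        echo "brick count must be a multiple of the replica count" >&2
        return 1
    fi
    echo $(( bricks / replica * size_gb ))
}

usable_gb 4 384 1   # no replication:    1536
usable_gb 4 384 2   # mirrored pairs:     768
usable_gb 4 384 4   # four-way replica:   384
```

Real usable space will be slightly lower once file system overhead is taken into account.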

The first step is to deploy the 4 new GoGrid Gluster Virtual Servers using the GoGrid Partner GSI. I log into my portal and then follow these steps:

1. Click “Add”

Add_Button

2. Choose “Cloud Server”

Add_Cloud Server

3. Filter for “Gluster” & choose that image

Select_Gluster_Image

4. Accept the Terms

Partner_Image_Agreement

5. Fill in the server information (name, public IP, description, memory allotment)

Gluster_Server_Information_Save

6. Repeat this process 3 more times, using a different server name and public IP address each time.

Once you have all 4 of your new Gluster servers deployed, you can view the Support → Passwords page in your portal to find the login information. With this login information, you can run the following Bash script from your local Linux workstation to change the hostname, set the private IP address and reboot each system. The script will prompt you for the server address and root login, and also ask for the hostname and private IP address/netmask you want to use. If you don’t want to use this script, simply log into each system manually, update the host names and private IP addresses, and then reboot.

https://github.com/sepulworld/Remote_Linux_System_Update/blob/master/system_update.sh

I should now be able to log into all 4 systems and see the appropriate hostnames and IPs on each.

Gluster_4_systems

This looks good – if you don’t see the right hostnames or IPs on one or more of the systems, double check what is configured in the /etc/sysconfig/network file and in the /etc/sysconfig/network-scripts/ifcfg-eth1 file. Also, confirm that the host actually performed the intended reboot (this is necessary for the host name to update at the command line).
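For reference when checking the private interface by hand, here is a minimal sketch of what a static ifcfg-eth1 file typically contains on a Red Hat style system. The function name render_ifcfg_eth1 is made up for this example, and the netmask in the usage comment is a placeholder, not a value taken from this setup.

```shell
# render_ifcfg_eth1 IPADDR NETMASK
# Print a static configuration for the private interface (eth1).
# Hypothetical helper; compare its output with the file on the server.
render_ifcfg_eth1() {
    cat <<EOF
DEVICE=eth1
BOOTPROTO=static
ONBOOT=yes
IPADDR=$1
NETMASK=$2
EOF
}

# Example (placeholder netmask; use the one for your private network):
# render_ifcfg_eth1 10.129.151.105 255.255.255.0
```

The hostname itself lives in /etc/sysconfig/network as a HOSTNAME= line.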

From one of your Gluster servers, confirm private network connectivity by pinging each of the other Gluster servers via their private IP addresses. See image below.

Ping_Gluster_Systems
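If you prefer to check all peers in one go, a small loop works as well. This is a sketch only: the function name check_peers is made up, and the -W timeout flag assumes the Linux (iputils) version of ping.

```shell
# check_peers IP...
# Ping each private IP twice and report OK or FAIL per address.
# Returns non-zero if any address did not answer (hypothetical helper).
check_peers() {
    failed=0
    for ip in "$@"; do
        if ping -c 2 -W 2 "$ip" >/dev/null 2>&1; then
            echo "OK   $ip"
        else
            echo "FAIL $ip"
            failed=1
        fi
    done
    return "$failed"
}

# Example, using the private IPs from this article:
# check_peers 10.129.151.98 10.129.151.108 10.129.151.107
```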

Once this has been confirmed, we can take a look and see if the Gluster process is already running. It is configured on this GoGrid Partner Image to start on boot.

Gluster_Process_Login

Now I need to configure the trusted server storage pool. Basically, I log into just one of my 4 Gluster servers (I chose Gluster_1) and run the peer probe command once for each of the other 3 members to put them into the trusted server storage pool. For example:

[root@Gluster_1 ~]# gluster peer probe 10.129.151.107

See the image below:

Gluster_Peer_probe
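Probing the remaining peers can be wrapped in a short loop. The sketch below assumes it runs on the first server and that the other three servers use the private IPs shown in this article. The function name probe_peers and the GLUSTER variable (which lets you substitute echo for a dry run) are additions for illustration only.

```shell
# Command to run; override with GLUSTER=echo for a dry run (assumption).
GLUSTER="${GLUSTER:-gluster}"

# probe_peers IP...
# Add each address to the trusted storage pool, stopping at the first failure.
probe_peers() {
    for peer in "$@"; do
        "$GLUSTER" peer probe "$peer" || return 1
    done
}

# Example, run on the first Gluster server:
# probe_peers 10.129.151.98 10.129.151.108 10.129.151.107
# gluster peer status
```

Running gluster peer status afterwards lists the peers the pool now knows about.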

Next, I run the command to create the replicated volume using my 4 Gluster servers.

command: gluster volume create DataStore1 replica 4 transport tcp 10.129.151.105:/store1 10.129.151.98:/store2 10.129.151.108:/store3 10.129.151.107:/store4

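You can name the directories anything you want. I used “store1” through “store4”. You can also name the volume whatever you would like. I chose DataStore1. Note that with 4 bricks, replica 4 keeps a full copy of every file on each server, so the usable capacity is that of a single brick (384GB). To mirror the bricks in pairs instead and get roughly 768GB of usable space, use replica 2.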

Gluster_Volume_creation
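Because the replica count and the brick order decide the layout, it can help to print the create command and review it before running it. The helper below is a hypothetical sketch (the name volume_create_cmd is made up); it only builds the command string and does not touch Gluster.

```shell
# volume_create_cmd NAME REPLICA BRICK...
# Print a "gluster volume create" command for review (hypothetical helper).
volume_create_cmd() {
    name=$1
    replica=$2
    shift 2
    echo "gluster volume create $name replica $replica transport tcp $*"
}

# Two-way mirroring: consecutive bricks form a replica pair, so
# store1/store2 mirror each other, and so do store3/store4.
volume_create_cmd DataStore1 2 \
    10.129.151.105:/store1 10.129.151.98:/store2 \
    10.129.151.108:/store3 10.129.151.107:/store4
```

Passing 4 instead of 2 reproduces the command used in this post.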

Now let’s start the Volume with one simple command: gluster volume start DataStore1

Start_Gluster_Volume

And finally let’s view the volume information: gluster volume info DataStore1

Show_Volume_Info
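To check a single field from a script, the output can be filtered with awk. This sketch assumes the output uses "Label: value" lines such as "Status: Started", which may differ between Gluster versions. The function name volume_field is made up for this example.

```shell
# volume_field LABEL
# Read "gluster volume info" output on stdin and print the value
# of the line whose label matches LABEL (hypothetical helper).
volume_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Example:
# gluster volume info DataStore1 | volume_field Status
# gluster volume info DataStore1 | volume_field Type
```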

Helpful link:

http://gluster.com/community/documentation/index.php/Main_Page

If you run into any issues or have questions about the Gluster Partner GSI, please email gogrid-beta@gluster.com

That is it! You have successfully deployed the GoGrid Gluster servers from the GoGrid Partner GSI and configured 4 of them in a new replicated storage volume. My next blog post will cover deploying a web-server and connecting to this new storage volume. The third and final post will cover how to scale your replicated storage volume on GoGrid.

I hope you found this tutorial helpful. Stay tuned for Parts 2 and 3. Please let me know if you have any questions.
