
Measuring the Performance of Clouds – GoGrid

Written by on Mar 17th, 2009 | Filed under: Cloud Computing, GoGrid, Storage

Raditha Dissanayake posted a blog entry comparing Amazon EC2 and GoGrid performance. Unfortunately, we think Raditha did not use the most rigorous methodology possible for doing his comparison. It would be inappropriate for GoGrid to performance test Amazon’s EC2. In fact, their Customer Agreement may actually make such activity questionable, but IANAL (I Am Not A Lawyer).

Let’s take a more rigorous look at GoGrid disk subsystem performance.

Framing the Issue

For a start, the entire issue is a LOT more complex than can be covered here. Today’s disks, hard drive controllers, and operating systems have many different kinds of caching mechanisms. In addition, virtualization systems like Xen can impact results in unexpected ways. For example, did you know that Xen can be deployed in two major ways: ‘paravirtualized’ or ‘hardware virtualized’?

The two models almost certainly affect any testing methodology. And yes, you guessed it, Amazon and GoGrid don’t configure Xen in the same way. Amazon uses paravirtualization and GoGrid uses hardware virtualization. Beyond this public information, neither Amazon nor GoGrid provides significant details about its infrastructure, considering it, rightfully so, proprietary intellectual property.

Without a deep understanding of all of the issues it’s difficult to do a test much less a proper comparison.

But we are certain of a few very important things.

Clouds Are Multi-Tenant

First off, it’s hard to do a serious comparison like this using one server on each system. Clouds are inherently multi-tenant systems and since end users have no visibility into who else is using or sharing their disk resources at any given time there is no real way to verify that the results aren’t tainted by other activity.

Use the Right Tool

Secondly, hdparm -t isn’t a very good way to measure disk speed. It’s susceptible to noise from background activity. In fact, the man page says:

-t Perform timings of device reads for benchmark and comparison purposes. For meaningful results, this operation should be repeated 2-3 times on an otherwise inactive system (no other active processes) with at least a couple of megabytes of free memory. [...]

As you can see in Raditha’s test, hdparm doesn’t really do enough I/O to get consistent results in a multi-tenant environment. In the tests, hdparm is only active for a very short period of time allowing tenancy to have a dramatic effect on the results.  hdparm requires an inactive system and since that can’t be guaranteed in the cloud it fails the sniff test for a robust tool for cloud performance testing.

Another factor here that is unaccounted for is that hdparm is a utility tuned for real physical disks, not virtual disks.

Better Measurements

Ideally if you want to measure the streaming performance of a block device in a more reliable way in a multi-tenant environment, then use a larger amount of I/O. When doing this I/O you want to try to eliminate:

  • Hard disk controller layer cache effects
  • Hard disk layer cache effects
  • OS level cache effects
  • Effects of disk activity from other VMs

All current GoGrid nodes have caches in the storage layer. These are designed to be robust and to absorb bursts of write activity. These caches are sufficiently large, though, that if you do repetitive small I/Os, what you end up measuring is the performance of pulling this data out of the storage layer’s caches, not from the storage itself.

To avoid OS level cache effects use ‘direct I/O’. High performance applications and databases tend to use this internally for similar reasons (because they want to avoid OS level cache pollution and do their own caching). Oracle is probably the most obvious example here.

Testing Performance

On a ‘small VM’ located on a fairly busy node:

[root@foo ~]# dd if=/dev/hda bs=10M of=/dev/null iflag=direct count=100
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 3.50983 seconds, 299 MB/s
[root@foo ~]# dd if=/dev/hda bs=10M of=/dev/null iflag=direct count=100
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 3.06811 seconds, 342 MB/s
[root@foo ~]# dd if=/dev/hda bs=10M of=/dev/null iflag=direct count=100
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 2.14147 seconds, 490 MB/s

That’s using enough I/O to minimize noise from other VM activity and large enough to avoid hitting cache effects.
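
As a quick sanity check on the figures above, the reported rate is simply bytes divided by elapsed seconds, expressed in decimal megabytes (1 MB = 1,000,000 bytes). The sketch below redoes that arithmetic with awk; the function name throughput_mb_s is made up for illustration and is not part of dd.

```shell
# throughput_mb_s BYTES SECONDS
# Prints the transfer rate in decimal megabytes per second
# (1 MB = 1,000,000 bytes), rounded to a whole number.
throughput_mb_s() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.0f\n", bytes / secs / 1000000 }'
}

# Figures copied from the first small-VM run above.
throughput_mb_s 1048576000 3.50983   # prints 299
```

Feeding in the other two runs (3.06811 and 2.14147 seconds) gives 342 and 490, matching the dd output.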

If the I/O load is small enough you can hit storage layer cache effects:

[root@foo ~]# dd if=/dev/hda bs=10M of=/dev/null iflag=direct count=10
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 0.116491 seconds, 900 MB/s
[root@foo ~]# dd if=/dev/hda bs=10M of=/dev/null iflag=direct count=10
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 0.16058 seconds, 653 MB/s
[root@foo ~]# dd if=/dev/hda bs=10M of=/dev/null iflag=direct count=10
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 0.115701 seconds, 906 MB/s

While this is a fairly contrived example, it’s useful in other ways because it shows you can get very good burst throughput (consider a database updating a few thousand pages).
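
With only three runs per test, it helps to look at the spread rather than any single number. A minimal sketch, assuming the MB/s figures have been copied out of the dd output by hand (the function name summarize_rates is hypothetical):

```shell
# summarize_rates: read one MB/s figure per line on stdin and print
# the minimum, mean and maximum.
summarize_rates() {
    awk 'NR == 1 { min = $1 + 0; max = $1 + 0 }
         { sum += $1
           if ($1 < min) min = $1 + 0
           if ($1 > max) max = $1 + 0 }
         END { if (NR > 0) printf "min=%s mean=%.1f max=%s\n", min, sum / NR, max }'
}

# Sustained (count=100) runs on the small VM above.
printf '%s\n' 299 342 490 | summarize_rates   # prints min=299 mean=377.0 max=490
# Burst (count=10) runs on the same VM.
printf '%s\n' 900 653 906 | summarize_rates   # prints min=653 mean=819.7 max=906
```

Three samples are far too few for real statistics, but even this rough view makes the variation caused by a busy, shared node visible.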

On a larger memory instance (where average performance should be a lot better):

Sustained (large) IO:

[root@ubdev1 ~]# dd if=/dev/hda bs=10M count=100 of=/dev/null iflag=direct
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 1.80415 seconds, 581 MB/s
[root@ubdev1 ~]# dd if=/dev/hda bs=10M count=100 of=/dev/null iflag=direct
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 1.70448 seconds, 615 MB/s
[root@ubdev1 ~]# dd if=/dev/hda bs=10M count=100 of=/dev/null iflag=direct
100+0 records in
100+0 records out
1048576000 bytes (1.0 GB) copied, 1.6799 seconds, 624 MB/s

Burst (small) IO:

[root@ubdev1 ~]# dd if=/dev/hda bs=10M count=10 of=/dev/null iflag=direct
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 0.105183 seconds, 997 MB/s
[root@ubdev1 ~]# dd if=/dev/hda bs=10M count=10 of=/dev/null iflag=direct
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 0.089827 seconds, 1.2 GB/s
[root@ubdev1 ~]# dd if=/dev/hda bs=10M count=10 of=/dev/null iflag=direct
10+0 records in
10+0 records out
104857600 bytes (105 MB) copied, 0.090264 seconds, 1.2 GB/s

Don’t take my word for any of this. Try it out. If you’re really bored, graph I/O performance vs. I/O size and you’ll likely see a step function with a soft edge that will give you some idea of what the storage system is capable of and the degree of I/O variation.
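
One way to collect the data for such a graph is to repeat the dd read at increasing sizes and record the summary line for each. The sketch below is only an outline: the function name sweep_reads and the DD_FLAGS override are invented here, the sizes in the example are arbitrary, and it assumes GNU dd and read access to the device.

```shell
# sweep_reads DEVICE COUNT...
# For each COUNT, read COUNT blocks of 10 MiB from DEVICE and print
# "MiB requested<TAB>last line of dd's report". Direct I/O is used by
# default; DD_FLAGS is a hypothetical override for devices or files
# that do not support it.
sweep_reads() {
    dev=$1
    shift
    for count in "$@"; do
        # dd writes its statistics to stderr, so fold stderr into the pipe.
        summary=$(dd if="$dev" of=/dev/null bs=10M count="$count" \
                     ${DD_FLAGS-iflag=direct} 2>&1 | tail -n 1)
        printf '%s\t%s\n' "$((count * 10))" "$summary"
    done
}

# Example (run as root on the VM under test):
#   sweep_reads /dev/hda 1 2 5 10 20 50 100 200
```

Plotting the first column against the rate at the end of each summary line should show roughly where the storage-layer cache stops helping; repeating the sweep a few times gives a feel for the variation.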

Bottom Line

It’s great that people are kicking the tires of various clouds, but let’s be careful to make sure our testing is rigorous and makes sense for the environment.  If you have questions about how to measure performance on clouds, please send them to us.  Or if you’re a performance and virtualization system guru and have some knowledge to share, please do so.

We always want to improve our cloud and take seriously any feedback that shows a real problem, but in this case the test needs tweaking, not GoGrid.


This week, Gartner, Inc. released its list of the top 10 Strategic Technologies for 2009. This information stems from research performed within the Technology sector and factors in Gartner’s client and research feedback. The technologies on this list, released at the Gartner Symposium ITxpo, are considered to be potentially “disruptive to your environment or market in some way,” says Gartner analyst David Cearley.

While I sometimes find some of Gartner’s commentary on trends in technology a bit conservative and missing other critical data, this 2009 list does represent current trends that I have seen and mirrors many of my own expectations. Just last week, TechCrunch’s Michael Arrington declared that Web 2.0 was dead. I think that many of us have already moved well beyond Web 2.0. My view, for some time, has been that Web 3.0 (for lack of a better term) will be a combination of Integration and Standards and the coupling of the two, with other enabling technologies such as Cloud Computing providing the necessary lubrication. We saw the term “mashup” become prevalent during the past year or so, as companies sought to integrate similar services (or even disparate ones) into a new service delivered via the Web. A couple of quick examples of this are the numerous Twitter services that use Twitter data and either present it in different ways or fully integrate it into other services, and the advent of Yahoo!’s Pipes.

Key to Integration is making the connections easier through the use of public APIs. As more companies expose their APIs to developers, the wheels of integration become even more greased. This is all fine and good provided that these APIs are carefully documented, but more critical is that APIs must adhere to some sort of standard. Unfortunately, the “standards” requirement is a lot harder to impose and maintain. At a recent Cloud Computing Interoperability meeting that I participated in, most attendees agreed that Standards are a huge priority; however, defining these standards would be a daunting task to undertake. Still, this interop meeting was a clear step forward by the leaders in the industry towards defining these standards. If you step back a few years, you could view Web Services as a precursor to the API movement we see now (APIs are a subset of Web Services), and XML standards helped to propel the acceptance of Web Services and Integrations in general.

I feel that those companies who are currently working to aggregate (or integrate) various APIs into their business model are well positioned to be the ones who can help drive these standards. Case in point, GoGrid has a public API and recently signed up various Cloud Aggregators (such as RightScale, Appistry and GigaSpaces). These companies use a variety of other Cloud Infrastructure providers within their management services. The more that I thought about it, the more I realized how important these Cloud Aggregators’ roles are in driving some Cloud standards. They have views into all of their partners’ APIs and can easily find similarities and differences between them. Any APIs that these aggregators come up with themselves are one step closer to a standards-based API that could potentially be generic enough for use by many if not all providers.

What is also interesting is that this concept of Integration and Standards applies to our current World Financial Crisis as well. We have a bank and financial institution pandemonium with mergers seemingly occurring daily. These institutions will need to integrate diverse systems in order to succeed, and the government will be forced to derive some standards to govern its vested interest in these institutions. Sure, this is a fairly broad application of these terms in this comparison between Web 3.0 and Finance, but the ideas are similar.

But back to the Gartner predictions for 2009. First, we need to take off our rose colored glasses here. Any time you make a prediction, the odds are that you could be wrong in the long run. I realize that this is a bit pessimistic, but just look at our Economy right now. There were plenty of naysayers who told us that we were going down the wrong path, but we still proceeded ahead, ignoring these predictions. Technology trends are no different than Economic ones: you can attempt to predict based on the past. The difference here, however, is that technology seems to be a bit less volatile than the economy.

So, let’s take a look at Gartner’s 2008 and 2009 Strategic Technologies list:

2008 Strategic Technologies      | 2009 Strategic Technologies
1. Green IT                      | 1. Virtualization (#5 previously)
2. Unified Communications        | 2. Cloud Computing (new)
3. Business Process Management   | 3. Servers – Beyond Blades (8)
4. Metadata Management           | 4. Web-oriented architectures (new)
5. Virtualization                | 5. Enterprise mashups (6)
6. Mashups                       | 6. Specialized Systems (new)
7. The Web Platform              | 7. Social Software & Social Networking (10)
8. Computing fabric              | 8. Unified Communications (2)
9. Real World Web                | 9. Business Intelligence (new)
10. Social Software              | 10. Green IT (1)

I’d like to dive into these lists, not all topics but just the ones that caught my attention. Interestingly, I find that several of the items on these lists seem to have blurred boundaries while others clearly stand alone.

Green IT, Virtualization, Cloud Computing, Computing Fabrics/Servers – Beyond Blades, and the Web Platform/Web-oriented Architecture, in my mind, are Technologies where this “blurring” is clearly evident. Cloud Computing obviously is the buzzword of 2008 as well it should be. One can actually lump the others in this short-list under “the Cloud.” Fortunately (or unfortunately), this all-encompassing term is used in every technology conversation nowadays. The problem is, because it is being used as such a generic term, many people are having trouble really understanding what “the Cloud” truly is. Some points:

  • The Cloud is definitely “Green” in that there are obvious power and energy savings compared to traditional rack & stack servers.
  • Green works hand-in-hand with Virtualization. While power and energy efficiencies can be gained through hardware optimizations (e.g., green chips, reduction of power-hungry servers), these efficiencies can be more dramatically realized through virtualization of hardware appliances and components.
  • Similarly, Cloud Computing employs Computing Fabrics; instead of partial resource utilization of a bare-metal server, with Cloud Computing one can target just CPU or memory aspects (infrastructure resources and components) and gain efficiencies through their isolated uses.
  • Finally, if you plug in the Web as a Platform or Architecture provider and delivery mechanism, one can clearly see how Computing resources can be delivered via said architecture as opposed to traditional methods (e.g., architect in and deliver via the Cloud vs. bare metal and more static and rigid infrastructures).

Back to my earlier point of Integration being a key driver of Web 3.0, Gartner lists (Enterprise) Mashups as another Strategic Technology to watch. I heartily echo this. It will, undoubtedly, take the Enterprise much longer to realize this, both as a concept and in the actualization of this technology, but we do know that integration is critical. Why not leverage experts from various practices and bolster your own services or products through integration with these experts? “Mashup” is a Web 2.0 buzzword that I would recommend be dropped for the more encompassing term “Integration.” Mashup has the connotation of being very Web-centric (e.g., only visible on the web). Integration, on the other hand, can be applied not only to Web-centric delivery but also to more behind-the-scenes channels of Web Services and, specifically, APIs. Integration using APIs will give companies clear competitive advantages versus those SMBs or Enterprises that opt to maintain closed systems. Integration of systems can also help drive BPM (Business Process Management) as well as BI (Business Intelligence). By overlaying dissimilar data sets, new conclusions can be drawn based on the analysis of the data intersections or relationships, thus presenting more distinct and unique offerings.

Lastly, and perhaps the ugly duckling of the group, Social Software and Social Networking will, I believe, be core to 2009. During any economic crisis or recession, Companies immediately look to slash Marketing and PR budgets above all other Departments. Prior to Web 2.0, Marketing and PR were all about blasting your product or service messages out to the masses. Web 2.0 introduced the idea of engaging in conversations with groups of users and understanding the needs of those users. More recently, with the huge adoption of Social Networking by all types of users (business and personal), the message became even more targeted, reaching almost a 1:1 conversation. This has evolved into Social Marketing using Social Networking/Software as the delivery mechanism. While more difficult to do well and somewhat hit-and-miss at times, Social Marketing is potentially more efficient than dropping gobs of money on keyword buys, sponsorships, or events. Enterprises are already moving towards engaging their prospects or customer base through community-based outreach and social networking channels. Doing it right, however, is a completely different beast. It’s good to see that Gartner views this as a critical technology component of 2009.

We still have to maintain a clear perspective in all of this though. If the Global Recession truly hits as it seems that it will, the items on the list that directly and positively impact the bottom line of companies will naturally rise to the top. Maintaining a cost-effective, competitive advantage in the future will be much more difficult to achieve. I dare say that adopting Cloud Computing as a primary technology strategy will be one of the main catalysts for technology-savvy businesses to not only stay in business, but also be successful in the long run.


I spent some time analyzing search trends of different computing keywords to try to put everything in perspective. Google Trends is a nice tool that gives insight into broad search patterns.

We all know that the term “Cloud Computing” is relatively new to the Technology buzz. But just how new is it? For starters, I ran a quick comparison of “Cloud Computing,” “Grid Computing” and “Utility Computing”.

[Chart: Google Trends comparison of “Cloud Computing,” “Grid Computing” and “Utility Computing”]

The term Grid Computing has been around for a while (even before Google Trends tracking shows it). But as you can see from the graphic above, it is trending downwards. Utility Computing has pretty much remained below the radar in comparison. But the newcomer Cloud Computing, which made its full entrance into this trend analysis around 2007, is rapidly gaining momentum. 2008 seems to be a pivotal time when it surpassed Grid Computing (and it continues to grow).

Cloud computing is relatively new as a server hosting term. People are starting to loosely associate it with traditional hosted server solutions. So to put this all in perspective, as well as add some other “hot” keywords into the mix, I trended the following:

  • Cloud computing
  • Grid computing
  • Dedicated server
  • Colocation
  • Virtualization

The results were quite interesting:

[Chart: Google Trends comparison of Cloud computing, Grid computing, Dedicated server, Colocation and Virtualization]

My read on this is as follows:

  • Cloud Computing and Virtualization are the next hot hosting platforms. It is important to keep in mind that the term “Virtualization” can apply to many things, not simply hosting; in fact, Virtualization within the hosting environment is comparable to Cloud Computing. Virtualization has existed for some time, but mainly within a host’s computer (e.g., a desktop). But as Parallels, VMware, Xen and even Microsoft’s Hyper-V gain momentum as virtualized servers within a hosted environment, this term will continue to grow. See the chart below for further details (VMware is the clear leader but Hyper-V is clearly going to gain market-share quickly).
    [Chart: Google Trends comparison of Parallels, VMware, Xen and Hyper-V]
  • The Dedicated server term is slowly starting to lose ground vs. Virtualization and Cloud Computing, but it is fairly obvious that it is still a term that people know and look for. There are always developers or companies who will ONLY go with a dedicated server for one reason or another. I predict, though, that as they start getting on the virtualization and cloud bandwagons, this term will continue to erode. Another term, “VPS” (Virtual Private Server), is fairly common among hosting solutions but differs from Virtualization in many ways. With a VPS, you share resources with the other clients on a particular server, whereas Virtualized servers (like GoGrid, which is built on top of Xen) dedicate RAM and CPU usage to the predefined server instances running on a particular node. To again put it all into perspective, see the chart below. VPS is one of the terms that seems to be remaining steady as a searched term. This is most likely due to the fact that most of the main-stream hosting providers offer VPS hosting as their “bread & butter.”
    [Chart: Google Trends comparison of Cloud computing, Grid computing, Dedicated server, VPS and Virtualization]

In general, these terms all seem to be converging, which means only one thing: confusion and clutter within the marketplace. With so many options now available, potential server customers are presented with even more choices, and these choices frequently can’t be directly compared. One can look at RAM allocation, Hard Drive sizes and CPU speeds as a sort of rudimentary measure, but that is where the simple comparisons end. Now one is forced to choose between scalability options, server and data persistence, operating system images, peripherals (like firewalling and load-balancing), data storage, clone-ability…the list goes on. Attempts are being made to standardize these comparisons with check-lists, but since the market is so new and mutating with new entrants and updated feature sets, the IT Professional may be challenged when making decisions.

Lastly, to put things in perspective a bit, I ran a couple of other search terms, comparing “Twitter” against Cloud, Grid and Utility Computing…the results aren’t surprising (the green line is Twitter):

[Chart: Google Trends comparison of Twitter with Cloud, Grid and Utility Computing]

And put the iPhone into the mix and everything drops off the map (note, this graph is just for Cloud, Grid, Twitter and iPhone - iPhone is the green line below):

[Chart: Google Trends comparison of Cloud Computing, Grid Computing, Twitter and iPhone]

Also, the Cloud just got another potential injection of PR from Apple as well with their announcement of MobileMe. To take directly from the source:

“MobileMe stores all your email, contacts, and calendars on a secure online server — or “cloud” — and pushes them down to your iPhone, iPod touch, Mac, and PC. When you make a change on one device, the cloud updates the others.”

Apple has brought a new technology term, the “Cloud”, previously reserved for developers, IT managers and the like to the main-stream public. Watch the cloud continue to grow now almost exponentially, I predict, even down to common-place iconography:

[Image: MobileMe]

So, how can you “keep your head out of the clouds” with all of this clutter? I can offer the following points to help:

  • Look beyond the hardware – it’s becoming virtualized and virtually upgraded constantly. Some companies will tout just one piece of the mix; look at Support, the company’s history, and their Terms of Service or Service Level Agreements as other non-tangible measures
  • Don’t just jump on the bandwagon – a solution for one company or competitor may not be the solution for you; shop carefully
  • Get involved with the community – the fact that you are reading this article means that you are doing the right thing in doing your research first. Read blogs and forums as well as attend meetings to talk to end users
  • Don’t over-extend your resources – IT budgets are tight so make your decision based on that. Dedicated servers frequently carry premium monthly payments; virtualized hosting can even be priced by usage
  • Follow the K.I.S.S. rule – keep things simple; over-engineered network topologies can actually hurt your presence.

Where does GoGrid come into play? For starters, it offers “control in the cloud” by crystallizing real, on-demand servers into an experience that is simple, scalable and powerful. If you want to visit Cloud Computing in a way that is both understandable and attainable, look no further than GoGrid.


Computing on "Cloud Nine"

Written by on Mar 18th, 2008 | Filed under: General, GoGrid

Everyone seems to be either talking about cloud computing, launching their product “within the cloud” or developing a “cloud” infrastructure. I would like to take a step back and really think about why the word “cloud” is being used in the first place.

First, a quick side note: as I tried to track down the origins of the term “cloud computing” I did come across a very insightful post by Paul Wallis that does a fantastic job stepping through the evolution from “supercomputing” through “the cluster” into “the grid” and eventually up into the “clouds.” The concept of having “data clouds speaking to supercomputer clouds” is becoming a reality, according to Wallis; however, I echo his concern that in order for this magical marriage to take place, there needs to be a new level of Quality of Service connecting the two, among other things.

Even with the foundation being laid by some heavy players, cloud computing is still in its infancy. But this is not the subject of this article. I still circle back to the marketing “genius” that coined the term “cloud” to describe this new computing paradigm. For that, I move away from the technical and more to the linguistic.

The term “cloud” can be used in many forms of speech:

  • Noun – The clouds of smoke filled the room
  • Verb – The smoke clouded the room
  • Adjective – The cloudy smoke filled the room
  • Adverb – The smoke cloudily filled the room

So, cloud is a good word choice from a grammatical perspective since it can be used in a variety of ways. But is it a good term to use to describe a product or technology? I’m not so sure. As an exercise, I started writing down words that came to mind when I thought about “cloud”. In no particular order:

Intangible    | Blown by the wind
Bad weather   | Dark
Gloomy        | Obscure
Vapor         | Nebulous
Not solid     | Evaporate
Storm         | Seeding
Rain          | Up in the sky
Fragile       | Impossible to measure
Weightless    | Ethereal
Ephemeral     | Gray
Unclear       | Airy

Any patterns here? From my read, most of the terms seem to have negative connotations. I get visions of letting a balloon loose into the air and watching it disappear into the clouds. (Bursting bubble anyone?) To take things a bit further:

  • Companies have used terms like “vaporware” to describe software or code in advance of its release which then fails to materialize.
  • “Pie in the sky” is a phrase used to describe a promise of heaven while continuing to suffer on earth.
  • To “have your head in the clouds” comes from the Latin proverb “Caput inter nubila condit,” a line from Virgil’s Aeneid which, loosely defined, means to have unrealistic, impractical ideas.
  • Fragile, weightless, intangible, nebulous, unclear, impossible to measure – all these connote something that is vacuous and non-solid.

So I ask you this: does this make you comfortable putting your mission-critical data or applications within a cloud? Earlier terms like cluster, super, utility and grid computing, in my mind, make much more “tangible” sense. While I’m sure this term is here to stay and there is not much that I can do to change that, I do question the term’s legitimacy within technology and the development of solid business practices. Would you rather work in the cloud or work on a server? Even though the term “virtualization” tends to imply something that is not real, it is closer to the ground and significantly more absolute than something “in the clouds.”

Computing in the cloud, or dare I say, on “cloud nine”…I’m just waiting for reality to hit and the rain to begin.

[Cloud images used by permission.]


Squashing Virtualization Bugs – The Dogbert Way

Written by on Feb 15th, 2008 | Filed under: General, GoGrid

The Dilbert cartoon continues its virtualization theme and the topic is a new “creative” way to ensure that you don’t have any bugs!

[Dilbert cartoon]

Obviously the GoGrid team doesn’t subscribe to this methodology, only to the cartoon.