By now, many in the Cloud Computing space have heard about (or even read) the University of California Electrical Engineering & Computer Science’s (EECS) study on Cloud Computing titled: “Above the Clouds: A Berkeley View of Cloud Computing.” Published on February 10th, 2009, the EECS’s paper provides a seemingly academic study of the Cloud Computing movement, attempts to explain what Cloud Computing is all about, and identifies potential opportunities as well as challenges present within the market.
The 20+ page study is authored by Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica and Matei Zaharia who all work in RAD Lab. (Interestingly, several of the companies mentioned within the study are also Founding Sponsors and/or affiliate members: Sun, Google, Microsoft, Amazon Web Services, etc.).
There has already been plenty of discussion and analysis of this study (by James Urquhart, Krishna Sankar and has even appeared on Slashdot.org). Needless to say, I felt compelled to get my two cents in, especially from the perspective of a Cloud Computing Infrastructure vendor.
From an academic standpoint, this document definitely has some legs. It is complete with carefully thought out scenarios, examples and even formulae, as well as graphs and tables. Some of the points that are brought up even got me scratching my head (e.g., using flash memory to help by “adding another relatively fast layer to the classic memory hierarchy”). Even the case analysis of a DDoS attack from a cost perspective of those initiating an attack to those warding off an attack on a Cloud was interesting to ponder. I commend these group of authors on undertaking such a grand task of not only writing by committee but also overlaying a very business school vs. mathematics and computer sciences approach to the writing and analysis.
Unfortunately, however, as I read through the document, I started scrawling madly in the margins with commentary that is somewhat contrary to what was written within the study.
A Few Comments from the “Peanut Gallery”
I don’t want my article to come off as a complete rebuttal to what is written in this study. Quite the contrary. I’m encouraged that one group within the academic community has taken considerable time and effort analyzing and writing about the Cloud. What appears below is a small “laundry list” of things that need to be called out and is a mixture of positive and negative comments:
- EECS’s Cloud Computing definition – “Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the datacenters that provide those services. The services themselves have long been referred to as Software as a Service (SaaS), so we use that term. The datacenter hardware and software is what we will call a Cloud.”
My comments: I personally found this definition to be incomplete and potentially misleading. While the EECS is correct in including SaaS (Cloud Applications) as a subset of Cloud Computing, they have (consciously?) lumped everything else into a catch-all phrase of “hardware and system software.” For people to truly understand Cloud Computing, I feel that it is important to become much more granular in defining the layers of the Cloud (Cloud Applications, Cloud Platforms and Cloud Infrastructure – the “Cloud Pyramid”, a term I coined last year). I actually found it interesting that the group of authors couldn’t agree what the precise differences between the “X as a Service” were. In order for all of the assumptions and conclusions to take place, I would have thought that clearly defining what the “Cloud” is would be paramount to the success of the findings.
- 3 Important Technical Aspects of the Cloud – the group outlines three items of the Cloud: 1) “infinite computing resources” 2) “elimination of an up-front commitment” and 3) “pay for use of computing resources on a short-term basis as needed.”
My comments: For the most part, I agree with these statements. However, #3 is a bit skewed towards an Amazon EC2 model. At GoGrid, we are pioneering the idea of a “cloudcenter” (a datacenter in the Cloud) which presents a different paradigm. EC2 has long been touted as being a way for quick batch processing where instances are spun up, consumed and then discarded. This falls within the third aspect that is defined above. However, when you take the view of creating a “datacenter in the cloud,” there is less of a “quick use function” and more of a scalable infrastructure notion designed to replace traditional datacenters and associate infrastructures.
- New Application Opportunities – several new or emerging opportunities designed to capitalize on the benefits of the Cloud are outlined: “mobile interactive applications,” “parallel batch processing,” “the rise of analytics, extension of compute-intensive desktop applications,” and “‘earthbound’ applications.”
My comments: I’m actually glad to see these so carefully explained as they do cover many aspects that are potentially “unique” to the Cloud: dynamic storage, dynamic availability, scalable processing and compute power, and cost-effectiveness to name a few.
- Classes of Utility Computing – Amazon’s EC2 is at one end of the spectrum and Google AppEngine and Force.com is at “the other extreme” with Microsoft Azure falling somewhere in the middle. Also, “virtualized resources” are broken up into 3 classes: Computation, Storage and Networking
My comments: For starters, since the group was unable to fully define the Cloud “spectrum,” it’s difficult to understand how they place EC2 at one end and having the spectrum “end” at Cloud Platforms (e.g., Force.com or AppEngine). The “full” spectrum must include SaaS as well as PaaS and IaaS in order to fully encompass the definition. Gmail and SalesForce exemplify SaaS and definitely should be contained within the Cloud mantra. Microsoft Azure, Force.com and Google AppEngine are truly Cloud Platform. Perhaps within the Platform layer, Azure and AppEngine are far between, they do, however, occupy the same Cloud space of “here is a development environment, you must work within it” (e.g., Python, .NET). Cloud Applications are simply “here is a web-based software application that is available for consumption and you have minimal flexibility in terms of controlling it.” Lastly, Cloud Infrastructure works as “enjoy full control over your infrastructure despite the fact that it is a bit more challenging to control.” For the most part, the 3 virtualized resources do fall within what is outlined. Storage can be expanded to include “Cloud Storage” (dynamic), “Persistent Storage” (traditional) and “Volatile or Temporary Storage” (typically associated with EC2 instances where storage disappears when the EC2 instance is destroyed or goes down).
I could probably nitpick through some other items, but I will leave that up to you.
Comments from a Cloud Vendor perspective
In Section 7 of the study, the EECS group presents “10 Obstacles and Opportunities for Cloud Computing” which definitely should be addressed. For this section, I’m putting on my “GoGrid Green” colored glasses and presenting points and counter-points to each of the 10 items outlined. Again, this is not intended to come off as a ping-pong match, but rather a commentary and opportunity for dialog. I encourage you to read this section prior to reviewing my responses. I have tried to briefly paraphrase each item (but that probably doesn’t do it justice).
- Availability of a Service – “will Utility Computing services have adequate availability”
My Response: The study outlines outages specific to the Cloud, citing S3, AppEngine and Gmail in particular. I have said this before, outages happen and they are not unique to the Cloud. Natural and human-caused disasters occur. Hurricanes and cable cuts can affect all sorts of infrastructure. As with a traditional datacenter, in-house or outsourced, traditional or in the Cloud, a disaster failover and redundancy strategy should be part of an IT department’s general strategy for success or just survival. One thing to consider is mirroring or creating redundancy on different types of infrastructures: if your primary is in the Cloud, have a dedicated failover; if your colo is on the East Coast, think about something on the West. Also look beyond simply the service and review the Support organization, the Service Level Agreement (SLA) and the provider’s expertise within the field. GoGrid, for example, has 24×7 Free support, the most robust SLA of any Cloud provider and over 9 years of hosting experience and expertise.
- Data Lock-in – “the API’s for Cloud Computing itself are still essentially proprietary”
My Response: Unfortunately it seems that GoGrid’s announcement back in January of this year where we discussed how our GoGrid cloudcenter API has been put under a Creative Commons Sharealike license was somehow overlooked when compiling facts for this study. Our idea behind this move is to start working standards from the ground up. GoGrid is also an active participant in many of the interoperability meetings around the country. Part of the reason why we released our API to the community at large is to demonstrate our commitment to open standards. We also have modeled the GoGrid cloudcenter extremely closely to a traditional datacenter where all of your hardware, protocols and connectivity is familiar. This helps lessen the “lock-in” scenario and avoids the use of proprietary API’s and other components. Also mentioned is “surge computing” which is another term for “cloud bursting” or “hybrid” clouds. Our Cloud Connect offering works exactly in this way, where users can opt to have high-end, large I/O databases, for example, reside within a traditional, managed hosting environment (through ServePath, our parent company). Cloud Connect allows for scalable and dynamic web front-ends, hosted in the GoGrid Cloud, to connect via a dedicate private network to higher-end servers in a managed hosting back-end.
- Data Confidentiality and Auditability – “current cloud offerings are essentially public (rather than private) networks, exposing the system to more attacks”
My Response: The statement above is rather alarmist in nature. I agree that many efforts should be made to ensure the resiliency and security of the Cloud, and these efforts are well underway at GoGrid as well as other Cloud providers. Again, however, this is not something completely unique to the Cloud. Any hosting provider or datacenter (or cloudcenter for that matter) must ensure that security and the integrity of the network and infrastructure is maintained at a high standard. GoGrid, for example, is SAS70 Type II audited and certified. The EECS’s statement, however, is not a completely honest assessment. Public vs. Private datacenters, dedicated hosting or clouds are very different. The concerns of publically hosted infrastructures are really no different whether in the cloud or in a datacenter; they will both be inherently a bit more vulnerable. However, I would say that companies whose business it is to solely do hosting will potentially have more robust security protection and attack prevention measures in place than a self-hosted or even private cloud would. In terms of HIPAA compliance or Sarbanes-Oxley, there are stringent requirements of data protection, privacy and isolation. While it may be difficult to pass accreditation for these types of compliances “in the cloud”, using a feature like Cloud Connect, for example, allows for compliance to take place on a dedicated, warehoused set of servers within a traditional datacenter, something much more palpable and acceptable.
- Data Transfer Bottlenecks – “applications continue to become more data-intensive”
My Response: It’s all about the data, I agree. The Cloud is an ideal environment for statistical analysis and number crunching. I personally know of one GoGrid user who would spin up multiple instances of GoGrid servers, upload a huge amount of data, run some analysis programs and then export the resulting summaries, all in a matter of hours and only costing a few dollars. The arguments presented by the EECS group are true; until we get the ability to transfer large amounts of data through very big pipes at a extremely lost cost, this could be a barrier for those customers who may be considering the Cloud as a data eating machine. However, when we at GoGrid designed our business model, we kept scenarios like this in mind and came up with an easy solution: make all inbound data transfers free. This way, GoGrid users can upload large amounts of data to their cloudcenter, move that data around within the private network therein, put some on Cloud Storage should they desire, analyze to their hearts content and then download the summary or result sets (typically much smaller in file size than the data going in). GoGrid does charge for outbound but you can see how the pricing model works to the user’s advantage in analysis scenarios.
- Performance Unpredictability – “multiple Virtual Machines can share CPUs and main memory surprisingly well in Cloud Computing, but that I/O sharing is more problematic”
My Response: This is a very good point and difficult to fully refute. It’s true that CPU and RAM can be virtualized, managed and isolated extremely well. Disk I/O performance can suffer at times. Again, this is part of the reason we offer a solution for this with Cloud Connect (see previous statements). It is frequently better to offload extremely intensive I/O processes to a dedicated environment, at least until virtualization technology gets more aligned with bare-metal performance. We even released a “custom patch” for 64-bit Linux users on GoGrid that helps increase disk drive performance. While some may says that this is a bit non-standard, it does show our understanding of this concern and marks an effort to resolve or minimize the impact.
- Scalable Storage – short-term usage, no up-front cost and infinite capacity on-demand doesn’t apply to persistent storage
My Response: I have to agree somewhat to this idea, however it is a bit of an oxymoron. Persistent storage requires that it is dedicated in some way, available at all times and easily usable. On EC2, for example, if your instance dies, you lose any persistence of data, which is part of the reason why they recommend using S3 (their Cloud Storage offering). This is logical from so many standpoints: redundancy & share-ability are two that immediately jump to mind. Again, at GoGrid we took a slightly different approach by making all GoGrid Cloud servers have persistent storage available from the beginning. The amount of persistent storage is directly tied to the amount of RAM you have allocated: if you choose a higher RAM instance, you get more persistent storage. However, I don’t see scalable storage to be an obstacle entirely. Amazon offers S3 and GoGrid has a similar Cloud Storage offering. Both are scalable on demand, billed by usage and usable by Cloud Servers. GoGrid’s Cloud Storage is mountable as a drive and shareable among a user’s GoGrid servers within the GoGrid infrastructure using industry standard protocols (e.g., SAMBA, CIFS, RSYNC & SCP). To that end, in my mind it does meet the 3 properties outlined with the omission of the “persistent” adjective.
- Bugs in Large-Scale Distributed Systems – “one of the difficult challenges in Cloud Computing is removing errors in these very large scale distributed systems”
My Response: This is actually one obstacle that I fully agree with. Often it is difficult to “mirror” physical, large scale computing environments within the Cloud. Unfortunately, it is not an apples-to-apples comparison. One simply cannot just “port” a physical, complex infrastructure over to the Cloud. If you do, you will fail. You need to architect your Cloud environment capitalizing on the efficiencies and features of the Cloud. Otherwise, you simply translate (and potentially compound) issues existing previously further. Another thing to consider is that all Virtualization or Hypervisor technologies have bugs, as with any software for that matter. The complexity of a Cloud environment is multi-fold: at the hypervisor and management layer, the hardware layer of the grid or utility architecture, as well as within the VM’s themselves. This is a complicated and delicate environment. The good news is, because this is technology that is around to stay, and is consistently being built upon, refined and improved, the end results are only improvements. Important to this again is interoperability and standards, similar to the Wild West becoming civilized and engineered. Bugs will be squashed and efficiencies gained through increased R&D efforts as well as customer adoption and validation.
- Scaling Quickly – “automatically scale quickly up and down in response to load in order to save money, but without violating SLAs”
My Response: This is one of the key value propositions of Cloud Computing. You must be able to scale up and down based on demand (or even based on a budget). Much of this can be done using API’s or companies like RightScale. As I mentioned previously, Design for the Cloud. Traditionally, companies over-bought their infrastructure, saving it all for a rainy day. At ServePath, we know for a fact that CPU, RAM and Storage on our dedicated machines are only hitting about a 5% utilization on average. Many companies have built up their infrastructure for the “what if” scenarios. These inefficiencies are part of the reason why Cloud Computing has become so popular, a panacea of sorts. When you design for the cloud, you must ensure that your strategy capitalizes on scalability, both up and down, but also on redundancy and persistence. Of course, it all depends on the type of system you are architecting (persistent – a store-front or content driven marketplace, or temporary – data analysis, bulk processing).
- Reputation Fate Sharing – “reputations do not virtualize well”
- Software Licensing – “licensing models for commercial software is not a good match to Utility Computing” & “pay-as-you-go seems incompatible with the quarterly sales tracking”
My Response: Software licensing models are being forced to evolve to be able to handle the on-demand nature of the Cloud. While Amazon took the approach of increasing the hourly charge to handle licensing of Windows Server vs. an open-source alternative, GoGrid, in order to maintain simplicity, rolled it all into one (no difference between Red Hat, CentOS or Windows). Licensing of Microsoft SQL Server on GoGrid, for example, is handled through a monthly (not hourly) charge. This helps with both a customers budget projections as well as from our own sales projections. Simplicity in explanation and execution is critical. If your user is confused as to how the billing works or how to project what charges they will incur, they will not execute. Token billing, tied to hourly charges will also become increasingly prevalent.
Summing it all up
If you made it both through the EECS group’s study as well as this blog post, I truly commend you, and you hopefully have a better understanding of the Cloud Computing term and properties therein, especially from the standpoint of an academic institution and Cloud Computing vendor. While I have challenged a few of the statements made within the study, there are others that stand up just fine. The important overall idea here is that serious brainpower and resources are being thrown at the Cloud, from understanding and analysis standpoint to development and execution therein.
A special message to the EECS group: I would personally like to invite you all cross the Bay (from Berkeley to San Francisco) to come and visit a Cloud Computing provider who is already overcoming the obstacles you have outlined. We would love to have a round-table discussion about the Cloud and help you with the next version of this study.
- M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. Feb 10, 2009. “Above the Clouds: A Berkeley View of Cloud Computing.” Electrical Engineering and Computer Sciences. University of California at Berkeley. http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html p. 4 [↩]
- ibid. p. 4 [↩]
- ibid. p. 4 [↩]
- ibid. pp. 7-8 [↩]
- ibid. pp. 8-9 [↩]
- ibid. pp. 14-15 [↩]
- ibid. p. 15 [↩]
- ibid. pp. 15-16 [↩]
- ibid. pp. 16-17 [↩]
- ibid. pp. 17-18 [↩]
- ibid. p. 18 [↩]
- ibid. p. 18 [↩]
- ibid. p. 18 [↩]
- ibid. p. 18 [↩]
- ibid. p. 19 [↩]
Latest posts by Michael Sheehan (see all)
- Get Your Game On in the Cloud - June 11, 2013
- How Software Defined Networking Delivers Next-Generation Success - June 5, 2013
- James Gosling to Speak on Innovation at GoGrid Cloud Meetup on 5/22 - May 16, 2013