Creating an auto-scaling web application is an ideal use of cloud computing. Although manually scaling your infrastructure is easy in the GoGrid cloud, programmatically controlling your infrastructure so it scales automatically is an even better example of the power of the cloud. This scenario (an application that can increase and decrease its server count, and therefore its capacity, based on the load it's experiencing at any given time) makes IT professionals, sysadmins, and application developers alike extremely happy. And it's something you can build using out-of-the-box tools in GoGrid.
We’ve divided this topic into two articles:
Part 1 (this article) – The Theory of Auto-Scaling:
- Background: traditional vs. cloud hosting
- Programmatically architecting a solution
- The underlying Orchestration methodology
Part 2 – A Proof of Concept of Auto-Scaling:
- Do-it-yourself Orchestration
- Proof-of-concept examples
You’ll need some familiarity with GoGrid objects, GoGrid’s API, and PHP to implement this design. If you need a GoGrid account, please contact one of our Cloud Specialists.
Note: The processes and procedures described in this article are not covered by GoGrid Support because this is a proof of concept that auto-scaling applications can be built within GoGrid. This primer outlines the underlying concepts and methodologies for building an auto-scaling application. Any example code provided is not an officially supported product or service, and you should fully test your configuration prior to rolling it out into a production environment. Although all of our infrastructure as a service is covered under GoGrid’s Service Level Agreement, any code examples or any custom applications or functionality that you develop from this primer are not.
The Background: Traditional vs. Cloud Hosting
Cloud hosting creates a new application hosting paradigm. The very infrastructure that an application runs on can now scale with the demands being placed on it.
With traditional hosting, the application owner had to engineer for peak load and bake in some excess capacity to handle spikes and provide fault-tolerance (n+1). The server count is fixed and targeted at handling that theoretical maximum load. The problem is that very few applications operate at maximum load all the time. Application load tends to be cyclical, with both high and low load intervals. The periods of high load may occur during business hours or on evenings and weekends. Alternatively, peak load may occur seasonally, monthly, annually, or be tied to a promotion or special event. In the traditional model, servers and infrastructure are deployed targeting peak load. Excess capacity is wasted whenever load falls below peak level.
The example pictured above assumes the infrastructure consists of traditional physical servers and that peak load is estimated to require 12 of them. Because these are physical servers, the server count doesn't change over time, so capacity often sits underutilized relative to load. This model is inefficient: there are long stretches where demand on the application is far below the bought-and-paid-for capacity, and those resources are simply wasted.
In comparison, cloud hosting lets the application owner deploy infrastructure to match load at any given time. Moving to the cloud removes traditional data center concerns (hardware procurement, deployment cycles, and maintaining hardware sized for peak load) from the equation. The cloud hosting provider handles all those tasks for the infrastructure consumer, who can focus on building and managing an application.
The graph above shows how the infrastructure in a cloud environment can scale to adapt to demand, minimizing underutilization. Resources can be deployed as needed and decommissioned when they’re no longer needed for greater efficiency and better performance.
Programmatically Architecting a Solution
To build an auto-scaling solution for your website, you'll need to use several GoGrid infrastructure components and services:
- Cloud Servers
- GoGrid’s new Dynamic Load Balancer
- GoGrid Server Images (GSIs)
- GoGrid’s API
To set up an auto-scaling web environment, you’ll need to be familiar with how the following are configured, used, and deployed:
- Multiple inexpensive web front ends – I used 512 MB CentOS Apache web servers for the proof of concept that I will detail in Part 2 of this article. The idea is that you can use lower-end, inexpensive web servers behind a load balancer and adjust the count based on application load.
- Back-end servers as necessary for the application (such as a database server). The database could reside on a dedicated server or on a cloud server; either way, it provides a central store for application data used by the front-end servers.
- An Orchestration server that monitors the application load and deploys or decommissions front-end servers as necessary.
See our article on How to Create GoGrid Cloud Servers.
Dynamic Load Balancer
- A virtual IP (VIP) with a Listener on port 80 that balances traffic across a pool of Real IPs, one for each server configured in the application. The VIP's Real IPs are disabled for unprovisioned servers, and the Health Checker is pointed at the application's main page.
See our article on How to Create a Dynamic Load Balancer.
GoGrid Server Images (GSIs)
- A fully configured web server you’ve saved as a GSI and can use every time the application deploys a new web server.
- Because the application and all relevant settings are preconfigured, the server built from this GSI can begin accepting connections as soon as it's deployed, usually less than 5 minutes after the Orchestration server requests it. Whenever the application is updated, save a new version of the GSI and update the Orchestration code to use the new GSI.
See our article on How to Create a GoGrid Server Image.
GoGrid's API
- Code that interacts with GoGrid's API to control cloud assets such as Cloud Servers and Dynamic Load Balancers. In this example, you can use the API to create and delete servers and to update the load balancer pool by adding or removing a server.
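As an illustration, the API interaction above can be sketched as a thin client that builds a signed request URL. The signing scheme (an MD5 hash of the API key, shared secret, and Unix timestamp) and the /grid/server/add call follow GoGrid's published REST API as I recall it, but treat the exact endpoint and parameter names here as assumptions to verify against the current API documentation. Python is used here for brevity; the Part 2 proof of concept uses PHP.

```python
import hashlib
import time
import urllib.parse

API_BASE = "https://api.gogrid.com/api"  # verify against the current API docs


def sign(api_key, shared_secret, now=None):
    """GoGrid request signature: md5(api_key + shared_secret + unix_time)."""
    now = int(now if now is not None else time.time())
    return hashlib.md5((api_key + shared_secret + str(now)).encode()).hexdigest()


def add_server_url(api_key, shared_secret, name, image, ram, ip):
    """Build the signed URL to deploy a new server from a saved GSI.

    Parameter names (image, server.ram, ip) are assumptions based on the
    documented API; a grid/server/delete call works the same way.
    """
    params = {
        "api_key": api_key,
        "sig": sign(api_key, shared_secret),
        "format": "json",
        "name": name,
        "image": image,      # the GSI to deploy from
        "server.ram": ram,   # e.g. "512MB" for an inexpensive front end
        "ip": ip,            # a free IP from your assigned block
    }
    return API_BASE + "/grid/server/add?" + urllib.parse.urlencode(params)
```

The Orchestration server would issue an HTTP GET against the returned URL, then poll the job endpoint until the server is up before enabling its Real IP in the load balancer pool.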
The Underlying Orchestration Methodology
The central concept of Orchestrating an auto-scaling application is thresholding: When application load/state reaches a predefined metric, an Orchestration event occurs, meaning a server is deployed to handle increasing load or a server is decommissioned in response to application load falling off. An easy way to collect website load metrics is to use something built right into the web server your application runs on, such as Apache's mod_status. This Apache module can report a variety of status data points, ranging from access counts and requests/bytes per second to the status of the Apache worker threads.
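For example, mod_status exposes a machine-readable report at /server-status?auto with lines such as `BusyWorkers: 7`. A minimal sketch (in Python for illustration; Part 2 uses PHP) that parses that output into usable metrics might look like this:

```python
def parse_status(text):
    """Parse Apache mod_status machine-readable (?auto) output into a dict."""
    metrics = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        try:
            # Numeric fields like BusyWorkers or ReqPerSec become numbers;
            # anything else (e.g. the Scoreboard string) stays a string.
            metrics[key.strip()] = float(value) if "." in value else int(value)
        except ValueError:
            metrics[key.strip()] = value
    return metrics


def busy_ratio(metrics):
    """Fraction of Apache workers currently busy (0.0 to 1.0)."""
    busy = metrics.get("BusyWorkers", 0)
    idle = metrics.get("IdleWorkers", 0)
    total = busy + idle
    return busy / total if total else 0.0
```

The Orchestration server fetches this report from each front end with a single HTTP GET and averages the busy ratios to get an application-wide load figure.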
Determine the appropriate counters to monitor and the threshold points by watching your application: the web servers will bog down under load at fairly predictable points. As a rule of thumb, you'll want to deploy new servers when the load, averaged out per server, is around 65% of capacity. Basically, you're smoke-testing your servers: see at what point under load the application starts to throw errors and stops performing, then cut whatever metric you were watching by roughly a third, and that is your Orchestration threshold for adding a server to your infrastructure. Follow the same procedure for decommissioning: determine a point where application load warrants fewer running servers, and that is the decommissioning threshold. One important note about threshold tuning: make sure your thresholds are far enough apart that decommissioning a server doesn't put your application load right back at the point where it's about to trigger the deployment threshold!
A fault-tolerant application should maintain a minimum number of servers independent of application load. We recommend you use 2 or more servers for fault-tolerance. If 2 servers are running under a load balancer, the application will continue to serve requests if either one goes down–one of the key advantages of load balancing. If costs are a consideration, think about sizing your servers smaller and deploying more of them under the load balancer to gain the same level of performance.
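The thresholding and minimum-server rules above can be captured in a single decision function. The 65% add threshold matches the rule of thumb in this article; the 30% remove threshold and 2-server floor are illustrative values of my own choosing, kept far apart to provide the hysteresis described earlier:

```python
def desired_action(avg_busy, server_count, min_servers=2,
                   scale_up=0.65, scale_down=0.30):
    """Decide whether to add, remove, or keep the current server count.

    avg_busy: busy-worker ratio averaged across all front ends (0.0-1.0).
    scale_up and scale_down are kept well apart so that removing a server
    does not immediately push the load back over the deployment threshold.
    The min_servers floor preserves fault tolerance regardless of load.
    """
    if avg_busy >= scale_up:
        return "add"
    if avg_busy <= scale_down and server_count > min_servers:
        return "remove"
    return "hold"
```

The Orchestration server would run this check on a timer, then call the API to deploy or decommission a front end and update the load balancer pool accordingly.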
Now that you have an overview of the theory of auto-scaling and why it's worth leveraging the capabilities of cloud computing to build high-performance, high-availability web applications, I want to explore how I created an auto-scaling proof of concept on GoGrid. The next article covers one way to implement an Orchestration application that deploys and decommissions cloud servers automatically based on application load.
Latest posts by Scott Pankonin
- How To Create an Auto-Scaling Web Application on GoGrid (Part 1 – Theory) - April 23, 2013