When I think about the phrase “auto-scaling,” for some reason it conjures up the word “Transformers.” For those not familiar with the Transformers franchise of cartoons, toys, games, and movies, it is essentially about cars that turn into robots, or vice versa, depending on how you look at it. When they need to fight or confront a challenge, Transformers scale up from a vehicle (a car, truck, airplane, etc.) into a much larger robot. Then, when the challenge subsides, they scale back down to a vehicle.
Scaling, in terms of infrastructure, is a similar concept applied to servers, and it comes in two forms. Horizontal scaling means adding (or removing) servers within an infrastructure environment. Vertical scaling means adding (or removing) resources, such as RAM or CPU, on an existing server.
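To make the distinction concrete, here is a minimal sketch in Python. The `Server` and `Cluster` classes, their method names, and the RAM figures are all hypothetical, chosen only for illustration; they do not correspond to any particular cloud provider's API.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A hypothetical server described only by its RAM, for illustration."""
    ram_gb: int


@dataclass
class Cluster:
    """A hypothetical pool of servers behind one website."""
    servers: list[Server] = field(default_factory=list)

    def scale_out(self, count: int, ram_gb: int) -> None:
        """Horizontal scaling: add more servers to the pool."""
        self.servers.extend(Server(ram_gb) for _ in range(count))

    def scale_in(self, count: int) -> None:
        """Horizontal scaling in reverse: remove servers, keeping at least one."""
        keep = max(1, len(self.servers) - count)
        del self.servers[keep:]

    def scale_up(self, index: int, extra_ram_gb: int) -> None:
        """Vertical scaling: give one existing server more resources."""
        self.servers[index].ram_gb += extra_ram_gb

    def total_ram_gb(self) -> int:
        """Total RAM across every server in the pool."""
        return sum(s.ram_gb for s in self.servers)


cluster = Cluster([Server(ram_gb=8)])
cluster.scale_out(count=2, ram_gb=8)        # horizontal: 1 server -> 3 servers
cluster.scale_up(index=0, extra_ram_gb=8)   # vertical: first server 8 GB -> 16 GB
print(len(cluster.servers), cluster.total_ram_gb())  # prints "3 32"
```

Note that both operations raise total capacity, but in different ways: scaling out changes how many servers there are, while scaling up changes how big one of them is.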
Let’s look at an example. An author on a content website may write an article that attracts the attention of the social media community. What starts as a few views of the article per minute may, once the article is widely shared on social media, grow to hundreds or thousands of requests per minute. When this spike in demand occurs, the server or servers handling the website’s content may come under extreme load, affecting their ability to respond in a timely manner. The results can vary from long page loads to a server actually crashing under the peak load. In the past, this scenario was known as the “Digg effect” or “Slashdot effect.”
Although this type of success is great publicity for the author, a slow or crashed site is bad for the brand hosting the content. If users encounter slow or inaccessible websites, they’re less likely to return for other content later, which can eventually result in lost revenue.
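One rough way to reason about a spike like the one in the example is to ask how many servers the current request rate calls for. The sketch below is only an illustration: the function name `servers_needed`, its parameters, and the figure of 500 requests per minute per server are assumptions, not measurements from any real site.

```python
import math


def servers_needed(requests_per_minute: float, capacity_per_server: float,
                   min_servers: int = 1) -> int:
    """Return how many servers are needed to absorb the given request rate.

    capacity_per_server is an assumed figure: the number of requests per
    minute one server can handle while still responding quickly.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Round up, because a fraction of a server cannot be provisioned.
    return max(min_servers, math.ceil(requests_per_minute / capacity_per_server))


# Assumed, illustrative numbers: each server handles 500 requests per minute.
print(servers_needed(5, 500))      # quiet period: prints 1
print(servers_needed(3000, 500))   # article goes viral: prints 6
```

With these assumed numbers, a site that normally needs one server would need six during the spike, and could drop back to one once the traffic subsides.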