Since the launch of GoGrid over two years ago, we have provided free F5 load balancing as part of our Cloud Infrastructure offering. Load-balanced solutions in the cloud are critical to a scalable environment, and without F5’s load balancers our service offering would be very different.
But most people don’t know the nitty-gritty details of how the F5 load balancers work within GoGrid. For the most part, customers care only that load balancing simply works and works well, which it does. Yesterday, Lori MacVittie, the Technical Marketing Manager at F5, posted a technical article that does a great job of explaining how the magic behind the scenes works with the F5 load balancers and GoGrid.
The article is available on F5’s DevCentral as well as below:
If you don’t know how scaling services work in a cloud environment you may not like the results
One of the benefits of cloud computing, and in particular IaaS (Infrastructure as a Service) is that the infrastructure is, well, a service. It’s abstracted, and that means you don’t need to know a lot about the nitty-gritty details of how it works. Right?
Well, mostly right.
While there’s no reason you should need to know how to specifically configure, say, an F5 BIG-IP load balancing solution when deploying an application with GoGrid, you probably should understand the implications of using the provider’s API to scale using that load balancing solution. If you don’t, you may run into a “gotcha” that either leaves you scratching your head or reaching for your credit card. And don’t think you can sit back and be worry free, oh Amazon Web Services customer, because these “gotchas” aren’t peculiar to GoGrid. It turns out AWS ELB comes with its own set of oddities and, ultimately, may lead many to the same conclusion cloud proponents have reached: cloud is really meant to scale stateless applications.
Many of the “problems” developers are running into could be avoided by a combination of more control over the load balancing environment and a basic foundation in load balancing. Not just how load balancing works (most understand that already) but how load balancers work. The problems that are beginning to show themselves don’t stem from how traffic is distributed across application instances, or even from a misunderstanding of persistence (you may call it affinity or sticky sessions), but from the way that load balancers are configured and interact with the nodes (servers) that make up the pools of resources (application instances) they manage.
LOAD BALANCING FU – LESSON #1
Our first lesson revolves around nodes and the way in which load balancers interact with them. This one impacts the way in which you orchestrate processes, automate tasks, and interact with cloud framework APIs that in turn interact with load balancing solutions. It also has the potential to influence your choice of provider or core application design decisions, such as how state is handled and persisted.
In general, a load balancing virtual server interacts with a pool (farm) of nodes (applications). Because of the way in which networks are architected today, i.e. they’re IP-based for interoperability, a node is identified by an IP address. When the virtual server receives a request, it chooses a node from the appropriate pool, which comprises one or more nodes, based on the associated algorithm. Cloud computing providers today appear to be offering primarily one of two industry standard algorithms: round robin and least connections. There are many more algorithms, standard and not, but these two seem to be the most popular.
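To make the two algorithms concrete, here is a minimal Python sketch. It is an illustration only, not how BIG-IP or any provider implements selection; the `Node` class, the sample IP addresses, and the connection counts are all made up for the example.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Node:
    ip: str                      # a node is identified by its IP address
    active_connections: int = 0  # open connections currently on this node


def round_robin(nodes):
    """Return an endless iterator that hands out nodes in fixed rotation."""
    return cycle(nodes)


def least_connections(nodes):
    """Return the node with the fewest open connections (first one wins ties)."""
    return min(nodes, key=lambda n: n.active_connections)


# Hypothetical pool of three application instances.
pool = [Node("10.0.0.1", 4), Node("10.0.0.2", 1), Node("10.0.0.3", 7)]

rr = round_robin(pool)
rr_picks = [next(rr).ip for _ in range(4)]   # fourth pick wraps to the first node
lc_pick = least_connections(pool).ip         # the node with 1 open connection
```

Round robin ignores how busy each node is, while least connections needs the load balancer to track open connections per node, which is the same bookkeeping that matters later when a node is taken down.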
When a load balancing solution detects a problem with a node (it’s monitoring the node by sending ICMP pings or opening a TCP connection or doing an HTTP GET, depending on the provider and implementation) it marks it as “down”. Nodes can also be marked as “down” purposefully, in the event you want to perform some maintenance or find a problem with the application running and need to address it.
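The monitoring described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `MonitoredNode` class and its `admin_down` flag are hypothetical names, and only the TCP-connect style of check is shown (using Python's standard `socket.create_connection`). Real monitors typically run on an interval and may require several failures before marking a node down.

```python
import socket


def tcp_probe(ip, port, timeout=2.0):
    """Return True if a TCP connection to ip:port can be opened."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


class MonitoredNode:
    """Hypothetical node record kept by a load balancer."""

    def __init__(self, ip, port):
        self.ip = ip
        self.port = port
        self.status = "up"
        self.admin_down = False  # set when an operator takes the node down on purpose

    def check(self, probe=tcp_probe):
        """Run one health check and update the node's status."""
        if self.admin_down:
            self.status = "down"   # purposeful maintenance overrides the probe
        else:
            self.status = "up" if probe(self.ip, self.port) else "down"
        return self.status
```

Passing the probe in as a parameter keeps the sketch testable without a network and mirrors the fact that the check type (ping, TCP, HTTP GET) depends on the provider.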
The gotcha comes in here, because when you’re scaling down or purposefully taking down a “node”, there may be users that have been directed to that node and are still actively interacting with it; there are still connections to the node. While load balancers will stop directing new requests to a “down” node, how they handle existing (open) connections is configurable. When a node is marked as “down”, the load balancer has several options for handling existing connections, ranging from allowing existing connections (users) to complete their interactions with the application to outright rejection of all incoming requests. The provider’s choice of configuration here impacts the way in which you should architect your application if you’re really going to take advantage of elastic scalability, because while the options do not impact scaling up, they do impact scaling down.
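The range of options can be modeled in a few lines of Python. This is a sketch for reasoning about the behavior, not a real load balancer configuration; the `DownAction` names and the `route_existing` function are invented for illustration, and "reselect" here simply picks the first node that is up.

```python
from enum import Enum


class DownAction(Enum):
    DRAIN = "drain"        # let existing connections finish on the down node
    REJECT = "reject"      # refuse requests arriving on existing connections
    RESELECT = "reselect"  # move the request to another node that is up


def route_existing(conn_node, nodes_up, action):
    """Decide where a request on an existing connection goes.

    conn_node is the IP the connection was originally sent to, nodes_up is
    the list of IPs currently marked up. Returns the IP that serves the
    request, or None if the request is rejected.
    """
    if conn_node in nodes_up:
        return conn_node          # node is healthy, nothing changes
    if action is DownAction.DRAIN:
        return conn_node          # stays put until the user finishes
    if action is DownAction.RESELECT and nodes_up:
        return nodes_up[0]        # another node; state local to conn_node is lost
    return None                   # REJECT, or nothing left to reselect
```

With `DRAIN`, an application that keeps session state on the node keeps working through a scale-down; with `REJECT` or `RESELECT`, the user either gets an error or lands on a node that has never seen the session.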
For example, after Amazon introduced its ELB (Elastic Load Balancing) service, users began to see what they considered odd and unacceptable behavior that was merely the result of the way in which load balancing works.
We have a production site that is now sending traffic to someone else’s site. We terminated a couple of instances earlier today and this may have started this phenomenon, but this is very, very disconcerting. Repetitive refreshes brings up this site just as regularly as our servers. Can someone from amazon please help us out here? This is totally unacceptable behavior.
and a little while later in the thread:
I’m pretty surprised that terminating an instance doesn’t remove it from a load balancer.
The behavior (which Amazon indicates in the same thread has been resolved) stems from the fact that taking a node “down” is not the same as “deleting” it. When you terminate an instance in a load balanced environment, the load balancer sees that it is “down” and marks it as such. Taking down the instance does not automatically delete it, because there are many reasons you might want to take a node “down” and bring it back “up” later, such as upgrades and patches. If you don’t bring it back up, however, the load balancer needs to be explicitly directed to delete the node, lest you end up with the odd behavior described above. (That behavior shouldn’t happen in a truly isolated network environment, which says something about the way in which Amazon was internally designed from a network layer perspective.) The ability of a load balancer to detect nodes as “down” and later as “up” is by design; it is in part what makes them dynamic and allows transparent (non-disruptive) scalability. Nodes can be added, deleted, taken down, and brought up without impacting the clients and their interaction with the application. Scaling down, then, requires that you first take down the node (deregister, in Amazon speak) and then, when all connections have been terminated (through normal means), delete it from the load balancer. This takes some coordination.
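The coordination might look something like the Python sketch below. Everything here is hypothetical: `FakeLoadBalancer` is an in-memory stand-in, and `mark_down`, `open_connections`, and `delete_node` are invented method names, not calls from the AWS or GoGrid APIs. The point is only the ordering of the steps.

```python
import time


class FakeLoadBalancer:
    """In-memory stand-in for a provider API; every method name is hypothetical."""

    def __init__(self, connections):
        self.connections = dict(connections)  # node IP -> open connection count
        self.down = set()

    def mark_down(self, ip):
        self.down.add(ip)                      # no new requests go to this node

    def open_connections(self, ip):
        return self.connections[ip]

    def delete_node(self, ip):
        del self.connections[ip]
        self.down.discard(ip)


def scale_down(lb, ip, poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Take a node down, wait for its connections to finish, then delete it.

    Returns True if the node was deleted, False if connections were still
    open after max_polls checks (the node is left down but not deleted).
    """
    lb.mark_down(ip)                           # step 1: stop new traffic
    for _ in range(max_polls):
        if lb.open_connections(ip) == 0:       # step 2: wait for users to finish
            lb.delete_node(ip)                 # step 3: remove from the load balancer
            return True
        sleep(poll_seconds)
    return False
```

Only after `scale_down` returns True would it be safe to terminate the instance itself; terminating first is what leaves a “down” node registered with the load balancer.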
GoGrid’s load balancing implementation will not cause the scenario experienced by Amazon customers, for two reasons. First, it makes extensive use of VLANs to isolate customers, so no load balancing service for customer #1 will be able to communicate with a server used by customer #2. Second, GoGrid has orchestrated scalability services in such a way that load balancers are added and deleted en masse: individual nodes are never added or removed; rather, the entire load balancer object is modified. This effectively means the deletion of a node (by reconfiguring the load balancing object) is immediate and all existing connections are terminated. This is similar to the load balancer being configured to “reselect” a node when the original node is “down”. You need to be aware of this when deploying applications, as it will eventually impact either the way your application behaves (from a user perspective) or your ability to scale elastically.
SCALING SERVICES IMPACT DEVELOPER DECISIONS
If you’re going to be using the scalability services of a cloud provider you, as a developer, need to know how the load balancing service is going to act when scaling down. You need to know specifically whether it bleeds (quiesces) connections when a node is taken “down” or whether a node being “down” results in rejects or reselection. This impacts (or should impact) the way in which you design your applications or the choice of provider. Also important to the decision is what you consider the proper behavior for your application. It may be acceptable for users to “lose” their session. It may not be. The core logical behavior and user expectations should in part drive your architectural decisions when designing a scalable application, even if that scalability is provided by a third party, as is the case with public cloud computing.
There are (at a minimum) two questions you should ask a provider regarding its scalability services to help determine the apropos course of action.
- Rewrite the application to share state (shared-session database architecture)
- Rewrite the application to be stateless
- Do nothing and accept rejection/user loss of session as an acceptable risk
- Choose a provider whose load balancing behavior aligns with your expectations
- Implement a virtual private cloud instead, taking advantage of existing infrastructure to load balance applications “in the cloud” by extending your data center to include cloud-based resources.
In all cases, when you’re scaling down you don’t really want to just terminate the instance; you want to mark the node as down and allow the connections to bleed off before terminating it and deleting it from the load balancing solution. That’s the way it should work. If you simply terminate the instance, the node is marked as down and may continue to incur charges, because health checks are using bandwidth to determine the status of the node. In the optimal case the load balancing service would offer a “conditional” delete function in its API that marks the node as “down” and, when all connections have completed, automatically removes the node from the load balancing service. This is not the case today, for many reasons, but is where we hope to see such services go in the future.
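As a thought experiment, a “conditional” delete could behave like the toy Python model below. No provider API is being described here; the `Pool` class and all of its methods are invented to show the idea that the service, rather than the customer, watches for the last connection to close.

```python
class Pool:
    """Toy pool that supports a hypothetical 'conditional delete'."""

    def __init__(self):
        self.connections = {}         # node IP -> open connection count
        self.pending_delete = set()   # nodes that are down and awaiting removal

    def add_node(self, ip):
        self.connections[ip] = 0

    def open_connection(self, ip):
        if ip in self.pending_delete:
            raise ValueError(f"{ip} is down; no new connections accepted")
        self.connections[ip] += 1

    def close_connection(self, ip):
        self.connections[ip] -= 1
        self._remove_if_idle(ip)

    def conditional_delete(self, ip):
        """Mark the node down now; remove it once its last connection closes."""
        self.pending_delete.add(ip)
        self._remove_if_idle(ip)

    def _remove_if_idle(self, ip):
        if ip in self.pending_delete and self.connections[ip] == 0:
            del self.connections[ip]
            self.pending_delete.discard(ip)
```

A single `conditional_delete` call would replace the customer-side waiting: the node stops taking new connections immediately and disappears from the pool on its own once existing users are done.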