2010-02-23

"How do I add more memcached capacity without an outage?"


Once someone starts using memcached, they tend to quickly find themselves in the state of: "my database servers overload and my site goes down if memcached stops working". This isn't really surprising: quite often memcached was thrown into the stack precisely because the database servers were melting under the load of the growing site.
But then they face an issue that is, as mathematicians and programmers like to call it, "interesting".
"How do I add more capacity without an outage?"
At first, most people just live with having that outage. Most systems have regularly scheduled downtimes, and during that window the memcached cluster can be shut down, more storage nodes added, and everything restarted with a new distributed hash table sized for the new number of nodes.
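To see why that restart has traditionally been unavoidable, consider how a simple client spreads keys across nodes. Here is a minimal sketch (an illustration only, not any particular memcached client) of the common hash-modulo scheme; when the node count changes, almost every key maps to a different node, so the whole cache is effectively cold:

```python
# Illustration only: with naive hash-mod-N distribution, changing the node
# count remaps the vast majority of keys, which is why a resize has
# traditionally meant restarting with an effectively empty cache.
import hashlib

def node_for(key, num_nodes):
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

keys = ["user:%d" % i for i in range(10000)]
moved = sum(1 for k in keys if node_for(k, 4) != node_for(k, 5))
print("keys that move going from 4 to 5 nodes: %.1f%%" % (100.0 * moved / len(keys)))
```

Running that shows roughly 80% of the keys landing on a different node after the change.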
Ironically, the more successful the site is and the more it grows, the more costly that outage becomes. And not linearly with that growth, either: the cost rises more on the order of the square of the growth, until eventually they literally cannot afford the outage at all. As the cache gets bigger, it takes longer to rewarm, from minutes to hours to days. And as the userbase grows, there are more people around to suffer the poor experience of a cold cache warming up. The product of those two values is the "cost" of the outage. This is bad for user satisfaction, and thus bad for retention and conversion, and thus for revenue.
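To put a rough shape on that (the numbers below are invented, just to show the curve), the pain is approximately the rewarm time multiplied by the number of users who sit through it:

```python
# Invented numbers, purely to illustrate the shape of the curve: the pain of a
# resize outage is roughly rewarm time multiplied by the users who feel it,
# and both of those factors grow along with the site.
for users, rewarm_hours in [(10000, 0.5), (100000, 4), (1000000, 24)]:
    print("%8d users x %4.1f hours to rewarm = %10.0f user-hours of slow pages"
          % (users, rewarm_hours, users * rewarm_hours))
```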
This can be especially frustrating in a cloud environment. In a physical datacenter, because you have to actually buy, configure, and install the hardware for a node, it somehow feels easier to justify needing an outage to add it to the cluster. But in a cloud, you can start a new node with the click of a button, without getting a purchase approval or filing a change plan. And in cloud environments, we have all been evangelizing "dynamic growth", and "loosely coupled components", and "design for 100% uptime in the face of change". And yet here is this very basic component, the DHT key-value store cluster, that doesn't want to easily work that way.
There are ways to resize a DHT cluster while it is live, but doing so is an intricate and brittle operation, it requires the co-operation of all the clients, and there are no really good open source tools that make it easier. You have to develop your own bespoke operational processes and tools to do it, and they are likely to miss the surprising edge cases that get learned only by painful experience and very careful analysis. Which means that the first couple of times you try to resize your memcached cluster without a scheduled outage, you will probably have an unscheduled outage instead. Ouch.
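To make the coordination problem concrete, here is a minimal consistent-hashing sketch (an illustration, not our tooling or a drop-in client). Consistent hashing keeps the remapping small when a node is added, but notice that every client has to build exactly the same ring and switch to the new one at the same moment, or different clients will read and write the same key on different nodes:

```python
# Minimal consistent-hash ring (illustration only, not a real memcached
# client). Adding a node only moves a fraction of the keys, but all clients
# must agree on the ring, and must switch rings together.
import bisect
import hashlib

def _hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class Ring(object):
    def __init__(self, nodes, replicas=100):
        # Several virtual points per node, so keys spread evenly on the ring.
        self._points = sorted((_hash("%s#%d" % (node, i)), node)
                              for node in nodes for i in range(replicas))
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key):
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]

old_ring = Ring(["cache1", "cache2", "cache3", "cache4"])
new_ring = Ring(["cache1", "cache2", "cache3", "cache4", "cache5"])
keys = ["session:%d" % i for i in range(10000)]
moved = sum(1 for k in keys if old_ring.node_for(k) != new_ring.node_for(k))
print("keys that move: %.1f%%" % (100.0 * moved / len(keys)))
```

With five nodes, only about a fifth of the keys move, but getting every client to flip to the new ring at once, without dropping or double-writing anything in between, is exactly the intricate, brittle part.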
One commonly proposed variety of solution is to make the memcached cluster nodes themselves more aware of each other and of the distributed hash table. Then you can add a new node (or remove a failed one), and the other nodes will tell each other about the change, and they all work together to recompute the new DHT, flow items back and forth to each other, put items into the new node, and try to keep this all more or less transparent to the memcached clients with some handwaving magic of proxying for each other.
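In rough outline, the rebalance that has to happen when a node joins looks something like the sketch below (my own simplification, reusing the Ring sketch above; this is not a description of the Gear6 internals):

```python
# Simplified server-side rebalance sketch (my own illustration, not the
# Dynamic Services implementation): when the node set changes, each node
# re-checks which of its keys it still owns under the new table and pushes
# the rest to their new owners.
def rebalance(local_node, local_store, new_ring, send_to_peer):
    """local_store is a dict of key -> value held by this node; new_ring is
    anything with a node_for(key) method, such as the Ring sketch above;
    send_to_peer(owner, key, value) transfers one item to another node."""
    for key in list(local_store):
        owner = new_ring.node_for(key)
        if owner != local_node:
            # Hand the item to its new home. Until every client has seen the
            # new table, this node can keep proxying requests for the key to
            # wherever it now lives.
            send_to_peer(owner, key, local_store.pop(key))
```

The hard part, of course, is everything around that loop: doing the migration without blocking live traffic, and answering requests correctly for keys that are mid-flight.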
And that, more or less, is what Gear6 has just done, under the name "Dynamic Services". We have released it first in our Cloud Cache distribution, initially on Amazon AWS EC2, with other cloud infrastructure systems to follow. Next it will come to our software and appliance distributions.
This is especially useful and neat in a cloud environment, because the very act of requisitioning and starting a new node is something the underlying infrastructure can provide. So you can go to the Gear6 Cloud Cache Web UI and ask it to expand the memcached cluster. The management system will interface with the EC2 API, spin up more Gear6 memcached AMIs, and once they are running, add them to the cluster and then rehash the DHT. All while the cluster is serving live data.
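For flavor, here is roughly what the provisioning step behind that button amounts to if you scripted it yourself with the boto library against the EC2 API. The AMI id and the register_with_cluster() call below are placeholders standing in for the pieces the Gear6 management system actually provides, not its real interfaces:

```python
# Hedged sketch of provisioning a new cache node via the EC2 API with boto.
# The AMI id and register_with_cluster() are placeholders; the Gear6
# management system does the equivalent of this for you behind the Web UI.
import time
import boto

def add_cache_node(ami_id="ami-00000000", instance_type="m1.large"):
    conn = boto.connect_ec2()                     # AWS credentials come from the environment
    reservation = conn.run_instances(ami_id, instance_type=instance_type)
    instance = reservation.instances[0]
    while instance.state != "running":            # wait for EC2 to bring the node up
        time.sleep(10)
        instance.update()
    # Placeholder: hand the new node to the cluster and trigger the DHT rehash.
    register_with_cluster(instance.private_dns_name)

def register_with_cluster(address):
    raise NotImplementedError("cluster membership changes are handled by the management layer")
```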


(This entry was originally posted at my Gear6 Corporate Blog. Please comment there.)