Storage Replication for High Availability
My job entails running ~10 relatively busy sites. These sites are no Facebook or Google; hell, they don't even crack the Alexa top 100k. Maybe all 10 of them combined would, who knows. But I digress.
Three years ago, these sites were, for the most part, each running their own copy of the same code. So a developer and I began building a framework that would let all 10 sites run together on a common codebase. After about 10 months of development, we had a solid framework we could use.
We had always planned on running our sites on multiple boxes. We had 8 servers, so we might as well provision them and use them to the fullest. We bought a load balancer, and we even put a couple of proxy servers out in front of the Apache boxes to handle static content. I set up master-master replication in MySQL. But replicating the data on the Apache boxes themselves was problematic.
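For reference, the MySQL side of this is mostly a matter of giving each server a unique server-id and pointing each one at the other as a replication master. A minimal sketch of what that looks like (hostnames, offsets, and credentials here are illustrative, not our actual setup):

    # /etc/my.cnf on db1 (db2 mirrors this with server-id = 2
    # and auto-increment-offset = 2)
    [mysqld]
    server-id                = 1
    log-bin                  = mysql-bin
    auto-increment-increment = 2   # keep the two masters from colliding on keys
    auto-increment-offset    = 1

    # then, on each box, point replication at the other master:
    #   CHANGE MASTER TO MASTER_HOST='db2', MASTER_USER='repl',
    #     MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001',
    #     MASTER_LOG_POS=4;
    #   START SLAVE;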
We tried setting up rsync to copy the files between our two Apache servers. But our employees want to see their changes in near-realtime, which means we'd need to run rsync every minute. And since people's schedules aren't always predictable, we couldn't simply turn it off in the evenings, even though there might not be any changes then; rsync would just churn away around the clock for nothing. So rsync was out.
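What we were running was essentially a one-line cron job along these lines (hostnames and paths are illustrative):

    # /etc/cron.d/docroot-sync on the primary Apache box: push the
    # docroot to the second web server every minute
    * * * * * root rsync -az --delete /var/www/ web2:/var/www/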
We tried setting up NFS. However, NFS introduces a single point of failure, and (for us) it added massive network latency. So NFS was out.
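The NFS experiment amounted to exporting the docroot from one web server and mounting it on the other, roughly like this (again, hostnames and paths are illustrative):

    # /etc/exports on web1
    /var/www  web2(rw,sync,no_root_squash)

    # /etc/fstab on web2
    web1:/var/www  /var/www  nfs  rw,hard,intr  0 0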
My coworker at the time (not the original developer mentioned above) had done some C/C++ "back in the day", and had written his own chat server way back when. So he began implementing a tool in C that would listen for inotify events on a directory and send them over a custom TCP/IP socket to the other servers, where applying the change would trigger an inotify event in turn and send it on again, ad infinitum.
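I don't have his code to share, but the inotify half of a tool like that boils down to something like the sketch below; the real tool serialized each event and shipped it over its own socket protocol rather than printing it.

    /* Watch a directory for changes via inotify (Linux only). A real
     * sync tool would serialize each event, plus the file contents,
     * onto a TCP socket instead of printing it. */
    #include <limits.h>
    #include <stdio.h>
    #include <sys/inotify.h>
    #include <unistd.h>

    #define BUF_LEN (64 * (sizeof(struct inotify_event) + NAME_MAX + 1))

    int main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <directory>\n", argv[0]);
            return 1;
        }

        int fd = inotify_init();
        if (fd < 0) {
            perror("inotify_init");
            return 1;
        }

        /* Watch for files being created, modified, or deleted. */
        if (inotify_add_watch(fd, argv[1],
                              IN_CREATE | IN_MODIFY | IN_DELETE) < 0) {
            perror("inotify_add_watch");
            return 1;
        }

        char buf[BUF_LEN];
        for (;;) {
            ssize_t len = read(fd, buf, sizeof(buf));
            if (len <= 0) {
                perror("read");
                break;
            }

            /* One read() can return several packed events. */
            for (char *p = buf; p < buf + len; ) {
                struct inotify_event *ev = (struct inotify_event *) p;
                if (ev->len > 0)
                    printf("event 0x%08x on %s\n", ev->mask, ev->name);
                p += sizeof(struct inotify_event) + ev->len;
            }
        }

        close(fd);
        return 0;
    }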
The only problem with this solution was that none of the other developers (read: me and the VP) knew C, which made it unsupportable by anyone else in the company. That developer was then let go (not for any performance reasons), so we had to find a way to replace his tool.
I've been working with Perl for about 10 years now, and I knew that Perl could handle this. I started coding, loosely following the design of the C program. I got a good way down the path when I was pulled off to work on something else. The inotify syncing system would have to wait.
At this point, we've been running on a single server for almost a year.
Yesterday, someone on HN posted about DRBD. I read up on it, and it seemed like a perfect fit: data is replicated across machines automatically at the block-device level. No futzing with inotify, no futzing with home-grown C (and Perl) code. I got it set up and running on our secondary server fairly easily.
That's when I found the problem.
DRBD needs a dedicated block device on each node to act as its backing store; you can't just point it at a directory and expect it to work. That's not a problem in and of itself, but our primary Apache server has only one logical disk, so there's no spare block device for DRBD to use on the primary box. I had to find another way.
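To make the problem concrete, a typical DRBD resource definition looks something like the following (hostnames, devices, and addresses are made up); the "disk" line is the dedicated backing device our primary box simply doesn't have:

    # /etc/drbd.d/web.res
    resource web {
      on web1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;      # dedicated backing device
        address   10.0.0.1:7789;
        meta-disk internal;
      }
      on web2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7789;
        meta-disk internal;
      }
    }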
Looking for things similar to DRBD, I stumbled on GlusterFS. Gluster is a user-space filesystem that lets you replicate (as well as distribute, among other things) a directory across any number of servers. This looked like the perfect solution.
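Getting a two-way replicated volume going is only a handful of commands, roughly the following (hostnames and brick paths are illustrative):

    # run once, from web1
    gluster peer probe web2
    gluster volume create webdata replica 2 \
        web1:/export/webdata web2:/export/webdata
    gluster volume start webdata

    # on each web server, mount the volume over the docroot
    mount -t glusterfs localhost:/webdata /var/www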
That's when I got to thinking: when these files are requested, would they be served at the same speed as if they were being read from the local disk? This is where it gets hairy. Per some folks in the #gluster IRC channel, reads can be satisfied by any of the replica servers at random, even when the file is requested through the local volume. That means the copy on a non-localhost server can be chosen over the copy sitting in the localhost server's volume. This is a major problem.
So where do we stand?
Jeff Darcy from Red Hat has already proposed a first patch, and it's being discussed as a possible solution. Until a decision has been made, we just have to hope that nothing happens to that box. If something does, it could turn into a very rough time.