So I was thinking about handling MySQL fault tolerance and load balancing through hardware load balancers, by setting up one virtual cluster for database reads and another for database writes. We could set up two master servers in circular replication, making sure we only ever write to one at a time (define one as a hot spare in the load balancer’s “DB write cluster”, so the backup master takes over writes only if the primary master is down). That part is no problem… I don’t have any questions there. 🙂
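For reference, the circular part is just each master slaving off the other. Roughly like this, where the hostnames, replication user/password and binlog coordinates are placeholders rather than my real config (the real file/position would come from SHOW MASTER STATUS on the other box):

    # On master1, replicate from master2 (placeholder names and coordinates).
    mysql -h master1 -u root -p -e "
      CHANGE MASTER TO MASTER_HOST='master2', MASTER_USER='repl', MASTER_PASSWORD='secret',
                       MASTER_LOG_FILE='master2-bin.000001', MASTER_LOG_POS=98;
      START SLAVE;"

    # On master2, the mirror image: replicate from master1.
    mysql -h master2 -u root -p -e "
      CHANGE MASTER TO MASTER_HOST='master1', MASTER_USER='repl', MASTER_PASSWORD='secret',
                       MASTER_LOG_FILE='master1-bin.000001', MASTER_LOG_POS=98;
      START SLAVE;"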
Now, let’s say we have three MySQL slave DB servers that replicate from the master through the load balancer’s “DB write” connection (again, they would only ever be pulling from one master at a time). What I want to know is what happens to those slaves if the master -> slave connection fails over to a different physical master server. Is that going to cause problems with the MASTER_LOG_POS position on the slaves, since they’d suddenly be pointed at a different master’s binary log?
Does anyone know much about the inner workings of MySQL’s master/slave setup and what happens in the event of a MASTER_LOG_POS conflict? I’m just thinking it would be nice to have a truly redundant database setup that was handled 100% through hardware (load balancers).
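From what I can tell, each slave only tracks a file name and position inside one specific master’s binary log, which is exactly the part I’m worried about. This is roughly what I’d be keeping an eye on per slave (hostname and user are placeholders):

    # Check which master, and which binlog coordinates, a slave is tracking.
    mysql -h slave1 -u root -p -e "SHOW SLAVE STATUS\G" \
      | egrep 'Master_Host|Master_Log_File|Exec_Master_Log_Pos|Slave_IO_Running|Slave_SQL_Running|Last_Error'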
If no one knows, I guess that will be something I’ll be testing later this week. 🙂
That’s more skill than I have.
Hey Shawn,
I should warn you that I haven’t set up a multi-master MySQL configuration yet, just multi-slave, so everything I can say about your topic is based on the docs.
You didn’t mention a particular MySQL version, so I’ll refer to 4.1. There are a couple of issues with the load-balancer setup, for example when one (or more) of the slaves ends up with more data than the secondary master (unlikely, yet not impossible). The closest you can get to zero downtime, without manually fixing things and bringing the DBs up to date by hand, is shared storage between the two masters, with the second one on standby (of course, this means no replication ring). When the primary fails, you start MySQL on the secondary with the same server-id and the setup carries on as if nothing had happened. This can even be scripted, depending on the operating system you use.
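If it helps, the failover part of that can be pretty dumb. I haven’t actually run this, so treat it as a sketch from the docs; the hostname and datadir path are made up, and it assumes both boxes mount the same datadir and carry the same server-id in my.cnf:

    #!/bin/sh
    # Runs on the standby box: if the primary stops answering, bring mysqld
    # up against the shared datadir. Same server-id and same binlogs, so the
    # slaves keep talking to what looks like the same master.
    PRIMARY=master1
    if ! mysqladmin -h "$PRIMARY" ping > /dev/null 2>&1; then
        mysqld_safe --datadir=/shared/mysql/data &
    fi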
I know, MySQL is a pain in the butt when it comes to true HA, and it seems that only in 5.1 are things beginning to go in the right direction. Till then…
It’s MySQL 5.0.x…
The first part (the dual-master part) seems to work fine… two masters in a circular replication setup. MySQL clients and slaves talk to the master through the load balancer (the load balancer is set up so the 2nd master is a “hot spare” and only gets written to when the primary master is unavailable). No problems there…
The problem is when you hang the slaves off the load balancer and the load balancer fails over to the 2nd master automatically. The slaves get confused about what just happened and stop replicating. I’m guessing their MASTER_LOG_FILE / MASTER_LOG_POS coordinates still point into the old master’s binary log, which doesn’t line up with the new master’s.
So for a two-server setup it seems to work fine, but with three or more servers it doesn’t work so hot…
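For what it’s worth, the only fix I can see for a confused slave is to repoint it at the 2nd master by hand, something like the below. The hostnames and the file/position are placeholders (the real values come from SHOW MASTER STATUS on the 2nd master), and you could still lose or double-apply statements if the two masters weren’t perfectly in sync when the primary died:

    # Grab the 2nd master's current binlog coordinates...
    mysql -h master2 -u root -p -e "SHOW MASTER STATUS;"

    # ...then point each slave at them explicitly (placeholder values shown).
    mysql -h slave1 -u root -p -e "
      STOP SLAVE;
      CHANGE MASTER TO MASTER_HOST='master2',
                       MASTER_LOG_FILE='master2-bin.000007', MASTER_LOG_POS=4242;
      START SLAVE;"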