MySQL High Availability Alternatives
High service availability is critical for virtually all MySQL deployments. To achieve it, MySQL deployment architectures use replication among servers, normally in a Master-Slave setup. This is done to ensure service continuity through failures and disaster recovery, and during routine administrative activities such as hardware and software upgrades, back-ups, and schema changes.
There are two fundamental approaches to MySQL replication and failover, implemented in several products:
- MySQL-Specific Replication, including loosely-coupled MySQL 5.1 asynchronous and MySQL 5.5 semi-synchronous replication, and tightly-coupled synchronous replication in Schooner MySQL with Active Cluster and Oracle MySQL Cluster; and
- MySQL External Replication, including Oracle Golden Gate, Continuent Tungsten, and Linux DRBD.
1.) MySQL-Specific Replication
Loosely-Coupled MySQL Asynchronous and Semi-Synchronous Replication
MySQL asynchronous and semi-synchronous replication are based on loose coupling between the Master and the Slaves. As update transactions execute on the Master, their statements or row data are written to the MySQL Bin Log and gradually transmitted to the Slaves, where the modifications are eventually applied, replicating the changes that were made on the Master. The Slaves operate independently from the Master, deciding how much to read and from which point in the Master's Bin Log. Each Slave applies the changes serially, using a single thread (to ensure serial consistency), while also servicing other read transactions (to provide read scalability against the Master's data).
With traditional MySQL asynchronous replication (in 5.1), the Master does not wait for the Slaves at all. With newer MySQL semi-synchronous replication (in 5.5) the Master waits for one Slave to acknowledge that it has received and queued (but not applied) the Bin Log data into its relay log before the Master completes the transaction.
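For reference, semi-synchronous replication in MySQL 5.5 is not enabled by default; it ships as a pair of plugins that must be loaded on the Master and on each Slave. A minimal sketch (plugin file names assume a Linux build; consult the MySQL 5.5 manual for your platform):

```sql
-- On the Master: load and enable the semi-synchronous plugin
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- Fall back to asynchronous replication if no Slave acknowledges
-- within 10 seconds (value is in milliseconds)
SET GLOBAL rpl_semi_sync_master_timeout = 10000;

-- On each Slave: load the companion plugin and restart the I/O thread
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
```

Note the timeout fallback: when no Slave responds in time, the Master silently reverts to fully asynchronous operation, which is one reason semi-synchronous replication only partially addresses the data-loss problem.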
Loosely-coupled asynchronous and semi-synchronous replication of the Bin Log hurt service availability, data integrity, performance, administration, and cost of ownership:
Reduced Service Availability
When a Master fails, fail-over to a Slave is stalled until all transactions in the relay logs have been committed, a new Master has been established, and the remaining Slaves have been reconfigured.
Reduced Data Integrity
Since Slaves are arbitrarily behind the Master in reading and applying changes, Slaves can give old (stale) data in response to read transactions. Slaves may not have the latest committed data from the Master, so data can be lost when the Master fails. (The semi-synchronous replication in MySQL 5.5 provides a partial solution to this problem compared to MySQL 5.1 asynchronous replication, with one Slave having the committed data in memory.) Checksums in the binary and relay logs are not generated and persisted to permanent storage, making data corruption possible (to be addressed in MySQL 5.6).
Reduced Performance
Applying committed database transactions from the relay log on Slaves is single-threaded (so as to provide serial consistency). This results in low utilization of Slaves and low throughput; more Slaves are needed to handle the required read transaction throughput, creating "server sprawl". Master throughput must be limited to match the Slaves' performance so the Slave databases do not fall too far behind (otherwise recovery time can be very long when the Master fails). This lowers Master update transaction throughput, forcing additional database partitioning (sharding).
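Replication lag of this kind is visible on any Slave. A sketch of how a DBA would check it (field names per standard MySQL; values shown are illustrative only):

```sql
-- On a Slave: inspect replication progress
SHOW SLAVE STATUS\G
-- Key fields in the output:
--   Seconds_Behind_Master                       estimated lag in seconds
--   Master_Log_File / Read_Master_Log_Pos       last Bin Log event the
--                                               I/O thread has RECEIVED
--   Relay_Master_Log_File / Exec_Master_Log_Pos last Bin Log event the
--                                               single SQL thread has APPLIED
```

The gap between the received position and the applied position is exactly the window in which a Slave serves stale reads, and in which committed transactions can be lost if the Master fails.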
High Administrative Complexity
DBAs often bear a large burden of tedious, error-prone, and usually manual processes in common tasks such as recovery from a Master failure, Slave migrations or additions, and hardware or software upgrades.
Cost of Ownership
Low utilization of Masters and Slaves coupled with high administrative costs increase capital and operating expenses, while higher downtime reduces revenue and can cost customers.
Tightly-Coupled Synchronous Replication
With synchronous replication that is deeply integrated into MySQL and InnoDB, the cluster can realize complete data integrity, high performance, cluster-wide consistency, and fast automated fail-over and recovery.
This is the architecture used in Schooner MySQL® with Active Cluster™. Multi-core and thread parallelism are used to concurrently communicate, replicate, and apply Master update transactions on all Slaves with extremely high throughput and low latency. Because all updates are initiated on the Master, no two-phase commit is required, which improves performance and eliminates forced application roll-backs. When the Master commits a transaction, all Slaves are guaranteed to have received and committed the update.
Tightly-coupled synchronous replication can dramatically simplify fail-over and on-going administration:
• When a Master failure is detected, fail-over can be automated and completed within a few seconds, with no service interruption and no data loss.
• Since the Master and all the Slaves are at the same consistency level, any Slave can be automatically promoted to become the new Master.
• The Master's VIPs (Virtual IP addresses) can be instantly and automatically switched to the newly promoted Slave, so updates continue to be processed without any service interruption. When a Slave failure is detected, its load can be automatically switched and load-balanced to the surviving nodes.
• Once a failed Master or Slave is repaired, it can be automatically brought current with the Master and the other Slaves, then automatically made active and load-balanced.
• The Cluster Administrator, through a graphical user interface (GUI) or a command line interface (CLI), can provide a central point for easy and powerful administration, monitoring, and tuning.
• Hardware and software upgrades, on-line consistent back-ups, and instance migration can be accomplished with a simple point-and-click or a CLI command, all without service interruption.
Improved Service Availability
Tightly-coupled MySQL synchronous replication can provide much higher service availability than is achievable with asynchronous or semi-synchronous replication. For example, Figure 3 shows that Schooner MySQL with Active Cluster reduces downtime by 85% or more compared to MySQL 5.5 or 5.6 asynchronous or semi-synchronous replication, through automated Master and Slave fail-over and the ability to perform administrative operations on-line.
Improved Data Integrity
Tightly-coupled synchronous replication can completely eliminate the data loss and data inconsistencies present with MySQL asynchronous or semi-synchronous replication. With tightly-coupled synchronous replication:
• Reads on Slaves always provide the latest committed data, resulting in full cluster-wide data consistency
• Slaves always have the latest committed data from the Master, so there is no data loss if the Master fails
• Checksums on logs are generated and persisted, so data corruption can be detected.
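As a point of comparison, stock MySQL later added persisted Bin Log checksums in 5.6 (the gap noted earlier for 5.1/5.5). A sketch of enabling them, using the MySQL 5.6 variable names:

```sql
-- MySQL 5.6: write a CRC32 checksum with every Bin Log event
SET GLOBAL binlog_checksum = 'CRC32';
-- Master verifies checksums when reading events back from the Bin Log
SET GLOBAL master_verify_checksum = 1;
-- Slave SQL thread verifies checksums on events read from its relay log
SET GLOBAL slave_sql_verify_checksum = 1;
```

Even with these settings, detection happens only when the corrupted event is read; the tightly-coupled design described above aims to catch corruption as part of replication itself.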
Reduced Administrative Complexity
With tightly-coupled synchronous replication, fail-over and recovery can be completely automatic and instant, requiring no administrator intervention. Also, a synchronous replication Cluster Administrator GUI and CLI can
provide a single point for cluster-wide and per-instance management, monitoring, trouble-shooting, and tuning.
For example, the Schooner MySQL with Active Cluster Administrator supports single-click actions to perform hardware and software upgrades, slave migrations, provisioning, and concurrent on-line backups. The Schooner
MySQL with Active Cluster Administrator GUI, shown in Figure 4, allows simple yet powerful management of cluster nodes, replication groups, database instances, and on-going administration activities.
Tightly-coupled MySQL synchronous replication can increase performance dramatically over asynchronous or semi-synchronous replication when implemented with high thread, core and data structure parallelism. This is
shown in Figure 5, which compares Schooner MySQL with Active Cluster synchronous replication to MySQL 5.5 and its asynchronous and semi-synchronous replication.
2.) MySQL External Replication
In addition to MySQL–Specific replication, there are several solutions external to MySQL which address service
availability and data integrity issues. The most widely-used are:
- Oracle Golden Gate: This converts the MySQL asynchronous Bin Log to a common log format, providing heterogeneous database replication interoperability with Oracle, IBM DB2, Microsoft SQL Server, and other MySQL instances. Because Golden Gate is a very loosely-coupled external replication service layered on the MySQL asynchronous or semi-synchronous Bin Log, its performance in MySQL Master-Slave environments is significantly worse than that of MySQL 5.5, and it suffers from the same reduced service availability, poor data integrity, high administrative complexity, and high cost discussed above for MySQL 5.5 Master-Slave deployments.
- Continuent Tungsten Replicator: This converts the MySQL asynchronous Bin Log to a transaction history log and uses JDBC through a client proxy to access MySQL indirectly, enabling heterogeneous database replication interoperability with PostgreSQL and other MySQL instances. Tungsten also provides a Global Transaction ID, which Slaves use to point to a new Master when the old Master fails. Tungsten has a SaaS and ISP capability allowing parallel replication across independent, multi-tenant MySQL databases, but it has no parallel replication within a single database. Because Tungsten is a very loosely-coupled external replication service built on the MySQL asynchronous Bin Log, its performance in MySQL Master-Slave environments is significantly worse than that of MySQL 5.5, and it has the same problems of reduced service availability, poor data integrity, high administrative complexity, and high cost discussed above for MySQL 5.5 Master-Slave deployments.
- Linux DRBD (Distributed Replicated Block Device): This provides active-passive mirroring at the block device level. After each commit, the stand-by server is guaranteed to have identical blocks on the device, so the stand-by storage is kept in lock-step with the Master. DRBD does prevent data loss, and it limits downtime on Master failure since the stand-by Master can typically be restarted in minutes. But the stand-by server does not service any load, and all the Slaves still operate with asynchronous or semi-synchronous replication, so the same issues of reduced service availability, poor data integrity, high administrative complexity, high cost, and poor performance discussed above for MySQL 5.5 Master-Slave deployments are still present.
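DRBD's block-level mirroring is configured outside MySQL. A minimal sketch of a two-node resource definition (hostnames, devices, and addresses are hypothetical; syntax follows DRBD 8.x):

```
resource r0 {
  protocol C;                 # protocol C = synchronous: a write completes
                              # only after the peer has it on disk
  on db-master {
    device    /dev/drbd0;     # replicated device MySQL's datadir lives on
    disk      /dev/sdb1;      # backing disk partition
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on db-standby {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

Note that only the primary node can mount the device; the standby holds an identical copy but, as described above, serves no load until a fail-over promotes it.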
The Schooner MySQL® with Active Cluster™ Advantage
Schooner Information Technology created Schooner MySQL with Active Cluster (abbreviated Schooner MySQL) to solve the problems of other MySQL replication solutions. Schooner MySQL is a full distribution of MySQL Enterprise + InnoDB with multi-threaded, fully synchronous replication. It reduces downtime, eliminates data loss, ensures cluster-wide consistent data, provides instant and transparent fail-over, and dramatically simplifies administration, all with great performance and scalability.
Schooner MySQL is based on the commercial MySQL and InnoDB source code, which Schooner licenses from Oracle. It is an out-of-the-box software product that makes it easy for any MySQL shop to use MySQL in the most demanding mission-critical applications with confidence. Schooner MySQL can be downloaded to any x86 Linux commodity server and is completely compatible with all existing MySQL applications and datasets.