A common migration path from standalone MySQL/Percona Server to a Percona XtraDB Cluster (PXC) environment involves some measure of time where one node in the new cluster has been configured as a slave of the production master that the cluster is slated to replace. In this way, the new cluster acts as a slave of the production environment – traditional replication takes care of getting the data into the cluster, and then Galera replication handles the intra-cluster traffic. This often works without issue, although there is one case that I’ve encountered recently where special care must be taken to properly configure the stream to ensure that replication does not break. If you use multi-valued inserts with auto-increment columns, then this post is for you.
For purposes of our discussion, assume that we have a basic 3-node PXC cluster that we’ve set up using the PXC Reference Architecture document, and that we’re replicating from an asynchronous master (call it “server A”) into one of the PXC nodes. Without loss of generality, we’ll pick PXC03. Also, for purposes of our discussion, we’ll be working with the following table definition:
serverA> show create table auto_inc_test; CREATE TABLE `auto_inc_test` ( `i` int(11) NOT NULL AUTO_INCREMENT, `stuff` varchar(20) DEFAULT NULL, PRIMARY KEY (`i`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) serverA> SELECT * FROM auto_inc_test; Empty set (0.00 sec)
If we insert rows into this table one at a time, we have no issues.
serverA> INSERT INTO auto_inc_test(stuff) VALUES ('first row'); serverA> INSERT INTO auto_inc_test(stuff) VALUES ('second row'); serverA> INSERT INTO auto_inc_test(stuff) VALUES ('third row'); serverA> SELECT * FROM auto_inc_test; +---+------------+ | i | stuff | +---+------------+ | 1 | first row | | 2 | second row | | 3 | third row | +---+------------+ PXC03> SELECT * FROM auto_inc_test; +---+------------+ | i | stuff | +---+------------+ | 1 | first row | | 2 | second row | | 3 | third row | +---+------------+
But if we start doing multi-valued inserts, we can run into a problem.
serverA> INSERT INTO auto_inc_test(stuff) VALUES('first row'),('second row'),('third row'); serverA> INSERT INTO auto_inc_test(stuff) VALUES('fourth row'),('fifth row'); serverA> SELECT * FROM auto_inc_test; +---+------------+ | i | stuff | +---+------------+ | 1 | first row | | 2 | second row | | 3 | third row | | 4 | fourth row | | 5 | fifth row | +---+------------+ PXC03> SELECT * FROM auto_inc_test; +---+------------+ | i | stuff | +---+------------+ | 1 | first row | | 2 | second row | | 5 | third row | +---+------------+ PXC03> SHOW SLAVE STATUS; ... output elided ... Last_SQL_Errno: 1062 Last_SQL_Error: Error 'Duplicate entry '5' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'INSERT INTO auto_inc_test (stuff) VALUES ('fourth row'),('fifth row')' ... output elided ...
Uh oh. Replication is broken and our data is now inconsistent. So why does this happen and how can we prevent it?
binlog_format
The astute observer will note that I have not yet said anything about the binary log format on the master. If the binary log format on the master is already set to ROW, then the above error will not occur. RBR will properly replicate multi-valued INSERTs to the PXC cluster without issue, and the data will be consistent. Problem solved. However, there may be reasons that the master is not using or cannot use RBR, such as disk space or IOPS limitations, and thus it’s running in MIXED or STATEMENT mode. In that case, we need to look elsewhere….
wsrep_auto_increment_control
When set to ON (the default), this variable has the effect of automatically specifying values for auto_increment_increment and auto_increment_offset based on the cluster size. The idea behind it is to help prevent auto-increment value conflicts when writing to multiple nodes. However, what it also means is that in a multi-node cluster, the auto-increment values generated by any given node will never be consecutive and the “next” auto-increment value on the slave cluster node will always be higher than what the master believes it should be. For example:
serverA> INSERT INTO auto_inc_test (stuff) VALUES ('first row'),('second row'),('third row'); serverA> SHOW CREATE TABLE auto_inc_test; CREATE TABLE `auto_inc_test` ( `i` int(11) NOT NULL AUTO_INCREMENT, `stuff` varchar(20) DEFAULT NULL, PRIMARY KEY (`i`) ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1 PXC03> SHOW CREATE TABLE auto_inc_test; CREATE TABLE `auto_inc_test` ( `i` int(11) NOT NULL AUTO_INCREMENT, `stuff` varchar(20) DEFAULT NULL, PRIMARY KEY (`i`) ) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1
Hmm, that’s not good.
The above scenario can be avoided by setting wsrep_auto_increment_control to OFF on the node that’s acting as a slave while the cluster is still receiving asynchronous replication traffic. This can be configured in /etc/my.cnf or set dynamically from the MySQL command line with:
SET GLOBAL wsrep_auto_increment_control='OFF';
In the testing that I’ve done, this appears to be sufficient to deal with the problem, even if the original master has non-standard values specified for the auto_increment_* variables (e.g., because it’s part of a master-master pair where the odd ID numbers are generated by one server and the even ID numbers from the other one).
In fact, if the cluster is always going to be used in single-writer mode, there’s a compelling argument to be made for setting this variable to OFF on all of the cluster nodes even when going into production: it will prevent the cluster from burning through N auto-increment IDs every time a single row is inserted.
Fixing it when it’s broken
There’s no secret magic here. Percona Toolkit’s pt-table-checksum and pt-table-sync can be used to check and repair the data divergence between the master and the slave cluster node, just as if PXC were not involved, although you may find that it’s just faster to rebuild the cluster, depending upon how many diffs are found. Sometimes this issue happens right away when the master-to-cluster replication is started, and fixing it involves just one or two rows; other times I have seen it not occur for days with lots of rows out of sync once it finally breaks.
The tl;dr version
In sum, if you’re doing traditional replication into a PXC cluster (or any other flavor of MySQL/MariaDB + Galera), you may have issues with multi-valued INSERT statements, and this can be prevented with a configuration change on either side of the replication stream:
- On the master, set binlog_format=ROW
- On the PXC slave node, set wsrep_auto_increment_control=OFF
The post Multi-Valued INSERTs, AUTO_INCREMENT & Percona XtraDB Cluster appeared first on MySQL Performance Blog.