How We Spent a Tuesday Fixing a MySQL Replication Bug
We found a simple XA transaction that crashes MySQL 5.5 replication. The transaction inserts one row into an InnoDB table and one row into a TokuDB table. The crash comes from a flaw in the logging code that is exposed when a single transaction uses two XA storage engines (TokuDB and InnoDB). The bug is fixed in the TokuDB 6.0.1 release.
Here are some details. Suppose that a database contains the following tables.
create table t1 (a int) engine=InnoDB;
create table t2 (a int) engine=TokuDB;
The following transaction
begin;
insert into t1 values (1);
insert into t2 values (2);
commit;
causes the replication slave to crash.
The crash occurs when mysqld tries to dereference a NULL pointer.
#4 0x000000000088e203 in MYSQL_BIN_LOG::log_and_order (this=0x14b8640, thd=0x7f7758000af0, xid=161, all=true, need_prepare_ordered=false, need_commit_ordered=true) at /home/mariadb-5.5.25/sql/log.cc:7491
7491 cache_mngr->using_xa= TRUE;
(gdb) p cache_mngr
$1 = (binlog_cache_mngr *) 0x0
We posted a description of the problem to the MySQL and MariaDB internals mailing lists and received some very helpful feedback. The fix is to create the binlog_cache_mngr object, if it does not exist yet, in the log_and_order method and in other similar places in the logging code. Our MariaDB 5.5 patch can be found on Launchpad in the lp:~prohaska7/5.5-xa-rpl-crash-fix branch.
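To make the shape of the fix concrete, here is a minimal C++ sketch of the "create it if it is missing" pattern. It is not the actual patch and does not use the server's real types: Session, BinlogCacheMngr, get_or_create_cache_mngr and log_and_order_sketch are hypothetical stand-ins, and only the using_xa field name is borrowed from the backtrace above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the server's binlog cache manager.
// Only the using_xa flag mirrors a name seen in the backtrace.
struct BinlogCacheMngr {
    bool using_xa = false;
};

// Hypothetical stand-in for the per-connection state (the thd).
struct Session {
    // Created lazily, so it is still null if no earlier code path set it up.
    std::unique_ptr<BinlogCacheMngr> cache_mngr;
};

// Return the session's cache manager, creating it on first use.
BinlogCacheMngr* get_or_create_cache_mngr(Session& session) {
    if (!session.cache_mngr) {
        session.cache_mngr.reset(new BinlogCacheMngr());
    }
    return session.cache_mngr.get();
}

// Simplified analogue of the step that crashed: instead of assuming the
// cache manager already exists, ask for it through the helper above.
void log_and_order_sketch(Session& session) {
    BinlogCacheMngr* cache_mngr = get_or_create_cache_mngr(session);
    assert(cache_mngr != nullptr);
    cache_mngr->using_xa = true;  // safe: the pointer can no longer be null
}
```

Calling log_and_order_sketch on a fresh Session allocates the cache manager and sets the flag; calling it again reuses the same object. Under the assumptions of this sketch, the same helper would be called at every place in the logging code that previously assumed the object already existed.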