How We Spent a Tuesday Fixing a MySQL Replication Bug

We found a simple XA transaction that crashes MySQL 5.5 replication. The transaction inserts one row into an InnoDB table and one row into a TokuDB table. The bug is a flaw in the binary logging code that is exposed when a transaction uses two XA storage engines (here, TokuDB and InnoDB). It was fixed in the TokuDB 6.0.1 release.

Here are some details. Suppose that a database contains the following tables.

create table t1 (a int) engine=InnoDB;
create table t2 (a int) engine=TokuDB;

The following transaction

begin;
insert into t1 values (1);
insert into t2 values (2);
commit;

causes the replication slave to crash.

The crash occurs when mysqld tries to dereference a NULL pointer.

#4  0x000000000088e203 in MYSQL_BIN_LOG::log_and_order (this=0x14b8640, thd=0x7f7758000af0, xid=161, all=true, need_prepare_ordered=false, need_commit_ordered=true) at /home/mariadb-5.5.25/sql/log.cc:7491
7491      cache_mngr->using_xa= TRUE;
(gdb) p cache_mngr
$1 = (binlog_cache_mngr *) 0x0

We posted a description of the problem to the MySQL and MariaDB developer internals email lists and received some very helpful feedback. The fix is to create the binlog_cache_mngr object, if it has not yet been created, in the log_and_order method and in other similar places in the logging code. Our MariaDB 5.5 patch can be found on Launchpad in the lp:~prohaska7/5.5-xa-rpl-crash-fix branch.
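
The shape of the fix can be illustrated with a small sketch. This is not the patch itself, which is C++ in the server's logging code. It is a minimal Python model under stated assumptions: `CacheManager`, `Session`, and the three functions are hypothetical stand-ins, and only the `using_xa` flag is taken from the gdb output above. The unguarded function assumes the cache manager already exists, which is the pattern that crashed. The guarded one creates the object on first use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheManager:
    """Hypothetical stand-in for the server's binlog_cache_mngr object."""
    using_xa: bool = False


@dataclass
class Session:
    """Hypothetical stand-in for a server thread; the cache manager starts unset."""
    cache_mngr: Optional[CacheManager] = None


def log_and_order_unguarded(session: Session) -> None:
    # Crashing pattern: assumes the cache manager was created earlier.
    # With cache_mngr unset, this raises AttributeError (the Python
    # analogue of dereferencing a NULL pointer).
    session.cache_mngr.using_xa = True


def get_or_create_cache_mngr(session: Session) -> CacheManager:
    # The guard: create the object if it has not been created yet.
    if session.cache_mngr is None:
        session.cache_mngr = CacheManager()
    return session.cache_mngr


def log_and_order_guarded(session: Session) -> None:
    # Fixed pattern: always go through the guard before touching the object.
    cache_mngr = get_or_create_cache_mngr(session)
    cache_mngr.using_xa = True
```

Routing every access through one get-or-create helper is one way to cover the "other similar places" mentioned above without repeating the check at each call site. Whether the real patch is organized this way is an assumption.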


4 Responses to How We Spent a Tuesday Fixing a MySQL Replication Bug

  1. That is great, but XA still isn’t replication safe. It says so right in the manual.

  2. This seems to be a normal transaction (BEGIN instead of XA START). It is probably an internal XA transaction between the storage engine and the binlog.

    Does it only happen with replication or also with binlogs (e.g. for point-in-time recovery)?

    If it’s a real XA transaction, does XA RECOVER work after a restart? (The transaction must be in the prepared state, of course.)

    • Rich Prohaska says:

      When MySQL commits a transaction that involves more than one XA storage engine, it uses the two-phase commit protocol for the commit. MySQL refers to this as an internal XA transaction. So, there are prepares to all of the storage engines followed by commits to all of the storage engines. If the MySQL binary log is enabled, transactions involving at least one XA storage engine also use a two-phase commit protocol: the transaction is prepared in the storage engines, then logged in the binlog, and finally committed in the storage engines.
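
The ordering described in this reply can be sketched as follows. This is an illustrative Python model, not server code: the `Engine` class, the `commit_transaction` function, and the event strings are all hypothetical. It covers only the success path for a transaction that already qualifies for two-phase commit, and it omits rollback and crash recovery.

```python
from typing import List


class Engine:
    """Hypothetical XA-capable storage engine that records the calls it receives."""

    def __init__(self, name: str, events: List[str]) -> None:
        self.name = name
        self.events = events

    def prepare(self) -> None:
        self.events.append(f"prepare {self.name}")

    def commit(self) -> None:
        self.events.append(f"commit {self.name}")


def commit_transaction(engines: List[Engine], events: List[str],
                       binlog_enabled: bool) -> None:
    # Phase 1: prepare the transaction in every participating engine.
    for engine in engines:
        engine.prepare()
    # With the binary log enabled, the transaction is logged between the phases.
    if binlog_enabled:
        events.append("write binlog")
    # Phase 2: commit the transaction in every participating engine.
    for engine in engines:
        engine.commit()


if __name__ == "__main__":
    events: List[str] = []
    engines = [Engine("InnoDB", events), Engine("TokuDB", events)]
    commit_transaction(engines, events, binlog_enabled=True)
    print("\n".join(events))
```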

