Recovery Time for TokuDB

Posted On December 16, 2009 | By Tokutek | 3 comments

Last week Tokutek released version 3.0.0 of TokuDB, adding ACID transactions to its list of features. This post discusses an experiment we ran to measure recovery time following a system crash.

In summary, while actively inserting records into a MySQL database using iiBench, we compared the time to recover from a power-cord pull for both InnoDB and TokuDB.

Storage Engine    Recovery Time
TokuDB            501s (8 min, 21 sec)
InnoDB            18505s (5 hours, 8 min, 25 sec)

This is by no means an exhaustive look at recovery performance, but it does illustrate the benefits of Tokutek’s approach.
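As a quick sanity check on the two figures above, the short Python sketch below converts the measured seconds into hours, minutes, and seconds and computes how many times longer the InnoDB recovery took. The helper name `format_duration` is ours, chosen for illustration. Only the two measurements come from the experiment.

```python
def format_duration(seconds: int) -> str:
    """Render a duration in whole seconds as 'H hours, M min, S sec'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hours")
    parts.append(f"{minutes} min")
    parts.append(f"{secs} sec")
    return ", ".join(parts)


# Measured recovery times from the experiment, in seconds.
TOKUDB_RECOVERY_S = 501
INNODB_RECOVERY_S = 18505

# How many times longer InnoDB took to recover than TokuDB in this run.
ratio = INNODB_RECOVERY_S / TOKUDB_RECOVERY_S

print("TokuDB:", format_duration(TOKUDB_RECOVERY_S))
print("InnoDB:", format_duration(INNODB_RECOVERY_S))
print(f"InnoDB / TokuDB: {ratio:.1f}x")
```

In this single run, InnoDB took roughly 37 times as long as TokuDB to recover.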

The experiment

We ran iiBench on the following server:

Sunfire x4150
Dual socket quad core Xeon 3.1GHz
16GiB memory
2 SAS 146GB
6 SAS 146GB hardware RAID 0 256KiB stripe
CentOS 5.1
Ext3 file system

The disks and RAID controller were configured with their write caches OFF using arcconf. (Note: an experiment with the RAID controller write cache ON resulted in a non-recoverable database, as you should expect.)

We ran an insert-only iiBench test. After inserting about 250M rows and while actively inserting more, we manually pulled the plugs (2 on a Sunfire x4150). We then powered up the server and timed how long it took to start mysqld.
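For readers who want to repeat the timing step, here is one possible way to measure it, sketched in Python. It is not necessarily how the numbers in this post were taken. The sketch assumes that "started" means mysqld is accepting TCP connections, and that the server only begins accepting connections once recovery has finished. The function name `wait_until_listening` and its parameters are hypothetical.

```python
import socket
import time


def wait_until_listening(host: str, port: int, timeout_s: float,
                         poll_s: float = 1.0) -> float:
    """Poll host:port until it accepts a TCP connection.

    Returns the number of seconds waited. Raises TimeoutError if the
    port is still unreachable after timeout_s seconds.
    Assumption: an accepted connection means the server has finished starting.
    """
    start = time.monotonic()
    while True:
        try:
            # Open and immediately close a connection; success means the
            # server is listening.
            with socket.create_connection((host, port), timeout=poll_s):
                return time.monotonic() - start
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            if time.monotonic() - start > timeout_s:
                raise TimeoutError(
                    f"{host}:{port} not reachable after {timeout_s}s")
            time.sleep(poll_s)
```

Called right after launching mysqld, for example as `wait_until_listening("127.0.0.1", 3306, timeout_s=6 * 3600)` with MySQL's default port, the return value approximates the startup time, including recovery, to within one polling interval.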

The results for InnoDB are not surprising – they are in line with results reported elsewhere. During recovery, InnoDB reported 1 transaction to be rolled back or cleaned up, and 68 row operations to undo.

The results for TokuDB are encouraging, in that they indicate recovery can be performed in minutes instead of hours. In this experiment, rebooting the server took almost as long as recovering the TokuDB database.

The primary reason for TokuDB’s quick recovery time is the use of regular, automatic checkpoints. This feature (introduced in version 2.0.0) ensures that work is completed and written to disk (including the necessary fsync) on a regular basis. Recovery is limited to processing uncommitted transactions and work since the last checkpoint. When not pushing performance to the absolute maximum, we find that recovery typically takes less than a minute.
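To illustrate why checkpoints bound recovery work, here is a toy model in Python. It is a sketch of the general checkpoint-plus-log idea, not TokuDB's actual implementation. The class `ToyEngine` and the `checkpoint_every` knob are hypothetical, and the model ignores transactions, real disk I/O, and fsync.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    # Hypothetical knob: number of logged inserts between checkpoints.
    checkpoint_every: int
    durable: dict = field(default_factory=dict)  # state as of last checkpoint
    memory: dict = field(default_factory=dict)   # in-memory state, lost on crash
    log: list = field(default_factory=list)      # records since last checkpoint

    def insert(self, key, value):
        self.log.append((key, value))  # log the operation first
        self.memory[key] = value
        if len(self.log) >= self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self):
        # Persist the current state, then discard log records it now covers.
        self.durable = dict(self.memory)
        self.log.clear()

    def crash_and_recover(self) -> int:
        """Simulate a crash, then rebuild memory from checkpoint plus log.

        Returns the number of log records that had to be replayed.
        """
        self.memory = dict(self.durable)
        replayed = 0
        for key, value in self.log:
            self.memory[key] = value
            replayed += 1
        return replayed


def replay_cost(total_inserts: int, checkpoint_every: int) -> int:
    engine = ToyEngine(checkpoint_every=checkpoint_every)
    for i in range(total_inserts):
        engine.insert(i, i)
    return engine.crash_and_recover()


print(replay_cost(1050, checkpoint_every=100))    # frequent checkpoints
print(replay_cost(1050, checkpoint_every=10**9))  # effectively no checkpoints
```

In this model, the number of records replayed after a crash depends on how much was inserted since the last checkpoint, not on the total size of the database. That is the property the paragraph above describes.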

3 thoughts

  1. oh come on, everyone knows that InnoDB has two low hanging fruits, that are relatively small patches, which would make recovery as fast as TokuDB’s.

    See http://mituzas.lt/2009/12/08/crash-recovery-again/ for example :)

    1. dave says:

      Domas – thanks for the pointer. You are correct. I re-ran the experiment, and it only took 1020s to recover with your patched version. Do you have any idea when these changes will be rolled into the standard product?

  2. [...] a follow-up experiment to an earlier post on TokuDB recovery times, I tried to create a better apples-to-apples comparison to InnoDB recovery [...]
