TokuDB Hot Backup – Part 1
There are multiple ways to backup a MySQL database. Some are more painful than others. In this two part blog we are going to discuss why the new hot backup system in TokuDB is special amidst the existing solutions. First let’s look at existing backup solutions for MySQL and InnoDB.
Let’s start with the most obvious, and possibly painful, way to make a backup of MySQL: a manual copy of the MySQL data directory.
The copying itself isn’t the painful part; any DBA worth their salt can copy a directory. Guaranteeing what comes out the other end, however, is difficult. In other words, what will the state of each table in each database look like when the backup is complete? It turns out, without additional help, we don’t know!
If you think about the dynamic state of a database, and the serial copying of the same database files from one space to another, some questions may start to arise. Is there workload happening while I copy the files? Is the workload more than just client reads? If the answer is yes to this second question, then we have a problem: The files are changing as we copy them. The original and backup versions of each database file may not look the same once the copy finishes. Transactional consistency between the backup and the original cannot be guaranteed.
This problem occurs because there is a race between three logical components: the copying, the writes happening to the database files, and the writes happening to the log files. As the cp, scp or whatever-you-decide-to-use is reading, then writing, the bits in each file, the bits in the original file could change. If the new changed bits are ahead of your copy process, you are fine. The new bits will get copied and the two files will match at the end. The problem occurs when the changed bits occur in a part of a file you have already copied, or at the EXACT location you are copying. At the end of the copy, these bits will be different. These bits could be trivial, but we use databases for their consistency. If, say, a bit is on in a file, it better be on when we later read that bit.
So, to guarantee that no writes happen to either the database files or the log we can’t have any writes to any table. So, we shutdown the server, make our copy, then restart the server.
Obviously for many applications this is unacceptable. Downtime is one of the last things a web service with a MySQL back-end wants to have occur in the lifetime of the product.
MySQL offers a solution which guarantees consistency and avoids this manual shutdown and copy solution. The mysqldump utility, when run with the –single-transaction argument, allows users to create a consistent backup of all InnoDB tables on a MySQL server. This is definitely an improvement; users can still run read and write queries. However, all tables need to be scanned in their entirety which will be very disruptive to the cache on the server.
For some applications this will be acceptable. A common scenario is to run mysqldump during some maintenance period. The problem is that query performance may take a huge hit during this maintenance window, especially for large databases. Hence, the necessity for more sophisticated solutions that backup the data, consistently, but allow full read and write access without disrupting the server’s cache. The two “hot” backup solutions that come to mind are MySQL’s Enterprise Backup and Percona’s Xtrabackup.
Both solutions copy the raw bytes from each source file to each backup file. This is similar to the coarse copy and avoids ruining our cache. The key difference is how both solutions leverage InnoDB’s log to avoid the race between this copy and incoming writes to the database files. Essentially, they can copy and apply the log file to database files that have already been copied. This allows each solution to copy the files as fast as possible without regard to locking. Let’s see an example:
First, the backup process starts copying database file foo.
As it reaches the halfway point, a write occurs in the first half of the file representing foo.
Notice that the write didn’t make it to the backup file. The above picture is incomplete however. That sneaky write (changing A from 1 to 2) is actually happening in TWO places. The write is also happening in the log. (NOTE: The log illustrated below also contains entries for tables other than foo.)
The key is that the changes to foo are also stored in the log, and the log gets copied AFTER the database files are copied.
Once the backup finishes copying all the regular database files, we only need to do two things: copy the log then apply the log to all the database files.
InnoDB’s log structure uses entries that can be applied to the regular InnoDB database files over and over again. This means that even if a particular log entry is redundant (the backup copy already has the changes) it is harmless to re-apply it. Note: It does take time to apply the log, but not having to lock entire databases saves enough time to make this an attractive solution.
TokuDB’s log is different; we don’t use InnoDB’s log structure. Not every log entry can be repeatedly applied to the database files. Our UPDATE log entries perform an operation, such as increment, on the data in a field/record. If we applied the UPDATE multiple times, we could get a value in the backup file that is greater than the current source file.
Also, TokuDB has a unique and malleable mapping between the logical fractal tree layout and the serial layout on disk. In particular, the disk offsets of our compressed nodes often get reconfigured to optimize space. In effect, on restore from one of the two existing hot backup solutions, the layout of the fractal tree on disk could not be recreated from just replaying the log.
For these reasons, and until TokuDB v7, users had to use the less sophisticated, non-log aware solutions to backup their database. With the addition of the Enterprise Edition of TokuDB, users can now backup their data directory while the system is running, and end up with a transactionally consistent backup. We will detail the basic design of this new feature in next week’s blog.