Hot Column Addition and Deletion Part II: How it works
Hot Column Addition and Deletion (HCAD)
In the previous HCAD post, I described HCAD and showed that it can reduce the downtime of column addition (or deletion) from 18 hours to 3 seconds. In fact, the downtime of InnoDB is proportional to the size of the database, whereas the downtime for TokuDB 5.0 depends on the time it takes for MySQL to close and reopen a table — a time that’s independent of database size. Go ahead and build bigger tables. The HCAD downtime for TokuDB won’t increase.
You may be wondering how we do HCAD. Here goes:
Under the hood
TokuDB is based on Fractal Tree indexing, one of the cools features of which is that they replace random I/O with sequential I/O. The way this happens has an impact on how HCAD happens, so here’s the 20,000 foot view.
You can think of everything in a TokuDB table (or index) as a message. There can be messages to insert a row, or update a row, or delete a row. Rather than delivering these messages immediately, the messages are bundled up by common destination, and they progress towards the leaves of the Fractal Tree indexes when there are enough of them to make the disk-head movement worthwhile. They are applied in the right order, obviously, in order to keep the semantics of the SQL commands correct.
Even a query can be thought of as a message, except in the case of a query, it has to be delivered on the spot, even if that means moving the disk head. The query “sees” all the messages ahead of it, so it gets all the right answers, once again, according to the SQL semantics.
An HCAD command generates yet another type of message: it’s a broadcast message that needs to be applied to every row. Sticking the message into the Fractal Tree indexes is fast. In fact, the downtime associated with HCAD has nothing to do with the TokuDB work. Instead, MySQL closes and reopens a table on an alter table command. This causes dirty pages associated with that table to be flushed. The downtime is on the order of seconds, though we have seen extreme cases with many dirty pages and very large RAM where this can take a couple of minutes.
And since this is TokuDB, you also get to define clustering indexes, which reference all columns and therefore speed up lots of queries. The HCAD message gets injected into the primary table and into each clustering index.
The work of changing the rows does not happen when the HCAD message is injected. Rather, the broadcast message makes its way down to the leaves as other messages push it along. In this process, when an HCAD message reaches a row — either because other messages push it along or because of a query — the row gets rewritten to include the added column or exclude the delete column, as the case may be. Once the work is done to rewrite a row, the HCAD work is done for that row. The user can choose to have this work done immediately — say by a query that touches all rows — or lazily — as part of the normal operation of the database. Neither case involves downtime, and once the work is done to rewrite a row, the HCAD work is done for that row.
You can schedule the background work of updating the rows as follows. Recall that any query that touches a row, HCADs the row. A query of the form
SELECT COUNT (*) FORCE INDEX(X);
where X is primary or one of the clustering indexes, will touch each row and finish all the HCAD work for that index. I should emphasize that you don’t need to do this. Everything works fine without it. This
is an option if you want to get the work done all at once.
Trying it out
This is a 3 step process:
- Download TokuDB (free for evaluation purposes or in production up to 50GB of user data).
- Read section 3.4 of the User’s Guide.
- Load your tables and go!