In posts on June 30 and July 6, I explained how implementing the commands “replace into” and “insert ignore” with TokuDB’s fractal tree data structures can be two orders of magnitude faster than implementing them with B-trees. Towards the end of each post, I hinted that there are some caveats that complicate the story a little. In this post, I explain one of the complications: secondary indexes.
Secondary indexes act the same way in TokuDB as they do in InnoDB. They store the defined secondary key, and the primary key as a pointer to the rest of the row. So, say the table foo has the following schema:
create table foo (a int, b int, c int, primary key (a), key(b));
And we did:
insert into foo values (1,10,100),(2,20,200);
Logically, there is one dictionary that stores all the data (this is the clustered primary key). Let us call it the main dictionary:
key    value
1      10,100
2      20,200
And there is another dictionary for the secondary key that stores the column ‘b’ and the primary key, ‘a’:
key    value
10     1
20     2
For secondary indexes to work properly, there must be a one-to-one correspondence between elements in the secondary index and in the primary index. If this correspondence is broken, then the table is corrupt.
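To make the correspondence concrete, here is a minimal sketch that models the two dictionaries in memory for the table foo above. The names main, secondary, and is_consistent are hypothetical, and storing the secondary index as a set of (b, a) pairs is an assumption made for illustration; it is not TokuDB's actual on-disk format.

```python
# Hypothetical in-memory model of the two dictionaries for table foo.
# This illustrates the logical layout only, not TokuDB's storage format.
rows = [(1, 10, 100), (2, 20, 200)]

# Main dictionary: primary key a -> the rest of the row (b, c).
main = {a: (b, c) for a, b, c in rows}

# Secondary dictionary: one (b, a) entry per row. Pairing b with the
# primary key keeps entries distinct even if two rows share a value of b.
secondary = {(b, a) for a, b, c in rows}


def is_consistent(main, secondary):
    """Return True when the secondary entries match the main rows one to one."""
    expected = {(b, a) for a, (b, c) in main.items()}
    return expected == secondary
```

In this model, removing an entry from secondary without touching main (or the reverse) makes is_consistent return False, which corresponds to the corrupt-table case described above.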
Now suppose we were to execute:
replace into foo values (1,1000,1000);
Executing this statement requires three steps:
- in main dictionary, overwrite the entry with key '1' and value '10,100' with key '1' and value '1000,1000'.
- in secondary dictionary, remove the key '10' with value '1'.
- in secondary dictionary, insert the key '1000' with value '1'.
Notice that we cannot perform the second step unless we know the content of the existing row that is being replaced. Learning the content of the existing row requires a lookup in the main dictionary, which incurs a disk seek.
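The steps above can be sketched as follows, again using a hypothetical in-memory model rather than anything from TokuDB itself. The function replace_into and the set-of-pairs representation of the secondary dictionary are assumptions for illustration. The call to main.get stands in for the lookup that costs a disk seek on a real engine.

```python
# Hypothetical in-memory model; not TokuDB's actual implementation.
main = {1: (10, 100), 2: (20, 200)}      # primary key a -> (b, c)
secondary = {(10, 1), (20, 2)}           # one (b, a) entry per row


def replace_into(a, b, c):
    """Apply 'replace into foo values (a, b, c)' to both dictionaries."""
    # Lookup of the existing row. On disk this is the seek we cannot avoid:
    # without the old value of b we do not know which secondary entry to remove.
    old = main.get(a)

    # Step 1: overwrite (or create) the row in the main dictionary.
    main[a] = (b, c)

    # Step 2: remove the stale secondary entry, if the row already existed.
    if old is not None:
        old_b = old[0]
        secondary.discard((old_b, a))

    # Step 3: insert the new secondary entry.
    secondary.add((b, a))


replace_into(1, 1000, 1000)
```

After the call, main maps 1 to (1000, 1000) and the secondary dictionary holds (1000, 1) and (20, 2). If step 2 were skipped, the stale entry (10, 1) would remain and the one-to-one correspondence would be broken.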
So, when executing “replace into” or “insert ignore” on tables with secondary keys, all engines must still incur a disk seek on the primary dictionary to learn where associated elements are in a secondary index, whereas if no secondary keys exist, then TokuDB’s fractal trees can avoid this disk seek.
Even with secondary indexes, fractal tree indexes are preferred. B-trees still incur additional disk seeks on insertions into secondary indexes that fractal trees do not. However, with no secondary indexes, fractal trees can do away with the mandatory disk seek whereas B-trees cannot.