In our last post, Bradley described how auto increment works in TokuDB. In this post, I explain one of our implementation’s big benefits, the ability to combine better primary keys with clustered primary keys.
In working with customers, the following scenario has come up frequently. The user has data that is streamed into the table, in order of time. The table will have a primary key that is an auto increment field, ‘id’, and then have an index on the field ‘time’. The queries the user does are all on some range of time (e.g. select sum(clicks) from foo where time > date ’2008-12-19′ and time < date '2008-14-20';).
For storage engines with clustered primary keys (such as TokuDB and InnoDB), having such a schema hurts query performance. Queries do a range query on a secondary index (time), and then perform point queries into the primary index. A better solution would be to start the primary key with the ‘time’ field. Then, all queries will be a range query on a clustered primary index. The problem users run into is that the ‘time’ field is not a unique field, and therefore cannot be defined as a primary key.
In TokuDB, a good solution is to take advantage of TokuDB’s auto increment semantics. Define the primary key to be (time, id), where ‘id’ is a an auto increment field. The combination (time, id) is guaranteed to be unique, therefore allowing it to be defined as a primary key. To demonstrate the benefits, let us take an example that is very similar to one used to demonstrate the benefits of partitioning in action. Suppose we had the following table:
create table no_part_tab (c1 int(11) default NULL, c2 varchar(30) default NULL, c3 date default NULL) engine=myisam;
And inserted 20M elements using the following commands:
mysql> delimiter // mysql> CREATE PROCEDURE load_part_tab() begin declare v int default 0; while v < 20000000 do insert into no_part_tab values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520) mod 7304)); set v = v + 1; end while; end // Query OK, 0 rows affected (0.00 sec) mysql> delimiter ; mysql> call load_part_tab(); mysql> optimize table no_part_tab;
Running the following query,
select sum(c1) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-6-31';
takes 7.25s. It takes this long because a table scan is required to answer the query.
If we were to add an index on the 'c3' field, then the query takes 2.36s. Because MyISAM does not cluster its primary index, defining a primary key of (c3, id) where 'id' is an auto-increment field does not help: the query still takes 2.36s in this case.
Now suppose we had a TokuDB table, with the following schema:
create table toku_tab (c1 int(11), id int auto_increment, c2 varchar(30) default NULL, c3 date default NULL, primary key (c3,id)) engine=tokudb;
We took the elements in no_part_tab, and inserted them into toku_tab, allowing the auto increment field to be automatically generated. Running the query on this table takes 0.30s.
Because we had the flexibility to create a primary key that better fits the desired use of the table and could combine with this with clustering, we are able to achieve better performance. This is one of the nice benefits to our auto increment semantics.