Better Primary Keys, a Benefit to TokuDB’s Auto Increment Semantics

In our last post, Bradley described how auto increment works in TokuDB. In this post, I explain one of the big benefits of our implementation: the ability to combine a better choice of primary key with primary key clustering.

In working with customers, the following scenario has come up frequently. The user has data that is streamed into the table in order of time. The table has a primary key that is an auto increment field, 'id', and an index on the field 'time'. All of the user's queries are over some range of time (e.g. select sum(clicks) from foo where time > date '2008-12-19' and time < date '2008-12-20';).
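A minimal sketch of that schema, reusing the names from the query above ('foo', 'clicks', 'time'); the column types and engine here are assumptions for illustration:

create table foo
(id int auto_increment,
clicks int default NULL,
time datetime default NULL,
primary key (id),
key (time)) engine=tokudb;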

For storage engines with clustered primary keys (such as TokuDB and InnoDB), having such a schema hurts query performance. Queries do a range query on a secondary index (time), and then perform point queries into the primary index. A better solution would be to start the primary key with the ‘time’ field. Then, all queries will be a range query on a clustered primary index. The problem users run into is that the ‘time’ field is not a unique field, and therefore cannot be defined as a primary key.

In TokuDB, a good solution is to take advantage of TokuDB’s auto increment semantics. Define the primary key to be (time, id), where ‘id’ is an auto increment field. The combination (time, id) is guaranteed to be unique, and can therefore be defined as a primary key. To demonstrate the benefits, let us take an example very similar to one used to show partitioning in action. Suppose we had the following table:

create table no_part_tab
(c1 int(11) default NULL,
c2 varchar(30) default NULL,
c3 date default NULL) engine=myisam;

And inserted 20M elements using the following commands:

mysql> delimiter //
mysql> CREATE PROCEDURE load_part_tab()
  begin
   declare v int default 0;
           while v < 20000000
   do
   insert into no_part_tab
   values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520) mod 7304));
   set v = v + 1;
   end while;
  end
  //

Query OK, 0 rows affected (0.00 sec)
mysql> delimiter ;
mysql> call load_part_tab();
mysql> optimize table no_part_tab;

Running the following query,

select sum(c1) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-06-30';

takes 7.25s. It takes this long because a table scan is required to answer the query.
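One way to confirm the full table scan is to look at the query plan (a sketch; the exact EXPLAIN output varies by server version, but with no index on 'c3' the access type should be ALL):

explain select sum(c1) from no_part_tab
where c3 > date '1995-01-01' and c3 < date '1995-06-30';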

If we were to add an index on the 'c3' field, then the query takes 2.36s. Because MyISAM does not cluster its primary index, defining a primary key of (c3, id) where 'id' is an auto-increment field does not help: the query still takes 2.36s in this case.
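For reference, these two MyISAM experiments could be set up roughly as follows; the table name 'myisam_pk_tab' and the exact column layout are assumptions, since the post does not show them:

-- add a plain secondary index on c3
alter table no_part_tab add index (c3);

-- MyISAM variant with an auto-increment 'id' and primary key (c3, id);
-- MyISAM does not cluster the primary key, so this behaves like an ordinary index here
create table myisam_pk_tab
(id int auto_increment,
c1 int(11) default NULL,
c2 varchar(30) default NULL,
c3 date default NULL,
primary key (c3,id)) engine=myisam;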

Now suppose we had a TokuDB table, with the following schema:

create table toku_tab
(c1 int(11),
id int auto_increment,
c2 varchar(30) default NULL,
c3 date default NULL,
primary key (c3,id)) engine=tokudb;

We took the rows in no_part_tab and inserted them into toku_tab, letting the auto increment field be generated automatically. Running the query on this table takes 0.30s.
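A sketch of how that copy and the timed query could look; the INSERT ... SELECT lets MySQL fill in 'id' automatically:

insert into toku_tab (c1, c2, c3)
select c1, c2, c3 from no_part_tab;

select sum(c1) from toku_tab
where c3 > date '1995-01-01' and c3 < date '1995-06-30';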

Because we had the flexibility to create a primary key that better fits the intended use of the table, and could combine this with clustering, we were able to achieve better performance. This is one of the nice benefits of our auto increment semantics.


One Response to Better Primary Keys, a Benefit to TokuDB’s Auto Increment Semantics

  1. Kristian Nielsen says:

    My standard trick to do this in InnoDB (or other databases) is this:

    CREATE TABLE t (
    c1 int,
    c2 varchar(30),
    c3 date,
    idx smallint,
    PRIMARY KEY (c3, idx)
    );

    Then you insert rows like this:

INSERT INTO t (c1,c2,c3,idx) SELECT 42, 'my c2 value', '2007-01-10', IFNULL(MAX(idx)+1, 0) FROM t WHERE c3 = '2007-01-10';

    (Don’t get fooled by the INSERT … SELECT; this inserts a single row with constant values except for the `idx` column, which gets the next number in sequence).

    This gets the same clustered index storage without the need for an autoincrement key.

An advantage is that the values of `idx` count from 0 for each distinct date value, so the data looks nicer and one may be able to use a smaller integer size, saving space and index btree size. A disadvantage is the need to compute the MAX() for inserts, so bulk insertion is more complicated.
