An Updated Description of Clustering Keys for TokuDB

Posted On August 6, 2014 | By Rich Prohaska | 0 comments

Covering indexes can result in orders of magnitude performance improvements for queries. Bradley’s presentation on covering indexes describes what a covering index is, how it can effect performance, and why it works. However, the definition of a covering index can get cumbersome since MySQL limits the number of columns in a key to 16 (32 on MariaDB).

Tokutek introduced multiple clustering indexes into MySQL to address these problems. Zardosht describes the multiple clustering indexes feature and how clustering indexes differ from covering indexes. Zardosht also describes the query versus update tradeoffs that exist with clustering indexes.

TokuDB now runs in MySQL 5.5, MariaDB 5.5, Percona Server 5.6, and MariaDB 10, and multiple clustering indexes are supported on all of these platforms. However, the user interface for using clustering indexes is slightly different. We describe the user interface differences here.

Tokutek’s user interface for clustering indexes
Tokutek added the clustering attribute to the key definition and extended the MySQL and MariaDB grammar to use it. Tokutek also tweaked the query optimizer to understand that clustering indexes cover all of the columns defined in the table.

Here is how a TokuDB table with a secondary clustering index is created:

mysql> create table t (a int, b int, c int, key(a), clustering key(b)) engine=tokudb;

mysql> show create table t;
CREATE TABLE `t` (
    `a` int(11) DEFAULT NULL,
    `b` int(11) DEFAULT NULL,
    `c` int(11) DEFAULT NULL,
    KEY `a` (`a`),
    CLUSTERING KEY `b` (`b`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1

Here is how a clustering index is added to an existing TokuDB table:

mysql> create clustering index c on t(c);

mysql> show create table t;
CREATE TABLE `t` (
    `a` int(11) DEFAULT NULL,
    `b` int(11) DEFAULT NULL,
    `c` int(11) DEFAULT NULL,
    KEY `a` (`a`),
    CLUSTERING KEY `b` (`b`),
    CLUSTERING KEY `c` (`c`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1

Percona Server’s user interface for clustering indexes
The user interface for clustering indexes in Percona Server has the same grammar as Tokutek’s. However, Percona decided to not allow secondary clustering indexes to be created on InnoDB, which does not support them yet. In contrast, Tokutek’s implementation silently ignores the clustering keywords for InnoDB.  Other than this, Tokutek’s implementation and Percona’s implementation are the same.

Percona Server will not create an InnoDB table with a secondary clustering index.

mysql> create table t (a int, b int, c int, clustering key(a)) engine=innodb;
ERROR 1478 (HY000): Table storage engine ‘InnoDB’ does not support the create option ‘CLUSTERING’

Percona Server will not add a secondary clustering index to an InnoDB table because InnoDB does not support it.

mysql> create table t (a int, b int, c int, key(a)) engine=innodb;
Query OK, 0 rows affected (0.13 sec)

mysql> create clustering index b on t(b);
ERROR 1478 (HY000): Table storage engine ‘InnoDB’ does not support the create option ‘CLUSTERING’

MariaDB’s user interface for clustering indexes
When the MariaDB engineers ported TokuDB to MariaDB, they decided to not extend the grammar to support the clustering attribute, but rather to have TokuDB define a clustering index attribute. Index attributes is a feature specific to MariaDB. As a consequence, the user interface for clustering indexes in MariaDB’s port of TokuDB is different than Tokutek’s.

Here is how a TokuDB table is created with a clustering index on ‘b’.

MariaDB [test]> create table t (a int, b int, c int, key(a), key(b) clustering=yes) engine=tokudb;

Note that the clustering attribute is stored for the index on ‘b’.

MariaDB [test]> show create table t;
CREATE TABLE `t` (
    `a` int(11) DEFAULT NULL,
    `b` int(11) DEFAULT NULL,
    `c` int(11) DEFAULT NULL,
    KEY `a` (`a`),
    KEY `b` (`b`) `clustering`=yes
) ENGINE=TokuDB DEFAULT CHARSET=latin1 `compression`=’tokudb_zlib’

Here is how to add a clustering index on ‘c’ to an existing table:

MariaDB [test]> create index c on t(c) clustering=yes;

MariaDB [test]> show create table t;
CREATE TABLE `t` (
    `a` int(11) DEFAULT NULL,
    `b` int(11) DEFAULT NULL,
    `c` int(11) DEFAULT NULL,
    KEY `a` (`a`),
    KEY `b` (`b`) `clustering`=yes,
    KEY `c` (`c`) `clustering`=yes
) ENGINE=TokuDB DEFAULT CHARSET=latin1 `compression`=’tokudb_zlib’

For some reason, the parser accepts ‘clustering=yes|no|1|0′, but does not accept ‘clustering=y|n|true|false’.

MariaDB [test]> create index c on t(c) clustering=y;
ERROR 1912 (HY000): Incorrect value ‘y’ for option ‘clustering’

Portability issues
The meta-data to store Tokutek’s implementation of multiple clustering indexes and Maria’s implementation is different. This makes portability between the two systems problematic without a conversion tool. I like Maria’s implementation of index and table options as it allows the storage engine to define new capabilities without hacks to the common MySQL or MariaDB code. It would be nice if MySQL would also support this feature.

Leave a Reply

Your email address will not be published. Required fields are marked *