TokuDB speeds up “replace” and “insert ignore” operations by relaxing the affected rows constraint

Posted On August 3, 2010 | By Rich Prohaska | 2 comments

In posts on June 30 and July 6, we explained how implementing the commands “replace into” and “insert ignore” with TokuDB’s fractal trees data structures can be two orders of magnitude faster than implementing them with B-trees. Towards the end of each post, we hinted at that there are some caveats that complicate the story a little. In this post, we explain one of the complications: the calculation of affected rows.

MySQL returns the number of rows affected by a “replace” or “insert” statement to the client. For the “replace” statement, the number of affected rows is defined to be the sum of the number of rows deleted and the number of rows inserted. For example, if we replace a row whose key does not exist in the table, then the number of affected rows is one. However, if the key does exist in the table, then it must be deleted and then inserted. In this case, the number of affected rows is two.

Executing a “replace” statement on a B-tree with an ad-hoc key may cause disk seeks to read the sub-tree that contains the row. Once the leaf block is in memory, it is easy to compute the number of affected rows, but the cost of the “replace” operation is high due to the disk seeks.

TokuDB fractal trees can avoid the disk seeks, as noted in the previous blogs. A “replace” message is inserted near the root of the fractal tree. No disk seeks are necessary since the blocks near the root of the tree are usually in memory. As a result, fractal trees can achieve much higher performance than B-trees for this operation. Unfortunately, the calculation of affected rows hinders performance. To compute the number of rows deleted, we need to see if the row already exists in the fractal tree, and this may cause disk seeks to read the sub-tree that contains the row. If we want high performance, we need to relax the definition of affected rows for “replace” operations.

Many applications replace a row in the table, and do not look at the value of the affected rows that is returned to them. For these applications, why whould one be willing to lose 2 orders of magnitude of performance computing a value that is not even used? TokuDB uses a session variable, called “tokudb_pk_insert_mode”, to control how the affected rows is computed. We can compute affected rows as defined by MySQL, and be as slow as a B-tree, or we can turn off the computation and be really fast. The application can decide what semantics it wants.

There are similar issues with how the computation of affected rows can destroy “insert ignore” performance. For the “insert ignore” statement, the number of affected rows is defined to be the number of rows inserted. If the row already exists in the table, the number of affected rows is zero. Otherwise, the number of affected rows is one. A B-tree needs to lookup the key in the tree, and for ad-hoc keys, this may result in disk seeks to read the leaf block that may contain the key. If the key does not exist, then the row is inserted. Since the leaf node must be read into memory, the computation of affected rows is easy for B-trees, but the cost of the “insert ignore” operation is high due to the disk seeks.

TokuDB fractal trees can avoid the disk seeks, as noted in the previous blogs. An “insert ignore” message is injected near the root of the fractal tree. No disk seeks are necessary since the blocks near the root of the tree are usually in memory. As a result, fractal trees can achieve much higher performance than B-trees for this operation. Unfortunately, the definition of affected rows for “insert ignore” hinders peformance. For “insert ignore”, the row is inserted only if the key does not already exist in the table. The computation of affected rows requires us to know whether or not the row already exists in the table. This requires a lookup in the tree, and the tree lookup may cost one or more disk seeks to read the sub-tree that may contain the row. TokuDB uses the session variable described previously to give applications control over how the affected rows is computed.

Applications may use “replace” to replace a row in a table, or “insert ignore” to insert a row in a table if it does not already exist in the table. The definition of how the affected rows is computed for these operations may cause hidden disk seeks. These hidden disk seeks may destroy the performance. TokuDB relaxes the affected rows constraint on these operations and as a result gains almost 2 orders of magnitude performance gain. As expected, TokuDB also has a session variable that controls how affected rows is computed for those applications that need compatibility with the specification.B-trees are slow because of the replace or insert disk seeks that have nothing to do with the details of the affected row computation. They are equally slow with either semantics.

2 thoughts

  1. […] août 2010, TokuDB a introduit tokudb_pk_insert_mode permettant d’accélérer sensiblement la vitesse des insert ignore / replace / delete […]

  2. […] us that a limitation of TokuDB based on the chosen replication mode. In August 2010, TokuDB introduced tokudb_pk_insert_mode, which allows you to significantly increase the speed of insert ignore/replace/delete ignore. […]

Leave a Reply

Your email address will not be published. Required fields are marked *