Disk seeks are evil, so let’s avoid them, pt. 4

Posted On June 18, 2010 | By Zardosht Kasheff | 2 comments

Continuing in the theme from previous posts, I’d like to examine another case where we can eliminate all disk seeks from a MySQL operation and therefore get two orders-of-magnitude speedup. The general outline of these posts is:


  • B-trees do insertion disk seeks. While they’re at it, they piggyback some other work on the disk seeks. This piggyback work requires disk seeks regardless.
  • TokuDB’s Fractal Tree indexes don’t do insertion disk seeks. If we also get rid of the piggyback work, we end up with no disk seeks, and a two order of magnitude improvement.

So it’s all about finding out which piggyback work is important (important enough to pay a huge performance penalty for), and which isn’t.

This blog post is about one of the most straightforward operations: ad-hoc insertions with a primary key. Since the difference I’ve identified between B-tree indexes and Fractal tree indexes is the disk seeks on insertions, it may seem that there’s little to look at in this case. But the semantics of insertion into a primary key is as follows: if the primary key being inserted does not already exist in the table, insert the element, otherwise return an error notifying the user that a duplicate key exists.

So, if we have a table:

create table foo (a int, b int, primary key (a))

and we run:

insert into foo values (1000,1);

Before inserting (1000,1) into the table, the storage engine must first check to see if any row in the table where a=1000 exists. If so, return an error stating that there is a duplicate key. The requirement to return an error is VERY expensive: it requires a disk seek.

The only way a disk based storage engine can verify if a key exists is to look up the element. This is the same reasoning used for unique secondary keys here. If the primary key is ad-hoc, the lookup requires disk seeks. And disk seeks are evil.

Once again, B-trees already pay for the disk seek just for the insertion. Because the leaf node must be in memory to do the
insertion, verifying uniqueness essentially comes for free. So B-trees suffer from disk seeks, whether you do duplicate checking or not.

So how can users achieve fast performance with a Fractal tree based storage engine — as compared to B-trees, which will be slow regardless? Remove this requirement of returning an error if a duplicate is key found. It is way too expensive. MySQL has syntax to perform other actions instead of returning an error:


  • replace into
  • insert ignore
  • insert … on duplicate key update

The question is, do these commands do piggyback work that requires disk seeks? If they do, performance will be slow for ANY disk based storage engine. If not, then fractal tree data structures will be achieve up a speedup of up to two orders of magnitude. In next week’s post, I examine these cases.

2 thoughts

  1. […] this post two weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key […]

  2. […] my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because […]