Disk seeks are evil, so let’s avoid them, pt. 2

Posted On May 25, 2010 | By Zardosht Kasheff | 1 comments

In part 1, I discussed why having many disk seeks are bad (they slow down performance), and how fractal tree data structures minimize disk seeks on ad-hoc insertions, whereas B-trees practically guarantee that disk seeks are performed on ad-hoc insertions. As a result, fractal tree data structures can insert data up to two orders of magnitude faster than B-Trees can.

Now that insertion disk seeks are out of the way (and I don’t want to shortchange the importance of getting rid of these seeks!), let’s look at other places where databases perform seeks, and see if we can get rid of them. Over my next couple of posts, I will look at several use cases and analyze whether disk seeks are required. If disk seeks are required, then performance will suffer on large amounts of data, for TokuDB and any other disk-based storage engines.

If disk seeks are not required, things get interesting. Removing these unnecessary disk can speed up a database as long as all disk seeks in a command execution are removed. Since TokuDB eliminates seeks on insertions, we should avoid disk seeks altogether. Since B-trees induce disk seeks on ad-hoc insertions, cleaning up the remaining disk seeks has limited utility.

For today, let’s look at a simple use case that may be obvious: insertions on secondary indexes v. unique secondary indexes. Take the following table:

Create Table: CREATE TABLE `t` (
  `a` int(11) NOT NULL AUTO_INCREMENT,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  PRIMARY KEY (`a`),
  UNIQUE KEY `b_unique` (`b`),
  KEY `b_norm` (`b`)
)

Suppose most of the table resides on disk.

Now I run:

insert into t (b) values (1000);

Are there any mandatory disk seeks involved?

When inserting into the fractal tree for the primary dictionary, we use an auto increment value, so insertions are sequential. Insertions run really fast, because a disk seek is usually not mandatory (disk seeks eventually happen when blocks get full, but they do not occur on EACH insertion).

Let’s look at inserting into b_unique and b_norm visually. Take the following identical fractal trees for ‘b_unique’ and ‘b_norm':

-

- -

- - - -

...

1, 3, 5, 7, ..., 999, 1001, 1003, ...

To insert into ‘b_norm’, the fractal tree can simply insert (1000) in the top node. To insert into ‘b_unique’, the fractal tree must first search for 1000, verify that it is not between 999 and 1001, and then insert into the top node. This lookup causes a disk seek and slows down the insertion.

Note that because B-trees require disk seeks to do insertions anyway, some operations come with no additional cost in B-trees. Uniqueness checks are one such example. As a result, some B-tree users may not think twice about making secondary keys unique (after all, unique keys can help the query optimizer). Fractal tree data structures, on the other hand, incur a huge cost for a uniqueness check.

So, the moral of this story is if you care about insertion performance and avoiding disk seeks, try to avoid unique secondary keys, and go with normal secondary keys. Otherwise, fractal tree data structures will be just as slow as B-trees, and not two orders of magnitude faster.

One Comment

  1. [...] a key exists is to look up the element. This is the same reasoning used for unique secondary keys here. If the primary key is ad-hoc, the lookup requires disk seeks. And disk seeks are [...]

Leave a Reply

Your email address will not be published. Required fields are marked *