Fractal Tree Indexing Overview

Posted On December 6, 2012 | By Martin Farach-Colton | 13 comments

We get a lot of questions about how Fractal Tree indexes work. It’s a write-optimized index with fast queries, but which write-optimized indexing structure is it?

In this ~15 minute video (which uses these slides), I give a quick overview of how they work and what they are good for.

13 thoughts

  1. [...] Martin Farach-Colton is doing review of fractal tree indexing and answers questions about how Fractal Tree indexes work. It’s a write-optimized index with fast queries. [...]

  2. alphazero says:

    (That isn’t really a “fractal”.)

    It would be good to include ‘cons’ at the last slide. For example, given that you are (effectively) caching a sub-set of your index (“buffer”), it would be good to discuss the D of ACID guarantees in context of server crashes

    1. Martin Farach-Colton says:

      @alphazero,

      Transactions in tokudb are durable. Yes, nodes in a Fractal Tree index are cached and modified, but this is also true of B-tree. The question of durability, and more importantly, durability performance, is addressed in our blogs here. Let me know if you have any other questions.

  3. [...] Fractal Tree Indexing Overview by Martin Farach-Colton. [...]

  4. [...] 9. TokuDB introduced innovative fractal tree indexing and improved MySQL query performance (fast ad-hoc query with write-optimized indexing structure ). Maximum utilization of SSDs http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/  [...]

  5. Tianshi says:

    I am wondering how the secondary index work? Eventually the data on disk can only be ordered by PK. Innodb secondary indexes use another btree to store PK. How does TokuDB implement the secondary key?

    Thanks!

    1. Martin Farach-Colton says:

      Thanks for the question Tianshi.

      TokuDB stores secondary indexes in Fractal Tree indexes, so they are ordered by whatever key defines them. If a secondary index is defined using the standard syntax, it will provide a mapping from secondary keys to PKs, just as InnoDB does.

      But TokuDB has a very useful alternative: any secondary key can be declared CLUSTERING, in which case that secondary index will store all fields of the table. There will be no need to do a join with the PK for queries that use a clustering secondary key. See here for more details, and here for other blog posts touching on this subject.

  6. Shashank says:

    What about this is “fractal”? I mean, it seems like all you are doing is adding buffering at non-leaf nodes of a B tree index.

    1. Hi Shashank,

      In a Fractal Tree index, each buffer (at an internal node) is recursively indexed. Furthermore, the leaves have an indexed structure (see this post for a discussion of Basement Nodes). So it’s not “fractal” in the sense of self-similarity at all levels, but we do have two different levels of indexing. Historically, we considered several other designs for the TokuDB data structure. Most of these were more “fractal” in that they were Cache Oblivious (CO). A CO data structure optimizes simultaneously for any possible parameter choice in the memory system, and thus optimizes at every level of the memory system. They are self-similar at many levels. So the use of “fractal” at this point is partly historical, and the remnant of this history is visible in the two levels of indexing. In short, it’s a marketing term.

  7. [...] Big Data, TokuMX replaces 40-year-old B-tree indexing technology with Tokutek’s patented Fractal Tree™ Indexing to vastly improve how MongoDB organizes and stores information. Because of its exceptional [...]

  8. Ansarul says:

    What are the read performance in case of Fractal Tree Indexing

Leave a Reply

Your email address will not be published. Required fields are marked *