Don’t Thrash: How to Cache your Hash on Flash
Last week I gave a talk entitled “Don’t Thrash: How to Cache your Hash.” The talk took place at the Workshop on Algorithms and Data Structures (ADS) in a medieval castle turned conference center in Bertinoro, Italy. An earlier version of this work (with the same title) appeared at the HotStorage conference in Portland, OR. Tokutek co-founders Bradley, Martin, and I are coauthors on the work, along with students and other faculty at Stony Brook University.
The talk title is colorful and doggerel-y. Here’s what the title means. “Cache your hash”—the so-called Bloom Filter type data structure. A Bloom filter acts like a negative cache, letting you know that a particular element is not stored in a database. The problem is that Bloom filters don’t scale outside of RAM (they “thrash”). In the paper, the basis for my talk, we wrote about an alternative data structure that scales beyond main memory as data goes to SSDs.
The attendees at HotStorage comprised researchers/practitioners in storage technology for both MySQL and other databases. The attendees of ADS comprised mathematicians and theoretical computer scientists, studying algorithms and data structures. Since storage technologists, mathematicians and computer scientists often speak different languages, it was great to see that this time around they all responded to the material. The theoretical computer scientists and mathematicians commented on the elegance—but so did the storage practitioners. The storage practitioners praised the real horsepower and the application of SSDs—but so did the algorithmicists. From my perspective, it was gratifying to get this type of reaction. After all, it’s exactly this type of theoretical/practical conflux that got me excited about starting Tokutek six years ago – applying abstract algorithms to alleviate the all-too-real database bottlenecks our customers face every day.
To learn more about how this proposed alternative data structure supports over half a million insertions/deletions per second and over 500 point queries per second on a commodity flash-based SSD, please click here for the technical paper, slides, and mp3.