Thoughts on Small Datum – Part 1

A little background…

When I ventured into sales and marketing (I’m an engineer by education) I learned I would often have to interpret and simply summarize the business value that is sometimes hidden in benchmarks. Simply put, the people who approve the purchase of products like TokuDB® and TokuMX™ appreciate the executive summary.

Therefore, I plan to publish a multipart series here on TokuView where I will share my simple summaries and thoughts on business value for the benchmarks Mark Callaghan (@markcallaghan), a former Google and now Facebook database guru, is publishing on his blog, Small Datum.

I’m going to start with his first benchmark post and work my way forward to the newest. And unless I get feedback which suggests this isn’t useful, I will add new posts here as Mark adds new benchmarks there.

In the interest of full disclosure, Mark is the brother of Tokutek VP of engineering, Tim Callaghan. Unfortunately for Tokutek, I know this means some of you may discount what he has to say. I hope you will look past the happy coincidence long enough to evaluate his methodology and findings. If you do, I am confident you’ll find his work unbiased and informative.

The meat & potatoes…

With that introduction out of the way, here is how this marketer simply summarizes Mark’s first Small Datum benchmark post. It was published in February of this year and it’s titled Write Amplification: Write-optimized versus Update-in-place.

To understand the business value associated with this benchmark you first have to know a little about Write Amplification (WA). WA is an unfortunate characteristic of database applications, and it matters most when they use SSD media to improve performance. It is measured as the number of bytes actually written to storage in order to store a single byte of data. There’s a good article on the subject over at Wikipedia.
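For readers who like to see the arithmetic, here is a minimal sketch of that ratio in Python. The function name and the numbers are my own illustrative assumptions, not figures from Mark’s post.

```python
def write_amplification(bytes_written_to_storage: int, bytes_stored_by_app: int) -> float:
    """Return the number of bytes written to storage per byte of stored data."""
    if bytes_stored_by_app <= 0:
        raise ValueError("bytes_stored_by_app must be positive")
    return bytes_written_to_storage / bytes_stored_by_app


# Hypothetical numbers, for illustration only: the application stores 10 GB
# of data, but the storage device ends up absorbing 80 GB of writes.
GB = 10**9
wa = write_amplification(80 * GB, 10 * GB)
print(f"write amplification: {wa:.1f}x")
```

In this made-up case every byte of stored data costs eight bytes of writes. A lower ratio is better.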

Notably, WA degrades performance, thereby reducing the gains that come from using SSD. Moreover, because SSD wear is tied to the total number of bytes written over time, WA also reduces the life expectancy of your SSD media.
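To make the life expectancy point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a deliberately simplified model in which a drive wears out after a fixed total volume of writes. The endurance rating, the daily write volume, and the WA values are all hypothetical, and real drives add their own internal write amplification on top.

```python
def estimated_lifetime_years(endurance_tb: float,
                             app_writes_tb_per_day: float,
                             write_amplification: float) -> float:
    """Estimate drive lifetime under a simple 'fixed total bytes written' model."""
    # Writes that actually reach the device each day (assumed model).
    device_writes_tb_per_day = app_writes_tb_per_day * write_amplification
    return endurance_tb / device_writes_tb_per_day / 365


# Hypothetical drive rated for 1,000 TB of total writes, with an
# application that stores 0.1 TB of new data per day.
for wa in (10, 5):
    years = estimated_lifetime_years(1000, 0.1, wa)
    print(f"WA {wa}x -> roughly {years:.1f} years")
```

Under these assumptions, halving the write amplification doubles the expected life of the drive, which is why the metric matters to the person paying for the hardware.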

Why should you care about this? Well, performance problems usually show up as dissatisfied users. If your Big Data application is customer-facing, that could mean a revenue hit or other undesirable business impacts. Reduced SSD life expectancy obviously leads to increased costs.

In his work, Mark uses Facebook’s LinkBench to compare the WA characteristics of MySQL with the InnoDB storage engine versus MySQL with the TokuDB storage engine. He uses a number of tuning techniques to minimize WA in InnoDB and compares that to untuned TokuDB, which was purpose-built to, among other things, minimize WA by replacing 40-year-old B-Tree Indexes with Tokutek Fractal Tree Indexes.

Bottom line: Mark’s WA benchmarks clearly show the WA profile (expressed as total gigabytes written) of TokuDB applications is roughly half (or less than half) that of InnoDB applications, even when InnoDB has been tuned to minimize WA. This suggests TokuDB applications will perform better while extending the life of your SSD hardware. In other words, the expected business benefits include better application performance and user satisfaction plus reduced hardware costs.

You can try it for yourself in your own environment by downloading the free community versions of TokuDB (or TokuMX) here and running your own benchmarks using Mark’s methodology as your guide. If you do, I’d love to hear from you.

As always, your thoughts and comments are welcome.

Coming up in Thoughts on Small Datum – Part 2: this marketer’s simple summary of Mark’s insertion benchmarks comparing MySQL with InnoDB versus TokuDB and stock MongoDB versus TokuMX (our high-performance distribution of MongoDB). If you want to get a head start, check out his post Insert Benchmark for InnoDB, MongoDB and TokuMX and Flash Storage.
