Compression Benchmarking: Size vs. Speed (I want both)
I’m creating a library of benchmarks and test suites that will run as part of a Continuous Integration (CI) process here at Tokutek. My goal is to regularly measure several aspects of our storage engine over time: performance, correctness, memory/CPU/disk utilization, etc. I’ll also be running tests against InnoDB and other databases for comparative analysis. I plan on posting a series of blog entries as my CI framework evolves, for now I have the results of my first benchmark.
Compression is an always-on feature of TokuDB. There are no server/session variables to enable compression or change the compression level (one goal of TokuDB is to have as few tuning parameters as possible). My compression benchmark uses iiBench to measure the insert performance and compression achieved by TokuDB and InnoDB. I tested InnoDB compression with two values of key_block_size (4k and 8k) and with compression disabled.
As you can see in the above graph, compression allows for the database to use significantly less disk space. TokuDB achieved 51% compression, InnoDB achieved 50% for key_block_size=4 and and 47% compression for key_block_size=8. [Note: The random nature of iiBench makes it difficult to compress]
Traditionally there is a “size versus speed” trade-off when compressing data. Data compression utilities have long offered variable levels of aggressiveness, spending more time compressing files usually results in smaller files. The InnoDB benchmarks bear this out, as the compression level increases the insert performance declines. On the other hand, TokuDB achieves the highest level of compression while out-performing InnoDB in all scenarios, even InnoDB without compression. TokuDB is running 33.4x faster than InnoDB configured to achieve similar levels of compression. Note, “Inserts per Second” was measured as the exit velocity of the benchmark run (the average of the last million inserts).
How much compression can be achieved?
To answer this I decided to load some web application performance data (log style data with stored procedure names, database instance names, begin and ending execution timestamps, duration row counts, and parameter values). TokuDB achieved 18x compression, far more than InnoDB. It also loaded the data much faster but that is a blog entry for another day…
- iiBench, insert 25mm rows, 1000 rows per commit
- Intel Core-i7/920 @ 3.6GHz, 12GB DDR3 @ 1600MHz, 2 x SATA II
- Ubuntu 11.04, TokuDB 5.0.4, MySQL 5.1.52, InnoDB plug-in 1.0.13