We often hear from customers and MySQL experts that fragmentation causes problems such as wasting disk space, increasing backup times, and degrading performance. Typical remedies include periodic “optimize table” or dump and re-load (for example, see Project Golden Gate). Unfortunately, these techniques impact database availability and/or require additional administrative cost and complexity. Tokutek’s Fractal Tree algorithms do not not cause fragmentation, and we’re looking for ways to measure the effects of fragmentation to quantify TokuDB’s benefits.
I ran some tests using the iiBench benchmark as an experiment to try and quantify the impact of fragmentation, and observed some interesting results.
Initial Load – 50M Rows
I created an iiBench table with 50M rows, and recorded how long it took to complete along with the amount of disk space used by the data and indexes; log files are *not* included in the reported disk use.
$ iibench.py --setup --insert_only --engine=[innodb | tokudb] --max_rows=50000000 --table_name=t1
|Insert 50M Rows||3,349s||28,821s||3.3GiB||11GiB|
select count(*) was run twice to see how fast it ran with the cache warmed up (the query cache was off).
When inserting into TokuDB, some data may remain in internal data structures, so optimize table was run to push all of the data to disk for an accurate measure of disk use. Initially, I thought running optimize table would not be necessary for InnoDB, but I ran it to ensure a consistent procedure on both engines. Surprisingly, InnoDB used much more disk space after running optimize table. According to this post, InnoDB’s insert buffer is stored in the tablespace, so data in the insert buffer should have been included in the measured disk use before optimizing the table. I’m not sure what caused such a large increase in disk use.
Deleting 10M Rows
mysql> delete from t1 where transactionid
|Delete 10M rows||1,050s||169,238s||4.7GiB||20GiB|
Deleting 10M rows took about 17 minutes on TokuDB, and over 47 hours on InnoDB. After deleting the rows, select count(*) on InnoDB ran slow, even after attempting to warm up the cache. I ran it three times, and it was consistently slow. I ran optimize table, and select count(*) ran much faster, suggesting that the problem may have been caused by fragmentation. TokuDB does not fragment, so select count(*) ran fast on TokuDB after deleting rows, with no need to optimize.
Deleting 10M rows from a 50M row table caused the time to run select count(*) on InnoDB to increase by a factor of about 55. Running optimize table solved the problem, but it took almost 8 hours to complete. Dumping and reloading is likely to be faster than optimize or alter, but it still takes time and effort.
After deleting 10M rows on TokuDB, the time to run select count(*) decreased from about 19s to about 15s (proportional to the decrease from 50M to 40M rows in the table), without a need for optimizing or dumping and reloading.
Posts from Mark Callaghan and Bradley C. Kuszmaul show that iiBench’s linear distribution of data does not provide a good model of some real world data sets, and a Zipfian distribution is probably a better model. It would be interesting to re-run the experiment with an updated version of iiBench using a Zipfian distribution. Running similar delete experiments on large real world data sets would be interesting as well.
I ran the tests on a machine with a modest amount of memory and slow cores by today’s standards.
- CentOS 5.1
- Dell PowerEdge 2950
- 2 Socket, Quad Core Intel Xeon 1.6GHz
- 4GB Main Memory (2GB InnoDB Buffer Pool, 2GB TokuDB Cache)
- 5 disk SW RAID5 1TB SATA
- ext3 Filesystem
I used default parameters for TokuDB and the InnoDB parameters are shown in the my.cnf file below. By default, TokuDB uses 1/2 of physical memory (2GB) for it’s cache size, so the InnoDB buffer pool was set to 2GB for a fair comparison. It may be possible to achieve better InnoDB results through tuning, but the goal of this exercise was to search for ways to quantify the impacts of fragmentation.
[mysqld] innodb_log_buffer_size=4M innodb_thread_concurrency=8 innodb_log_files_in_group=3 innodb_log_file_size=1300M innodb_flush_log_at_trx_commit=2 innodb_doublewrite=0 innodb_buffer_pool_size=2G innodb_max_dirty_pages_pct=90