iiBench Contest – Who Can Insert 1B Rows into MYSQL the Fastest?

At the recent OpenSQL Camp in Charlottesville, VA, Tokutek offered a challenge to the MySQL community – who can insert a billion rows into MySQL the fastest?  We will post the results on our website and the winner gets a $100 Starbucks card, along with valuable bragging rights.

Tokutek’s technical founders (Michael A. Bender, Martin Farach-Colton, and I), in our academic roles (at Stony Brook, Rutgers, and MIT, respectively) have been investigating how to maintain indexes for large databases.  Part of the challenge for this kind of research is to figure out what to measure.

Some other benchmarks, such as TPC-H and SSB, measure bulk load time rather than insertions.  We are interested in the case where you must insert a small number of rows at a time at a high rate, and keep the index up-to-date.  Indexed insertions are interesting in situations with high incoming data rates and a desire to concurrently query on new data without waiting for periodic batch loads.  We wrote, with the help of Tokuteknologist Vincenzo Liberatore, a simple open source benchmark named iiBench, specifically designed to stress indexed insertion performance.  Using iiBench, we tested InnoDB and MyISAM, and found that insertion rates for both storage engines drop off dramatically as the database grows.

This benchmark is a work in progress.  It has problems, and we’re looking for feedback on how to improve it.  The benchmark is essentially to insert a billion rows into a table, maintaining an interesting primary key plus two interesting indexes.

  • You may modify the benchmark code.
  • The database should be fully indexed after every 1000 insertions.  Our intent is that loaded data should be fully indexed as it is loaded, and queries should reflect up-to-the-minute data.
  • The benchmark has the obvious flaw that it includes no queries.  Again, our intent is that data should be queryable.  I would consider a solution that provides fast insertions at the expense of poor query performance to be worse, but I don’t know how to formalize that idea.  So for now we just say “you must insert with indexes”, and point out that a bulk load followed by an alter table isn’t satisfying the rules.

  • The machine we used is a fairly inexpensive server:  A 2-socket quad-core 3.16GHz Xeon with 16GB of main memory and 8 146GB 10,000 RPM SAS drives and a hardware raid 5 controller.
  • We used MySQL 5.1 on Linux.
  • We’re interested in how to get the best performance on InnoDB or MyISAM.

Although our research is on how to improve insertions, this contest isn’t about how much faster I can solve this problem with some other storage engine.  The contest is more like a “peer review” to demonstrate that we’ve gotten as much out of MyISAM and InnoDB as we can.  Admittedly, we may not have found the optimal MySQL parameters, so we are sponsoring a contest to see who can insert 1B rows into MySQL the fastest using iiBench.  We’re hoping to learn just how fast MySQL can do indexed insertions given better tuning or via other innovative techniques.  An overview of the contest with ground rules along with the iiBench source is available at http://www.tokutek.com/contest.php

We want to improve iiBench. Some day it may be good enough to be a useful benchmark.

Take a look and submit an entry by 31 Dec 2008.  If you have any questions, please e-mail us at contest@tokutek.com.

No Tags.

8 Responses to iiBench Contest – Who Can Insert 1B Rows into MYSQL the Fastest?

  1. This is an interesting test that will find problems in other storage engines. So we can make things faster, but InnoDB will never be optimal for this workload. Some feedback:
    * your test script would be much easier to use were it implemented in Perl or Python
    * post the my.cnf settings you used for your test

    I would like to see another test that does both inserts and updates. Some interesting workloads require a mix. Insert performance was in the news here — http://www.dbms2.com/2008/12/02/data-warehouse-load-speeds-in-the-spotlight

  2. Nils says:

    Why did you choose RAID-5 over RAID-10 (RAID0 is probably not an option)?

  3. The RAID5 kills your write performance!

  4. The choice of using RAID5 was kind of arbitrary. One of our beta customers was benchmarking us against MyISAM on a RAID 5, so we did the same thing.

    It would be interesting to try another RAID configuration. If you can do better than we did, it might form the basis for a winning entry!

  5. Nils says:

    Hmm well I don’t have that kind of hardware lying around… I got a spare box which has 8 SATA 10k RPM disks with a 3ware controller, maybe I’ll try on that one.

  6. That kind of hardware variant seems fair to me.

  7. Another result metric that can be used to judge how well a storage supports this workload is (time to insert first 100M rows / time to insert last 100M rows). I will have numbers soon. InnoDB, with a few patches and better my.cnf parameters and without RAID5, does better although it still degrades a lot over time. I will guess that Tokutek does great on this test. Where are your results for that?

  8. To do well by that metric, all I have to is slow down my first 100M insertions :-)

    Yes, our research is all about fast insertions into indexes. But I don’t want this to be a contest of MyISAM vs. some unspecified Tokuteknology. That doesn’t seem fair.

    I’m interested in two things: (1) how fast *can* MyISAM or InnoDB go? (2) What can we do to improve iiBench to both reflect real-world insertion problems and to head off optimizations such as FakeAmelia’s (which is a great hack for this contest, but doesn’t necessarily achieve our intent that useful indexes should be maintained).

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>