TokuMX: Fractal Trees with MongoDB
Over several blog posts, Tim has presented performance results for TokuMX, our MongoDB product with fractal tree indexes integrated, side by side with MongoDB on large data sets. The results look good: we’ve shown improved throughput on a sysbench benchmark, faster load times, and high compression.
So what is TokuMX, and how does it achieve this performance?
TokuMX has replaced ALL of the storage code in MongoDB with fractal trees. Every collection, every secondary index, every metadata collection is stored with fractal trees, the same technology that implements the TokuDB storage engine for MySQL. That is, all data is stored and managed with our transactional, ACID and MVCC-compliant, write-optimized storage library.
TokuMX achieves high compression for the same reason TokuDB for MySQL does: fractal trees compress well because they compress data in large chunks. TokuMX achieves high insertion rates on index-rich collections for the same reason TokuDB for MySQL performs so well on iiBench: fractal trees are a write-optimized data structure designed to maintain insertion performance on larger-than-memory workloads. TokuMX does not require constant compaction for the same reason that TokuDB for MySQL does not require users to constantly run “optimize table” to reorganize data: fractal trees don’t fragment. MongoDB and MySQL are very different products with very different user experiences, but the underlying data structure of their storage is the same: the B-Tree. Fractal trees are better.
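To make the "large chunks" point concrete, here is a minimal sketch using Python's zlib. It is not TokuMX code: the helper compressed_size, the sample records, and the 4 KB and 1 MB chunk sizes are all assumptions chosen for illustration, not the block sizes fractal trees actually use. It only shows the general effect that a compressor given bigger chunks has more repeated context to exploit.

```python
import json
import zlib


def compressed_size(data: bytes, chunk_size: int) -> int:
    """Total size after compressing `data` as independent chunks of `chunk_size` bytes."""
    return sum(
        len(zlib.compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )


# Hypothetical document-like data: many small, similar JSON records.
records = [{"user_id": i, "status": "active", "country": "US"} for i in range(20000)]
data = "\n".join(json.dumps(r) for r in records).encode()

small = compressed_size(data, 4 * 1024)     # small chunks (illustrative size)
large = compressed_size(data, 1024 * 1024)  # large chunks (illustrative size)

print("raw bytes:       ", len(data))
print("4 KB chunks:     ", small)
print("1 MB chunks:     ", large)
```

Each small chunk is compressed with no knowledge of the others, so field names and values that repeat across the whole data set have to be rediscovered in every chunk. Larger chunks pay that cost far fewer times.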
By completely replacing MongoDB’s storage code with our storage code, we are able to change the concurrency of the system. In MongoDB, updates, inserts, and deletes require database-level write locks. Because in TokuMX we own all the storage, and our storage system is transactional and supports document-level locking (or row-level locking for those familiar with MySQL), TokuMX does not require these database-level write locks. Instead, we can grab read locks (we need something to prevent concurrent file operations), similar to the MDL locking that MySQL uses. This work is a big reason for the improved sysbench numbers presented earlier.
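The difference between the two locking granularities can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how TokuMX or MongoDB implement locking: the classes DatabaseLock and DocumentLocks and the helper second_writer_can_proceed are hypothetical names invented for this example, and the shared read lock that guards against concurrent file operations is left out.

```python
import threading
from collections import defaultdict


class DatabaseLock:
    """Toy model: a single write lock shared by every document in the database."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_lock(self, doc_id) -> bool:
        # doc_id is ignored: any write takes the one database-wide lock.
        return self._lock.acquire(blocking=False)

    def unlock(self, doc_id) -> None:
        self._lock.release()


class DocumentLocks:
    """Toy model: one lock per document, created on first use."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def try_lock(self, doc_id) -> bool:
        with self._guard:
            lock = self._locks[doc_id]
        return lock.acquire(blocking=False)

    def unlock(self, doc_id) -> None:
        with self._guard:
            lock = self._locks[doc_id]
        lock.release()


def second_writer_can_proceed(locks, first_doc, second_doc) -> bool:
    """While one writer holds first_doc, report whether another can lock second_doc."""
    assert locks.try_lock(first_doc)
    ok = locks.try_lock(second_doc)
    if ok:
        locks.unlock(second_doc)
    locks.unlock(first_doc)
    return ok


print(second_writer_can_proceed(DatabaseLock(), "doc-a", "doc-b"))
print(second_writer_can_proceed(DocumentLocks(), "doc-a", "doc-b"))
print(second_writer_can_proceed(DocumentLocks(), "doc-a", "doc-a"))
```

With the database-wide lock, a writer touching one document blocks a writer touching any other. With per-document locks, writers only conflict when they touch the same document, which is where the extra concurrency comes from.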
TokuMX is something we have been working on for quite some time, and we are excited about the benefits it will bring (I couldn’t even get to multi-statement transactions and clustering indexes; those will be another blog post). And we have only scratched the surface. The source code, if anyone is interested, is here.
We would love for people to try it out and give feedback. To try the release candidate for TokuMX:
- Email Tokutek
- Phone +220.127.116.1100