Four Benefits of TokuMX Transactions for MongoDB Applications

Posted on November 7, 2013 by Zardosht Kasheff

From the application’s perspective, TokuMX behaves very similarly, if not identically, to MongoDB in many ways. But on non-sharded clusters, it differs in one subtle yet important way. With MongoDB, each operation on a single document is atomic. With TokuMX, each statement is transactional. Although I explained this in my last post, let me reiterate what it means:

  • For each statement that modifies a TokuMX collection, either the entire statement is applied or none of it is. A statement is never partially applied. So if a statement performs 1000 inserts, then no matter what, either all 1000 are applied or none are.

  • TokuMX queries use multi-version concurrency control (MVCC): each query operates on a snapshot of the system that does not change for the duration of the query. Concurrent inserts, updates, and deletes do not affect its results (this guarantee does not cover file-level operations such as dropping a collection).
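To make the snapshot idea concrete, here is a toy sketch in Python of how a multi-version store can give a reader a stable view. It is not TokuMX's implementation: the `VersionedStore` class, its integer commit timestamps, and the tombstone-for-delete convention are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Hypothetical toy multi-version store: every write appends a new
    version of a key tagged with a commit timestamp; nothing is overwritten."""
    clock: int = 0
    versions: dict = field(default_factory=dict)  # key -> list of (ts, value)

    def write(self, key, value):
        # Each write commits at a new, strictly increasing timestamp.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def delete(self, key):
        # A delete is recorded as a tombstone version (value None).
        self.write(key, None)

    def snapshot(self):
        # A reader remembers the timestamp at which it started.
        return self.clock

    def read_all(self, snapshot_ts):
        # For each key, return the newest version committed at or before
        # snapshot_ts; anything committed later is invisible to this reader.
        result = {}
        for key, history in self.versions.items():
            visible = [value for ts, value in history if ts <= snapshot_ts]
            if visible and visible[-1] is not None:
                result[key] = visible[-1]
        return result


store = VersionedStore()
store.write("a", 1)
store.write("b", 2)
snap = store.snapshot()   # a query starts here
store.write("a", 100)     # concurrent update
store.delete("b")         # concurrent delete
store.write("c", 3)       # concurrent insert
print(store.read_all(snap))              # {'a': 1, 'b': 2}
print(store.read_all(store.snapshot()))  # {'a': 100, 'c': 3}
```

The query that started at `snap` keeps seeing the old values of `a` and `b` and never sees `c`, while a query started afterwards sees all three writes.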

We implemented this behavioral difference for several reasons, and a big one is the benefits we think it provides to MongoDB applications. Below, we list four.

Benefit 1: Cursors represent a true snapshot of the system

In MongoDB, a cursor for a query returns data in batches. If the results do not fit in a single batch, the driver must call getMore to receive more of them. If a write is interleaved between these calls, the results returned by getMore may be affected. For example, if an update moves a document from a location the cursor has already read to a location the cursor has yet to read via getMore, that document will show up twice in the query results.
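The duplicate-result scenario can be reproduced with a deliberately simplified model. In this hypothetical sketch, `RecordStore` keeps documents in position order and relocates a document to the end when an update makes it grow, leaving a hole behind; `PositionalCursor` only remembers the next position to read. Neither class reflects MongoDB's actual storage code.

```python
class RecordStore:
    """Hypothetical toy storage: documents occupy positions in a list, and an
    update that no longer fits relocates the document to the end."""

    def __init__(self, docs):
        self.slots = list(docs)

    def update_and_move(self, doc_id, **changes):
        for i, doc in enumerate(self.slots):
            if doc is not None and doc["_id"] == doc_id:
                self.slots[i] = None                  # old location becomes free space
                self.slots.append({**doc, **changes})  # grown copy goes at the end
                return


class PositionalCursor:
    """Returns results in batches and remembers only the next position, so each
    getMore sees whatever is stored at later positions at that moment."""

    def __init__(self, store, batch_size):
        self.store, self.batch_size, self.pos = store, batch_size, 0

    def get_more(self):
        batch = []
        while self.pos < len(self.store.slots) and len(batch) < self.batch_size:
            doc = self.store.slots[self.pos]
            self.pos += 1
            if doc is not None:                      # skip holes left by moves
                batch.append(doc)
        return batch


store = RecordStore([{"_id": i, "tags": []} for i in range(1, 5)])
cursor = PositionalCursor(store, batch_size=2)
seen = cursor.get_more()                             # first batch: ids 1 and 2
store.update_and_move(1, tags=["new"] * 100)         # interleaved write grows doc 1
while batch := cursor.get_more():                    # subsequent getMore calls
    seen.extend(batch)
ids = [d["_id"] for d in seen]
print(ids)  # [1, 2, 3, 4, 1]
```

Document 1 is returned in the first batch and again in the last one, because position-based bookkeeping cannot tell that the relocated copy was already seen.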

With TokuMX, this is not a concern. Cursors are associated with transactions that use MVCC, so each cursor reads from a snapshot of the system that is unaffected by concurrent inserts, deletes, or updates. As a result, applications do not need to worry about interleaved writes changing query results.
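For contrast, here is a minimal sketch of a cursor that reads from a snapshot fixed when it is opened. The `SnapshotCursor` class is hypothetical, and it copies the documents up front purely for simplicity; an MVCC engine instead keeps older versions around so it can serve the snapshot without a full copy.

```python
class SnapshotCursor:
    """Hypothetical cursor pinned to the collection contents at open time.
    The tuple copy stands in for an MVCC snapshot."""

    def __init__(self, docs, batch_size):
        self.snapshot = tuple(dict(d) for d in docs)  # frozen view at open time
        self.batch_size, self.pos = batch_size, 0

    def get_more(self):
        batch = self.snapshot[self.pos:self.pos + self.batch_size]
        self.pos += len(batch)
        return list(batch)


collection = [{"_id": i, "tags": []} for i in range(1, 5)]
cursor = SnapshotCursor(collection, batch_size=2)
seen = cursor.get_more()                          # first batch: ids 1 and 2

# Concurrent writes: document 1 moves to the end and document 5 is inserted.
doc1 = collection.pop(0)
collection.append({**doc1, "tags": ["new"]})
collection.append({"_id": 5, "tags": []})

while batch := cursor.get_more():                 # subsequent getMore calls
    seen.extend(batch)
print([d["_id"] for d in seen])                   # [1, 2, 3, 4]
```

The same kind of interleaved writes leaves the cursor's results unchanged: each document appears exactly once, and the newly inserted document 5 is not visible to this cursor.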

Benefit 2: Simpler to batch inserts together for performance

Batching inserts together is a common way to improve insertion performance compared with sending insertions one at a time, because some overhead is amortized across the batch. For example, the client makes one network round trip per batch instead of one per document, and some locks are acquired once per batch rather than once per document.
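A back-of-the-envelope model shows why batching helps. The sketch below assumes a fixed 500 µs round trip and 10 µs of per-document work; both numbers are invented for illustration, and real costs depend on the network, the server, and the workload.

```python
def round_trips(num_docs: int, batch_size: int) -> int:
    """Client-server round trips needed to send num_docs inserts in batches
    of at most batch_size documents (integer ceiling division)."""
    return (num_docs + batch_size - 1) // batch_size


def estimated_cost_us(num_docs: int, batch_size: int,
                      rtt_us: int = 500, per_doc_us: int = 10) -> int:
    """Toy cost model: a fixed latency per round trip plus a fixed amount of
    work per document. The defaults are illustrative assumptions, not
    measurements of MongoDB or TokuMX."""
    return round_trips(num_docs, batch_size) * rtt_us + num_docs * per_doc_us


for size in (1, 100, 1000):
    print(size, round_trips(1000, size), estimated_cost_us(1000, size))
# 1 1000 510000
# 100 10 15000
# 1000 1 10500
```

Under these assumptions, the per-document work stays the same, but the round-trip overhead shrinks from 1000 payments to 10 or even 1.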

While TokuMX and MongoDB can both batch inserts together, and both see performance improvements from doing so, what makes TokuMX simpler is error handling. With MongoDB, if you send 1000 inserts in a single batch and the command fails partway, the user does not know which subset of the 1000 insertions was applied before the failure. Did 100 of the insertions make it? Did 500? What is the state of the system? With TokuMX, because each statement is its own transaction, the user knows that either the entire statement was applied or none of it was. There is no need to check whether the statement was partially applied.
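The difference in error handling can be sketched with a toy in-memory collection keyed by `_id`. `insert_batch_stop_on_error` models a non-transactional batch that stops at the first failure, and `insert_batch_atomic` stages the whole batch and publishes it only if every insert succeeds. Both functions and the `DuplicateKeyError` class are hypothetical names, not driver or server APIs.

```python
class DuplicateKeyError(Exception):
    """Hypothetical error raised when an _id is already present."""


def insert_batch_stop_on_error(collection: dict, docs: list) -> None:
    """Apply inserts one at a time and stop at the first failure, leaving
    every earlier insert in place (a partially applied batch)."""
    for doc in docs:
        if doc["_id"] in collection:
            raise DuplicateKeyError(doc["_id"])
        collection[doc["_id"]] = doc


def insert_batch_atomic(collection: dict, docs: list) -> None:
    """Stage every insert on a copy and publish the copy only if all of them
    succeed, so the batch is applied entirely or not at all."""
    staged = dict(collection)
    insert_batch_stop_on_error(staged, docs)  # raises before anything is published
    collection.clear()
    collection.update(staged)


batch = [{"_id": i} for i in range(1000)]
batch[500] = {"_id": 0}  # the 501st document collides with the 1st

partial = {}
try:
    insert_batch_stop_on_error(partial, batch)
except DuplicateKeyError:
    pass
print(len(partial))  # 500: the caller must work out which inserts landed

atomic = {}
try:
    insert_batch_atomic(atomic, batch)
except DuplicateKeyError:
    pass
print(len(atomic))   # 0: nothing was applied
```

With the staged version, the only question after a failure is whether to retry the whole batch, not which part of it to repair.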

For some applications, this makes batching insertions together more appetizing.

Benefit 3: Simpler for applications to update multiple documents with a single statement

This benefit is closely related to the previous one. Just as batching multiple insertions together can be problematic with MongoDB, so can updating multiple documents with a single statement. If the update fails partway, the application is responsible for determining which documents were updated and which were not. For this reason, some users don’t run their update statements with { multi : true }.

With TokuMX, because each statement is a transaction, this bookkeeping is unnecessary: a multi-document update is applied either to every matching document or to none of them.
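One way to get all-or-nothing behavior for a multi-document update is an undo log: save each document's previous contents before changing it, and restore them all if any change fails. The sketch below uses that technique on plain Python dicts; `update_many_atomic` and `add_ten` are hypothetical, and this is not a description of how TokuMX implements rollback internally.

```python
def update_many_atomic(docs: list, match, change) -> int:
    """Hypothetical helper: apply change(doc) to every doc where match(doc) is
    true. Each document's old contents go into an undo log first; if any
    change raises, every logged document is restored and the error re-raised."""
    undo_log = []                          # (document, previous contents)
    try:
        for doc in docs:
            if match(doc):
                undo_log.append((doc, dict(doc)))
                change(doc)                # may raise partway through
    except Exception:
        for doc, before in reversed(undo_log):
            doc.clear()
            doc.update(before)             # roll back to the saved contents
        raise
    return len(undo_log)                   # number of documents updated


def add_ten(doc):
    doc["qty"] += 10                       # TypeError if qty is not a number


inventory = [
    {"_id": 1, "sku": "A", "qty": 5},
    {"_id": 2, "sku": "A", "qty": 7},
    {"_id": 3, "sku": "A", "qty": "n/a"},  # bad data: the update fails here
    {"_id": 4, "sku": "A", "qty": 1},
]

try:
    update_many_atomic(inventory, lambda d: d["sku"] == "A", add_ten)
except TypeError:
    pass
print([d["qty"] for d in inventory])       # [5, 7, 'n/a', 1]
```

Documents 1 and 2 were modified before document 3 failed, but the undo log puts them back, so the caller sees either the full update or no update.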

Benefit 4: No need to combine documents together for the purpose of atomicity

A few months ago, I was talking to a user who told me how he worked around the fact that he could not update multiple documents atomically. He merged the documents into a single document and relied on MongoDB’s single-document atomicity to implement his application. With TokuMX, this is no longer necessary.

In conclusion, we believe TokuMX transactions help simplify application development. In my next post, I will explain our other (selfish) motivation for implementing this feature.
