Introducing TokuMX Transactions for MongoDB Applications

Since our initial release last summer, TokuMX has supported fully ACID and MVCC multi-statement transactions. I’d like to take this post to explain exactly what we’ve done and what features are now available to the user.

But before beginning, an important note: we have implemented this for non-sharded clusters only. We do not support distributed transactions across different shards.

At a high level, what have we done?

We have taken MongoDB’s basic transactional behavior, and extended it. MongoDB is transactional with respect to one, and only one, document. MongoDB guarantees single document atomicity. Journaling provides durability for that document. The database level reader/writer lock provides consistency and isolation.

With TokuMX, on non-sharded clusters, we have extended these ACID properties to multiple documents and statements, in a concurrent manner. Transactions take document level locks instead of database level locks, leading to more concurrency for both in memory and out of memory read/write workloads.

Here are the behavioral differences:

  • For each statement that tries to modify a TokuMX collection, either the entire statement is applied, or none of the statement is applied. A statement is never partially applied.
  • Commands “beginTransaction”, “commitTransaction”, and “rollbackTransaction” have been added to allow users to perform multi-statement transactions.
  • TokuMX queries use multi-version concurrency control (MVCC). That is, queries operate on a snapshot of the system that does not change for the duration of the query. Concurrent inserts, updates, and deletes do not affect query results (note this does not include file operations like removing a collection).

Let’s tackle each of these one by one and go a little more in depth.

Each Statement is a Transaction

With MongoDB, if a statement fails (e.g. a batched insert), then that statement may be partially applied to the database. What part of the statement succeeded and depends on the scenario.

With TokuMX, either the entire statement succeeds or fails. There is no in-between.

Let’s look at an example. Take the following commands run in a shell:

> db.createCollection("foo")
{ "ok" : 1 }
> db.foo.insert([ {_id : 1 }, { _id : 2 } , { _id : 1 } , { _id : 3 } ])

With both TokuMX and MongoDB, the insertion statement will fail, because the statement is inserting a value of 1 for the unique _id field in two separate documents.

Now let’s look at the state of the collection after both MongoDB and TokuMX try and fail to execute the insertion statement. With MongoDB, querying the collection shows the insert statement is partially applied. The first two documents were inserted, but when the third hit a duplicate key error, the first two remained in the collection:

> db.foo.find()
{ "_id" : 1 }
{ "_id" : 2 }

TokuMX, on the other hand, keeps the collection empty, acting as though the original statement never executed:

> db.foo.find()

Hence, each statement is transactional (of course, this example only shows atomicity, but trust me on the other properties :) ).

Commands for multi-statement transactions

The following commands and shell wrappers have been added:

  • beginTransaction
  • commitTransaction
  • rollbackTransaction

Hopefully, what each command does is self evident. The beginTransaction command supports the following isolation levels, passed in the form { isolation : … }:

  • mvcc (the default)
  • serializable
  • readUncommitted

So, if one wants to create a serializable transaction, one would run:

db.runCommand( { beginTransaction : 1 , isolation : “serializable” } )

Here is an example of a multi-statement transaction being started and rolled back:

> db.foo.insert({ a : "inserted before begin" })
> db.foo.find()
{ "_id" : ObjectId("5271a25ef66f424a03b2158b"), "a" : "inserted before begin" }
> db.runCommand("beginTransaction")
{ "status" : "transaction began", "ok" : 1 }
> db.foo.insert({ a : "inserted during transaction" })
> db.foo.find()
{ "_id" : ObjectId("5271a25ef66f424a03b2158b"), "a" : "inserted before begin" }
{ "_id" : ObjectId("5271a280f66f424a03b2158c"), "a" : "inserted during transaction" }
> db.runCommand("rollbackTransaction")
{ "status" : "transaction rolled back", "ok" : 1 }
> db.foo.find()
{ "_id" : ObjectId("5271a25ef66f424a03b2158b"), "a" : "inserted before begin" }

Note that each multi-statement transaction MUST occur over a single connection. The connection stores the multi-statement transaction’s information. So, be careful with drivers that use connection pools. If each statement goes through a connection pool to get a connection, then a multi-statement transaction will not work.

MVCC queries

The best way to demonstrate this is with another example. Suppose we have a collection with only one element:

> db.foo.find()
{ "_id" : ObjectId("52702a22f66f424a03b2158a"), "a" : "inserted before snapshot" }

Then, we create a multi-statement mvcc transaction:

> db.runCommand("beginTransaction")
{ "status" : "transaction began", "ok" : 1 }

The transaction’s MVCC snapshot has one, and only one document in it, so all queries should show just that one document.

Suppose, in another connection, we insert a document to the collection, and query the collection to show that it is there:

> db.foo.insert({ a : "inserted while transaction is live" })
> db.foo.find()
{ "_id" : ObjectId("52702a22f66f424a03b2158a"), "a" : "inserted before snapshot" }
{ "_id" : ObjectId("52702a6bafef85c2510956e8"), "a" : "inserted while transaction is live" }

If we go back to the first connection that has started a multi-statement transaction and perform a query, we see only the one document that was part of the original snapshot:

> db.foo.find()
{ "_id" : ObjectId("52702a22f66f424a03b2158a"), "a" : "inserted before snapshot" }

I use a multi-statement transaction here to easily show the effect of concurrent queries and inserts (the same can be said for updates and deletes). But these properties hold for single statements as well. Any statement executing a find will perform the query on a snapshot of the data. Concurrent inserts, updates, and deletes do not impact the query’s result.

What about durability?

One of the concerns users have with using durable transactions, be it with TokuMX or MongoDB, is the requirement of flushing a recovery log to disk to ensure the transaction will survive a crash. For MongoDB, that “recovery log” is the journal. If each transaction requires a flush of log information to disk, or an fsync, and disks may be limited to several hundred fsyncs per second, then throughput is limited by the number of fsyncs. While a form of group commit helps each of these products on multi-threaded workloads, the bottleneck caused by fsyncs may be an issue for some applications.

To give users a way around this limitation, TokuMX and MongoDB use a variable that has transactions not flush the recovery log to disk on commit, and instead have a background thread periodically flush the recovery log. That variable is “logFlushPeriod”. The intent is to give users an option to have non-durable transactions and instead periodically flush recovery log data to disk in the background. The end effect is that on a crash, the database loses at most a period’s worth of data.

By default, MongoDB and TokuMX flush the recovery log to disk once every 100 ms, so transactions by default are not durable. On some systems, the default for MongoDB is 30 ms. With TokuMX, we decided to emulate MongoDB’s default behavior.

To make a MongoDB update/insert/delete be durable, you must subsequently call the “getLastError” command with the option { j : true }. The same works for TokuMX. Additionally, with TokuMX, to make all transactions durable by default, one can simply set the value of logFlushPeriod to 0. With this setting, one does not need to call getLastError to ensure durability.

In conclusion, the best way to think about transactions in TokuMX is we’ve taken MongoDB’s single document transactional behavior on single servers, and extended it to multiple documents and statements. In my next post, I’ll dig into some benefits this extended behavior provides, and in the post after that, I’ll explain our motivations for implementing this feature.

Tags: , , .

6 Responses to Introducing TokuMX Transactions for MongoDB Applications

  1. Gaston says:

    Hi! TokuMX looks like a great improvement over standard MongoDB. I have a question , is it possible to do multi-collections transactions using TokuMX?
    Thanks
    Gaston

  2. Kevin says:

    Hi! I’m looking for the solution of transaction on mongodb. And TokuMX seems to solve that. But someone says that the ACID stuff of TokuMX only working to one server (NOT TO shardings). Is that true? And any solution ?

    • Zardosht Kasheff says:

      The multi-document transactions work only on non-sharded clusters. That is accurate. If your operations happen to hit a single shard in a sharded setup (like a batched insert), that will be transactional. Otherwise, no guarantees. We do not synchronize transactions across shards.

  3. Hubert says:

    Yes! Please add sharded transactions! that would be such a great feature!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>