Lock Diagnostics and Index Usage Statistics in TokuMX v1.2.1

Posted On September 18, 2013 | By Leif Walsh | 0 comments

TokuMX v1.2.1 introduces two simple new features to help you understand the performance characteristics of your database: lock diagnostics and index usage statistics. We’d like to take you through a few examples of what these features are and how to use them.

vintage-lock

Lock Diagnostics

Since we introduced TokuMX, one of the most frequent complaints has been about “lock not granted” errors.  These arise when a long-running operation takes document-level locks, and other clients timeout while waiting to acquire the same locks.

This is a new problem in TokuMX that doesn’t exist in MongoDB, because MongoDB doesn’t have document-level locks.  It has a single lock per database, and if an operation takes a long time, it simply yields its lock periodically to make sure other clients don’t have to wait too long to start working.  This makes MongoDB feel more concurrent, even though only one client can actually be writing to a collection at a time.

In TokuMX, we do allow concurrent writers within the same collection, but they take document-level locks in order to maintain transactional serializability.  We don’t allow a client to yield its locks, because that would break the isolation property, like MongoDB does.  Therefore, a long-running operation that locks large ranges of a collection can cause many other clients to time out if they need to access documents in those ranges.  The default timeout on the server is four seconds, but a driver’s socket timeout may be set to less than this.

The simplest example of such a long-running operation is a “multi: true” update on an entire collection: db.coll.update({}, {$set: {x: “default value”}}, {multi: true}).  This will lock the entire collection until the update completes, which could take a long time if it’s a large collection.

Until now, the only way to diagnose these problems was to look at the workload and guess at what operations might be conflicting.  Now in TokuMX v1.2.1, we’ve introduced the commands showLiveTransactions and showPendingLockRequests, which can show you exactly what’s going on inside, and point out which operations you should scrutinize.

Let’s take an example.  Suppose we have a large collection and we want to add a field with a default value.  We run db.coll.update({}, {$set: {x: “default value”}}, {multi: true}).  This will take a while because there’s a lot of data, but while it’s running, another operation on the same collection will stall:

> db.coll.insert({a:1})
Lock not granted. Try restarting the transaction.

If we run that again, then before it returns we can run showPendingLockRequests:

> db.showPendingLockRequests()
{
  "requests" : [
    {
      "index" : "stress_test0.coll.$_id_",
      "requestingTxnid" : 201536646,
      "blockingTxnid" : 201536637,
      "started" : ISODate("2013-09-17T18:03:18.805Z"),
      "bounds" : [
        {
          "_id" : ObjectId("5238996659aac9d3335787a4")
        },
        {
          "_id" : ObjectId("5238996659aac9d3335787a4")
        }
      ]
    }
  ],
  "ok" : 1
}

We can see that we have an operation (with txnid 201536646, that’s our insert) waiting on another client with txnid 201536637.  If we check db.currentOp() we can find the operation with that rootTxnid, which will show us exactly which operation we’re waiting behind:

> db.currentOp()
{
  "inprog" : [
    {
      "opid" : 613048,
      "active" : true,
      "secs_running" : 237,
      "op" : "update",
      "ns" : "stress_test0.coll",
      "query" : {
      },
      "client" : "127.0.0.1:58511",
      "desc" : "conn14",
      "threadId" : "0x7f1e7fdfa700",
      "connectionId" : 14,
      "rootTransactionId" : 201536637,
      "locks" : {
        "^" : "r",
        "^stress_test0" : "R"
      },
      "waitingForLock" : false,
      "lockStats" : {
        "timeLockedMicros" : {
        },
        "timeAcquiringMicros" : {
          "r" : NumberLong(11),
          "w" : NumberLong(0)
        }
      }
    }
  ]
}

In addition, when an operation fails with “lock not granted,” the server prints some of this information to the log.  It’s not as much information as you get above (that would be a lot to print), but if you have a frequent locking problem, it’s enough to point you in the right direction.

Sometimes it’s useful to just see what locks are currently held in the system.  That’s exactly what showLiveTransactions does.  Here I have one client that started a transaction and updated one document:

> db.showLiveTransactions()
{
  "transactions" : [
    {
      "txnid" : 201546726,
      "rowLocks" : [
        {
          "index" : "stress_test0.coll.$_id_",
          "bounds" : [
            {
              "_id" : ObjectId("5237174bfa0f59d0ccf5d362")
            },
            {
              "_id" : ObjectId("5237174bfa0f59d0ccf5d362")
            }
          ]
        }
      ]
    }
  ],
  "ok" : 1
}

This shows us that an operation with txnid 201546726 has a lock on one point in stress_test0.coll.

Index Usage Statistics

In TokuDB and TokuMX, we always encourage our users to add lots of indexes, because indexes help queries run faster, and our products maintain many indexes easily.  But each index still adds some work to every write operation, and unique indexes are sometimes expensive, so if your queries aren’t using all your indexes, it pays to get rid of the ones that you don’t use.  This is an even bigger problem for vanilla MongoDB, because its indexes are expensive to maintain.

The other feature we added in TokuMX v1.2.1 is that index usage statistics are now reported by the collStats command, a.k.a. db.coll.stats() in the shell.  This shows how much each index has been used since the last server reboot, in terms of how many individual queries, inserts, and deletes have hit a given index, as well as the explain() counters nscanned and nscannedObjects, which you can use to describe the selectivity of an index for typical queries.

As an example, if I see the following for an index in db.coll.stats():

> db.coll.stats().indexDetails[1]
{
  "name" : "a_1", ...,
  "queries" : 467394,
  "nscanned" : 1121607,
  "nscannedObjects" : 1121607,
  "inserts" : 102947,
  "deletes" : 219117
}

This tells me that I’m using this index a lot, and and it’s quite selective, because nscanned is equal to nscannedObjects.  In contrast, let’s see the other index in this collection:

> db.coll.stats().indexDetails[2]
{
  "name" : "a_-1", ...,
  "queries" : 1,
  "nscanned" : 106,
  "nscannedObjects" : 106,
  "inserts" : 102947,
  "deletes" : 219117
}

Here, I have the same number of inserts and deletes (because it’s practically the same index), but I’ve hardly ever queried this index!  In fact, such a low number probably means that I’m not really using it at all.  That’s because the query optimizer in MongoDB and TokuMX works by “racing” different query plans until it decides that one is best.  So you’ll see even the indexes that you aren’t using get used a little bit every time the optimizer decides to test it.  Therefore, you should typically consider indexes that are used very little to actually be used not at all.

We hope these features help you better understand your application and your data.  See the TokuMX User’s Guide for all the details.  Please let us know if there are other things we could track that would help.

Leave a Reply

Your email address will not be published. Required fields are marked *