What’s new in TokuMX 1.4, Part 1: Primary keys

We just released version 1.4.0 of TokuMX, our high-performance distribution of MongoDB. There are a lot of improvements in this version (release notes), the most of any release yet. In this series of blog posts, we describe the most interesting changes and how they’ll affect users.

MongoDB doesn’t have a “primary key” the way SQL databases do. In vanilla MongoDB, all documents are stored in arbitrary order in a heap, and the “_id” index and all secondary indexes point into that heap. In TokuMX, the documents are clustered with the _id index, and all secondary indexes store a copy of the _id for each document so they can look up the full document in the _id index for non-covering queries.

In this way, TokuMX collections sort of have a primary key, but it’s always the _id index. It’s just a clustering index that non-clustering secondary indexes use to reference the full document.
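To make that concrete, here is a minimal mongo shell sketch of what that reference chain means for a secondary index. The collection bar and the field b are hypothetical, just for illustration:

> db.bar.ensureIndex({b: 1})
// In TokuMX, each entry in the {b: 1} index stores the document’s _id alongside b.
// A non-covering query first finds the matching _ids in the {b: 1} index, then
// fetches the full documents from the clustering _id index:
> db.bar.find({b: 42})
// A query that only needs fields stored in the index entry (here b and _id) has
// everything it needs in the secondary index itself, so in principle no second
// lookup is required:
> db.bar.find({b: 42}, {b: 1, _id: 1})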

In TokuMX 1.4.0, we are introducing a new feature that makes the primary key user-definable, so it doesn’t always have to be the index on {_id: 1}. By setting the primaryKey field of a collection create command, you can define the primary key to be any compound key.

To ensure uniqueness, we require that the end of the key is “_id: 1”, and we automatically create an additional unique, non-clustering index on {_id: 1}. This way, the primary key will be clustering and unique, but we’ll only do the unique checks on the non-clustering _id index, where it will be inexpensive if you allow TokuMX to auto-generate values for _id.

This defines the default sort order (“$natural” order) of the collection, and essentially lets you have a clustering key without always having a second clustering index on the _id, if you don’t need it and want to save on storage costs. Keep in mind, though, that the primary key for each document appears as a reference in every non-clustering secondary index, so if you insert documents with large fields in the primary key, that will make all your secondary indexes bigger. Also, you won’t be able to save any documents whose primary key fields are arrays or regexes.

To see it in action, simply run a command like

> db.createCollection("foo", {primaryKey: {a: 1, b: 1, _id: 1}})
{ "ok" : 1 }

and you can see it work:

> db.foo.getIndexes()
[
        {
                "key" : {
                        "a" : 1,
                        "b" : 1,
                        "_id" : 1
                },
                "unique" : true,
                "ns" : "test.foo",
                "name" : "primaryKey",
                "clustering" : true
        },
        {
                "key" : {
                        "_id" : 1
                },
                "unique" : true,
                "ns" : "test.foo",
                "name" : "_id_"
        }
]
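To see how the primary key shapes the collection’s behavior, here is a hedged follow-up sketch in the shell. The inserted values and the extra field c are made up, and the exact error for the rejected insert may look different from what the comments here suggest:

> db.foo.insert({a: 2, b: 1})
> db.foo.insert({a: 1, b: 3})
> db.foo.insert({a: 1, b: 2})
// With no sort specified, find() returns documents in $natural order, which is
// now primary key order (a, b, _id) rather than insertion order:
> db.foo.find()
// A secondary index stores the whole primary key (a, b, _id) in each entry, so
// large primary key fields make every secondary index bigger:
> db.foo.ensureIndex({c: 1})
// Primary key fields can’t hold arrays or regexes, so an insert like this is
// rejected with an error:
> db.foo.insert({a: [1, 2], b: 1})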

Later this week, I’ll explain why we added this feature and what else it’s used for in TokuMX 1.4.

Want to check out the newest version of TokuMX? Download TokuMX 1.4.0 here.



4 Responses to What’s new in TokuMX 1.4, Part 1: Primary keys

  1. dorian says:

    So the documents are stored sorted by primary key, and it will be very fast for collections with only a primary key (no secondary indexes)?

    Since you use compression, and documents are compressed in blocks, do you store the index separately from the document blocks, or do you know the first/last document _id for each block and then decompress the block on SELECT queries to see if the documents we want are inside?

    • Leif Walsh says:

      Yes, the documents are sorted by primary key. It will be very fast for range queries on the primary key, because the data will have good physical locality. Secondary indexes don’t affect this: you can still have them alongside a custom primary key, and they won’t hurt anything, just as they wouldn’t have made queries on the _id key slower when that was your primary key. You can see a workload with a primary key and a secondary key in the benchmark Tim just posted today: http://www.tokutek.com/2014/02/tokumx-vs-mongodb-sharding-balancer-performance/

      Yes, we use compression. Documents are stored clustered in the index (to achieve physical locality). It’s not quite as simple as that, but suffice it to say that yes, we know the bounds of the documents in each leaf block before we decompress it. One description of the data structure is here: http://www.tokutek.com/2013/07/tokumx-fractal-treer-indexes-what-are-they/

      Every block of data we write out for the index (buffers and leaves) is compressed on disk, but uncompressed in memory. So in-memory queries don’t require decompression, and out-of-memory queries require decompression, but this is so fast compared to the I/O required that it usually doesn’t slow anything down. In fact, for some workloads, the reduced I/O bandwidth due to compression actually means that with more compression, the workload gets faster.

      • dorian says:

        Cassandra saw the same thing: you increase performance by compressing because you write/read less data.

        But it looks like “_id” must be unique, and you have one extra index here: “and we automatically create an additional unique, non-clustering index on {_id: 1}”?

        • Leif Walsh says:

          Yes. In order to cluster the document with the PK and reference it from other secondary indexes, the PK must be unique. But keys that would be useful as a non-_id primary key would generally have more random inserts, compared to the {_id: 1} index which is almost perfectly sequential.

          In a fractal tree, unique index inserts require a query first, before the insert, to ensure uniqueness. In a B-tree, the insert itself is just as expensive as that query, so unique indexes aren’t special; but with fractal trees, unique indexes aren’t nearly as fast as non-unique indexes if those queries require I/O. You can read more about unique indexes here: http://www.tokutek.com/2013/07/why-unique-indexes-are-bad/.

          For sequential inserts into a unique index, however, this isn’t so bad, because everything we need is basically guaranteed to be in cache. So, what we’re doing here is maintaining a non-clustering {_id: 1} index to ensure uniqueness, where inserts will be fast because they’re sequential (as long as you leave the ObjectID auto-generated), and then clustering the documents with the PK so they’re in the order we want them for queries. The fact that the _id fields are unique implies that the PK must be unique (because it ends in _id), so we don’t have to do tree searches for unique checks there, and so the (higher-entropy) inserts there are fast too. We’ll probably never use the {_id: 1} index for lookups; it’s only there to make the unique checking easier.

          Let me know if you need me to clarify that or say it a different way.
