Introducing Partitioned Collections for MongoDB Applications

Posted On May 29, 2014 | By Zardosht Kasheff

TokuMX 1.5 is around the corner. The big feature will be something we discussed briefly when talking about replication changes in 1.4: partitioned collections.

Before introducing the feature, I wanted to mention the following. Although TokuMX 1.5 is not available as of this writing, we would love to hear feedback on partitioned collections, which we think are wonderful for time-series data, as I describe below. If you are interested in trying out the feature, email support@tokutek.com for a pre-release version of TokuMX 1.5.

What is a partitioned collection?

A partitioned collection is analogous to a partitioned table in relational databases. Oracle, MySQL, SQL Server, and Postgres all support partitioned tables. We are happy to bring this functionality to TokuMX. So, if the remainder of this post is unclear, and you have friends in the office who are familiar with relational databases, you may want to ask them for more information :).

Concretely, a partitioned collection is a collection that, under the covers, is broken into (or partitioned into) several individual collections, based on ranges of a “partition key”. From the application developer’s point of view, the collection is just another collection. Queries, inserts, updates, and deletes just work with no syntactical changes. Secondary indexes and replication work as well. But under the covers, the data is broken into several collections, each responsible for all data in one range of the partition key.

If you are running TokuMX 1.4, a simple example is the oplog, which is a partitioned collection. Any normal query works just fine on the oplog. However, if you look in your data directory, you will see several .tokumx files named “local_oplog_rs_p…”. These files are the individual partitions that break up the data. Each partition stores a range of _id fields in the oplog.
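As a rough mental model, routing a document to its partition by range is a simple linear scan over each partition's maximum. The sketch below is illustrative only (`findPartition` and the `maxima` array are our invention, not TokuMX internals):

```javascript
// Hypothetical sketch: route a key to the partition responsible for its
// range. Each bounded partition is described by its inclusive maximum;
// the final partition is unbounded above.
function findPartition(partitionMaxima, key) {
  for (let i = 0; i < partitionMaxima.length; i++) {
    if (key <= partitionMaxima[i]) return i;
  }
  return partitionMaxima.length; // the final, unbounded partition
}

// Maxima for three bounded partitions; a fourth holds everything larger.
const maxima = [0, 1000, 2500];
console.log(findPartition(maxima, -5));   // 0  (_id <= 0)
console.log(findPartition(maxima, 500));  // 1  (0 < _id <= 1000)
console.log(findPartition(maxima, 9999)); // 3  (_id > 2500)
```

Because the ranges are contiguous and ordered, every key lands in exactly one partition, which is what lets queries on the partition key skip partitions entirely.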

Why should I bother using a partitioned collection?

This will be its own post with longer examples, but here is a summary. Partitioned collections have two big advantages:

  • Large chunks of data can be deleted very efficiently by dropping partitions. The cost is that of performing an “rm” on some files in the filesystem. This is really fast and efficient.
  • Queries that include the partition key may be isolated to individual partitions, and therefore run faster. This is similar to “query isolation” for shard keys.

So, a partitioned collection is a good fit for workloads where the oldest data gets dropped periodically and many queries filter on a time-based key.

In short: time series data. If you have a time-series application where you want to keep a rolling period of data (e.g. the last 6 months worth), then using a partitioned collection will be great, and is preferable to using a TTL index or a capped collection. In a future blog post, I will expand on this.

How do I use a partitioned collection in TokuMX 1.5?

Basically, just like a normal collection, except with some commands added to create a partitioned collection, add partitions, and drop partitions. Below, I explain the shell commands added for this functionality. Our documentation contains the full commands so that they may be called by any driver’s runCommand method.

Ok, so how do I create a partitioned collection?

The first thing to consider is what your partition key should be. That is, which key's ranges do you want to use to partition your data? This key has similarities with a shard key. It should be a key that can be used to isolate partitions, the way a shard key is used to isolate shards (as explained here). It should also be a key whose ranges correspond to data you will want to delete all at once.

With time series data, that key will likely be a timestamp.

In TokuMX, the partition key is always the primary key. To create a partitioned collection, “foo”, with a timestamp field, “ts”, used for the partitioning, run the following:


> db.createCollection("foo", { partitioned : 1 , primaryKey : { ts : 1 , _id : 1 } })
{ "ok" : 1 }

Note that in TokuMX, the primary key must have the _id field appended to it to ensure uniqueness.

As a side note, we do not support hash based partitioning, only range based partitioning.

Adding partitions?

In TokuMX, partitions can only be appended to the end. Individual partitions cannot be split. So, say we have a collection that partitions on the _id field, where all _id’s happen to be integers. Suppose we have three partitions with the following ranges:

  • _id <= 0
  • 0 < _id <= 1000
  • _id > 1000

With this collection we cannot create a partition with the range 500 < _id <= 1000, because that would split the second partition.

All we can do is add a new partition to the end, and “cap” the current last partition with a new maximum value. This new maximum value must be greater than or equal to the primary key (or in this case, _id) of the last partition’s last document. So, if the last partition’s last document has an _id of 2500, we can only add partitions that cap the existing last partition at a maximum of at least 2500.
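The cap-and-append rule can be modeled in a few lines of plain JavaScript. This is an illustrative in-memory model under our own conventions, not the server command: partitions are represented by their inclusive maxima, with `Infinity` standing in for the unbounded last partition.

```javascript
// Hypothetical model of the append-only rule (not TokuMX code).
// maxima: array of inclusive maxima, ending in Infinity.
// lastDocKey: primary key of the last document in the last partition.
// newMax: optional explicit cap; defaults to lastDocKey.
function addPartition(maxima, lastDocKey, newMax) {
  if (newMax === undefined) newMax = lastDocKey;
  if (newMax < lastDocKey) {
    throw new Error("new maximum must be >= the last document's key");
  }
  // Cap the current last partition at newMax and append a fresh
  // unbounded partition after it.
  return maxima.slice(0, -1).concat([newMax, Infinity]);
}

// Caps at the last document's key (2500), like db.foo.addPartition():
console.log(addPartition([0, 1000, Infinity], 2500));
// → [0, 1000, 2500, Infinity]

// Explicit cap, like db.foo.addPartition({ _id : 3000 }):
console.log(addPartition([0, 1000, Infinity], 2500, 3000));
// → [0, 1000, 3000, Infinity]
```

Note that splitting an interior range is never possible in this model, which mirrors the restriction above: the only mutation is capping the last range and appending a new one.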

There are two ways to add a partition. The first method peeks at the last document in the current last partition, caps the partition with the primary key of that last document, and creates a new partition. To do so, one does:


> db.foo.addPartition()
{ "ok" : 1 }

In the above example, the partitioned collection would now have partitions with the following ranges:

  • _id <= 0
  • 0 < _id <= 1000
  • 1000 < _id <= 2500
  • _id > 2500

Alternatively, we can specify what the new maximum of the existing last partition may be, provided it is greater than or equal to the primary key of the last document in the last partition (which in this example is 2500). To do so, we simply pass in the new maximum as a parameter:


> db.foo.addPartition({ _id : 3000 });
{ "ok" : 1 }

This would make the collection have partitions with the following ranges:

  • _id <= 0
  • 0 < _id <= 1000
  • 1000 < _id <= 3000
  • _id > 3000

Dropping partitions?

Dropping partitions is simple. First, see what the partitions are with the following shell command:


> db.foo.getPartitionInfo()
{
       "numPartitions" : NumberLong(4),
       "partitions" : [
               {
                       "_id" : NumberLong(0),
                       "max" : {
                               "_id" : 0
                       },
                       "createTime" : ISODate("2014-05-29T01:50:15.839Z")
               },
               {
                       "_id" : NumberLong(1),
                       "max" : {
                               "_id" : 1000
                       },
                       "createTime" : ISODate("2014-05-29T01:50:27.049Z")
               },
               {
                       "_id" : NumberLong(2),
                       "max" : {
                               "_id" : 2500
                       },
                       "createTime" : ISODate("2014-05-29T01:50:30.549Z")
               },
               {
                       "_id" : NumberLong(3),
                       "max" : {
                               "_id" : { "$maxKey" : 1 }
                       },
                       "createTime" : ISODate("2014-05-29T01:50:35.903Z")
               }
       ],
       "ok" : 1
}

This lists each partition: the maximum value it may hold (which defines its range), its creation time, and its id (in the _id field). So, in the example we used for adding partitions, we have four partitions with _ids 0 through 3.

To drop a partition, we run the following command and pass the _id of the partition we want to drop. To drop partition 0, we run:


> db.foo.dropPartition(0)
{ "ok" : 1 }

Looking at the list of partitions after this operation, we see the partition is dropped:


> db.foo.getPartitionInfo()
{
       "numPartitions" : NumberLong(3),
       "partitions" : [
               {
                       "_id" : NumberLong(1),
                       "max" : {
                               "_id" : 1000
                       },
                       "createTime" : ISODate("2014-05-29T01:50:27.049Z")
               },
               {
                       "_id" : NumberLong(2),
                       "max" : {
                               "_id" : 2500
                       },
                       "createTime" : ISODate("2014-05-29T01:50:30.549Z")
               },
               {
                       "_id" : NumberLong(3),
                       "max" : {
                               "_id" : { "$maxKey" : 1 }
                       },
                       "createTime" : ISODate("2014-05-29T01:50:35.903Z")
               }
       ],
       "ok" : 1
}
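Putting the commands together, a periodic maintenance job for a rolling time window might look like the following sketch. The command names (`getPartitionInfo`, `addPartition`, `dropPartition`) come from this post, but `pruneAndRoll`, `mockCollection`, and the retention logic are our own illustration; a real job would run these against a TokuMX 1.5 connection rather than the in-memory stand-in below.

```javascript
// Hypothetical rolling-window job: drop every partition created before a
// cutoff date (never the last one, which receives new inserts), then open
// a fresh partition for incoming data.
function pruneAndRoll(coll, cutoff) {
  const info = coll.getPartitionInfo();
  info.partitions.slice(0, -1).forEach(function (p) {
    if (p.createTime < cutoff) coll.dropPartition(p._id);
  });
  coll.addPartition();
}

// Minimal in-memory stand-in for the collection, for illustration only.
function mockCollection(partitions) {
  return {
    getPartitionInfo: function () {
      return { partitions: partitions.slice() };
    },
    dropPartition: function (id) {
      partitions = partitions.filter(function (p) { return p._id !== id; });
    },
    addPartition: function () {
      const nextId = Math.max.apply(null, partitions.map(function (p) {
        return p._id;
      })) + 1;
      partitions.push({ _id: nextId, createTime: new Date() });
    },
    ids: function () {
      return partitions.map(function (p) { return p._id; });
    }
  };
}

const coll = mockCollection([
  { _id: 0, createTime: new Date("2013-11-01") },
  { _id: 1, createTime: new Date("2014-02-01") },
  { _id: 2, createTime: new Date("2014-05-01") }
]);
pruneAndRoll(coll, new Date("2014-01-01"));
console.log(coll.ids()); // partition 0 dropped, new partition 3 appended
```

The key point is cost: the drop step is a file removal per partition, independent of how many documents each partition held, which is what makes this pattern preferable to deleting old documents one by one under a TTL index.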

This covers how to use partitioned collections. We hope users in the MongoDB ecosystem find this feature as useful as relational database users do.

In the comments section below, feel free to leave questions and/or feedback.
