Understanding the Performance Characteristics of Partitioned Collections

In TokuMX 1.5 that is right around the corner, the big feature will be partitioned collections. This feature is similar to partitioned tables in Oracle, MySQL, SQL Server, and Postgres. A question many have is “why should I use partitioned tables?” In short, it’s complicated. The answer depends on your workload, your schema, and your database of choice. For example, this Oracle related post states “Anyone with un-partitioned databases over 500 gigabytes is courting disaster.” That’s not true for TokuDB or TokuMX. Nevertheless, partitioned tables are valuable; it’s why we are adding them to TokuMX 1.5.

In this post, I want to take a step back and explain the performance characteristics of partitioned tables on a technical level. I won’t be focusing on use cases or scenarios. Instead, I will compare the performance and behavior of basic operations in a partitioned table with those of a normal table.

Note that collections in MongoDB and TokuMX are analogous to tables in relational databases, and documents are analogous to rows. For the remainder of this post, I will refer to collections and documents, but all information equally applies to relational databases as well.

First, a quick recap. What is a partitioned collection? Quoting myself from here:

“A partitioned collection is a collection that underneath the covers is broken into (or partitioned into) several individual collections, based on ranges of a “partition key”. From the application developer’s point of view, the collection is just another collection.”

Right off the bat, hopefully, the first performance characteristic is obvious: instantaneous deletion of large amounts of data by dropping partitions. If you have a collection storing time-series data, and want to keep a rolling period of six months of data, then with a normal collection, as you insert new data, you need to delete data that is now greater than 6 months old. The data deletion may be just as expensive as the data insertion, as both need to find documents to delete and maintain secondary indexes. This can cut your write throughput in half. With partitioned collections, the cost of dropping this data is practically free, as all you do is essentially “rm” files associated with the oldest partition. This is how TokuMX maintains its oplog, by partitioning and dropping old data on a daily basis.

As for other operations, at a high level, databases only do the following:

  • Reads
  • Writes

Queries, be they “SELECT …” in relational databases, find() in MongoDB/TokuMX, or aggregation in MongoDB/TokuMX are reads. Inserts are basically just writes. Updates and deletes are a combination of reads and writes. Reads (really queries) are done to find the documents to be updated or deleted, and subsequent writes are done to perform the update/delete. So, a large part of understanding the performance characteristics of partitioned collections is to understand how they perform reads and writes.

Writes

From my reading, when people discuss writing to partitioned collections, they generally assume that nearly all writes go to the last partition (like it would with time-series data). At least, that is a big motivation. The reason is as follows. B-tree based databases (like MySQL, Oracle, Postgres, and SQL Server) struggle with writes when the collection’s indexes do not fit in memory. That is why MySQL and MongoDB struggle so badly with iibench.

So, if one partitions their collections at enough of a granularity such that the last partition always fits in memory, then write performance for B-tree based databases may improve significantly. Note that if writes are not always targeted to a single partition, and instead may target any partition, then this reasoning does not hold.

On the other hand, TokuDB and TokuMX, which use Fractal Tree indexes, do not struggle with writes when the collection’s indexes do not fit in memory. That is why they perform so well with iibench. So, with TokuDB and TokuMX, one doesn’t need to partition at such a granularity such that the last partition always fits in memory. If you feel the need to partition on a daily basis with other databases, perhaps you can reduce the granularity to weekly, monthly, or not at all. This is something to consider.

Reads

Reads are funny, in that the story is both simple and complicated. The story is simple as follows: a read is a read, regardless of data structure or data layout. If one has 1TB of data, and some particular data is not in memory but instead residing on disk, then I/O is required to read that data. Whether the data belongs to a partitioned collection, normal collection, B-Tree, Fractal tree, whatever, it does not matter. An I/O will be performed. This is one reason why read-heavy workloads perform similarly amongst MongoDB, MySQL, TokuDB, and TokuMX. This also debunks the argument that queries on a partitioned collection are “more likely to stay in memory”. That’s simply not true. If a collection’s usage pattern has query data in memory for a partitioned collection, that data will also be in memory for a non-partitioned collection.

The key to understanding how reads (really queries) perform in a partitioned collection vs a non-partitioned collection is understanding the algorithm, or query plan, used to perform that read. And that is where the story gets more complicated. The performance of queries on partitioned vs normal collections comes down to how efficiently they can locate the documents needed to answer the query.

If the query uses the same index that we are using to partition the data, then query performance between a partitioned collection and non-partitioned collection is identical, because their query plans are essentially identical. This is true regardless of query size, data size, and partition granularity. Both query plans will process the same range of documents, performing essentially the same number of reads. With a partitioned collection, that range may span multiple partitions, but the amount of data processed is the same.

If the query uses another index, things get interesting. To understand how that performs, we must understand how a partitioned collection performs such a query. The best way is to do so with examples. For all of the examples, suppose we have a collection “foo” that:

  • is partitioned on a timestamp, meaning it has a primary key of { ts : 1, _id : 1 }
  • has a secondary index on “a”

So, this is the output of db.foo.getIndexes():


> db.foo.getIndexes()
[
       {
               "key" : {
                       "ts" : 1,
                       "_id" : 1
               },
               "unique" : true,
               "ns" : "test.foo",
               "name" : "primaryKey",
               "clustering" : true
       },
       {
               "key" : {
                       "_id" : 1
               },
               "unique" : true,
               "ns" : "test.foo",
               "name" : "_id_"
       },
       {
               "key" : {
                       "a" : 1
               },
               "ns" : "test.foo",
               "name" : "a_1"
       }
]

Example 1:


db.foo.find({a:100})

Let’s consider these two cases:

  • The query finds a single document (or very few).
  • The query finds many documents.

In the case where the query finds few documents, partitioned collections may perform badly. Here is why. With a normal collection, a single lookup is done in the secondary index and then the primary key to find the document, and we are done. With a partitioned collection a lookup must be done in every partition, because any partition may have this document that we are searching for. So, the number of lookups is equivalent to the number of partitions we have. If we have a large number of partitions (like 100 or 1000), this can be very bad.

In the case where the query finds many documents, then both partitioned collections and normal collections should perform comparably. Partitioned collections still perform lookups in each partition, but because each partition is returning results, the lookups are beneficial. These are lookups that a normal collection would be doing anyway.

So, with this example, depending on how much data is being returned, partitioned collections can perform anywhere from “just as well” to “terribly worst”.

So, the moral here is to be careful of queries performed that must query all partitions.

Example 2:


db.foo.find({
    $and : [
          { a : 100 },
          { ts : {
              $gte: ISODate("2014-06-01T00:00:00.000Z"),
              $lt: ISODate("2014-06-02T00:00:00.000Z")
          } }
    ]
})

In this example, we still want to use the secondary index of “a”, but note that we now have an additional filter on “ts”, which is the first field of our partitioning key (ie, the primary key).

With a normal collection, if the secondary index of “a” is used, all documents where a is 100 are processed. This includes documents where ts does not fall in the specified date range. So, the normal collection may process many more documents than necessary. The partitioned collection, on the other hand, will notice that a clause exists that allows the query to target a subset of the partitions, namely those that overlap with the specified date range. This reduces the number of documents being processed, and speeds up the query.

With this specific example, the partitioned collection may perform much better. The problem, however, is that the secondary index of “a” is not optimal. Regardless of whether the collection is partitioned or not, this query benefits from a compound index of { a : 1, ts : 1}. So yes, a query on a partitioned collection may take advantage of partition filtering to improve query speed, but it’s very possible (if not likely) that a proper compound index that includes the partition key will perform just as well.

With these examples, I think we’ve covered the basic scenarios on how queries perform on normal collections compared to partitioned collections. The properties of queries on partitioned collections vs normal collections can be summarized as follows:

  • If queries do not include the partition key (e.g. timestamp in the example above), then queries may perform MUCH worse. At best, they will perform comparably.
  • If queries do include the partition key, they may perform better, but that is likely due to sub-optimal indexing. If the partition key is appended to the secondary index, then queries will likely perform better. Corner cases exist that I will not dive into.
  • With proper indexing, partition granularity (e.g. partitioning monthly vs. daily) should not really matter. Corner cases exist that I will not dive into.

This covers the performance characteristics of a partitioned collection. In my next post, I will reference these characteristics to go over best practices with TokuDB and TokuMX.

Tags: , , , , , .

One Response to Understanding the Performance Characteristics of Partitioned Collections

  1. Pingback: Articles for 2014-Jun-19 | Readings for a day

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>