Partitioning, Free Lunches, & Indexing, Part 2

Posted On January 28, 2011 | By Martin Farach-Colton | 8 comments

Review

In part one, I presented a very brief and particular view of partitioning. I covered what partitioning is, with hardly a mention of why one would use partitioning. In this post, I’ll talk about a few use cases often cited as justification for using partitions.

Lots of disks → Lots of partitioning of tables

One use case for justifying partitions is that each partition can be placed on a separate disk to avoid spindle contention. I have to say that on this one, I agree with Kevin Burton, who makes the point that if you want to distribute I/O load across several disks, you can use a RAID configuration on the disks. In this case, he says that partitioning is not worth the trouble. [NB. He makes the point that this is a problem with MySQL's implementation of partitions and its interaction with MySQL Cluster. He proposes some fixes. If those fixes ever come to pass, we'll have to see what the new performance characteristics look like.]
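For concreteness, the partition-per-disk setup usually looks something like this in MySQL. The table, column names, and paths here are invented for illustration, and per-partition `DATA DIRECTORY` only takes effect with a storage engine and configuration that honor it:

```sql
-- Hypothetical table spread across spindles by hand, one partition per disk.
-- A RAID volume would achieve the same striping with no per-table DDL at all.
CREATE TABLE metrics (
  id          BIGINT NOT NULL,
  recorded_at DATE NOT NULL,
  value       DOUBLE
)
PARTITION BY RANGE (id) (
  PARTITION p0 VALUES LESS THAN (1000000)  DATA DIRECTORY = '/disk1/mysql',
  PARTITION p1 VALUES LESS THAN (2000000)  DATA DIRECTORY = '/disk2/mysql',
  PARTITION p2 VALUES LESS THAN MAXVALUE   DATA DIRECTORY = '/disk3/mysql'
);
```

Note how the disk layout is baked into the schema: every time the data outgrows the ranges or a disk is added, the DDL has to be revisited, which is exactly the maintenance burden a RAID array absorbs for you.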

I spent some time trying to find comparisons between partitioning and RAIDing for load balancing in MySQL. If I missed something, I’d love to know about it, but the results aren’t jumping out.

In summary, if you’re going to claim that a partitioning scheme helps load balance across disks, it makes sense to compare this with a well-designed RAID system. Partitioning is high maintenance. RAID is low maintenance by comparison. I may be lazy, but I’d rather the system did the load balancing and not me.

Avoiding Table Scans

The most commonly claimed performance improvement for partitioning, from what I can tell, is its use in avoiding full table scans during queries. Even though the details vary from case to case, I’ve been able to distill one scenario that illustrates what’s going on most of the time:

  1. A query on data causes a table scan.
  2. Adding a secondary index doesn’t help, because the secondary index does not cover the query. The non-covering index would induce a large number of point queries into the primary table. This is slow enough that the query optimizer chooses a primary table scan instead. (Why not build the secondary index as covering? Probably because secondary indexes can be expensive to build in InnoDB. Perhaps the queries are date+user_id, then date+ip, then date+…, so you’d need a lot of secondary indexes; or even more directly, you know what secondary indexes you need, but InnoDB can’t keep up with the insertion load when you turn them all on. The first problem is solved by using TokuDB clustering indexes. The second is also solved by TokuDB, because it’s so fast at indexing.)
  3. The table is partitioned by date. Now, when a query induces a table scan, one or a few partitions are scanned, instead of the whole table.
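In MySQL syntax, step 3 looks roughly like this (the `hits` schema and date ranges are made up for illustration):

```sql
-- Hypothetical log table partitioned by date. A query with a date
-- predicate scans only the matching partitions, not the whole table.
CREATE TABLE hits (
  hit_date DATE NOT NULL,
  user_id  INT NOT NULL,
  ip       INT UNSIGNED NOT NULL
)
PARTITION BY RANGE (TO_DAYS(hit_date)) (
  PARTITION p2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
  PARTITION p2011 VALUES LESS THAN (TO_DAYS('2012-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Partition pruning limits this scan to the p2011 partition:
SELECT COUNT(*) FROM hits
WHERE hit_date BETWEEN '2011-01-01' AND '2011-01-31';
```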

The table didn’t get any smaller. We were just able to restrict a table scan to a smaller portion of the table. For me, this helps place partitions into a context of well-known DB techniques. Partitions are often a replacement for covering indexes!

After all, that’s what covering indexes are, right? They are arrangements of data that allow you to scan part of a table to answer a query, without resorting to point queries into a primary table.
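As a sketch of the alternative, again with a hypothetical `hits(hit_date, user_id, ip)` table: a composite secondary index that includes every column the query touches lets the query run entirely inside the index, with no point queries into the primary table. (A TokuDB clustering index goes one step further by clustering the full row with the key, so one index covers any query on that key.)

```sql
-- Covering index for queries on (hit_date, user_id): the query below
-- reads only this index and never touches the primary table.
ALTER TABLE hits ADD INDEX ix_date_user (hit_date, user_id);

SELECT user_id, COUNT(*)
FROM hits
WHERE hit_date BETWEEN '2011-01-01' AND '2011-01-31'
GROUP BY user_id;
-- EXPLAIN reports "Using index" when the index covers the query.
```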

If partitions are indexes, are they good ones?

Once again, I looked for comparisons of proposed partitioning schemes with indexing-based choices. In particular, I looked for comparisons of partitioning with covering schemes. No luck. And as before, if I missed something I’d love to hear about it.

If partitions are designed to avoid table scans, then they are a high-effort way to do so. I’m not suggesting that they should never be used for this, but unless the low maintenance option of covering and clustering indexing is considered, partitioning just feels like a lot of work, potentially for no benefit.

Conclusion

The common reasons for using partitioning have low-maintenance, high-performance alternatives. In the case of load balancing across disks, I’d rather use RAID than partition by hand. In the case of avoiding table scans, I’d rather design indexes that avoid table scans than partition by hand. If I can’t get the insertion performance I need once I’ve designed a sensible set of indexes for my query load, I’d rather use a storage engine that does fast indexing (like TokuDB) so I can maintain the indexes I need, rather than the ones I’m forced to by the storage engine.

And in any case, it’s important to compare partitioning to the most effective alternative solutions for the performance problem encountered.

8 thoughts

  1. Hi,
    The main reason for using partitions is handling large chunks of data, especially when dealing with statistics calculations that require selecting a range of rows.
    To really appreciate partitions, you should think of tables so large that you can’t possibly use indexes, for lack of enough RAM in your server. Indexes without RAM equates to horrible performance.
    Using partitions for such cases results in a dramatic speed improvement. For cases when you need to increase the performance of range queries while keeping the sanity of OLTP, I usually recommend using InnoDB without partitions in the master, and ARCHIVE with partitions in a slave.
    Cheers

    Giuseppe

  2. [...] This post was mentioned on Twitter by wamdeeptech, SQL Manager. SQL Manager said: Planet MySQL. Partitioning, Free Lunches, & Indexing, Part 2: Review In part one, I presented a very brief and… http://dlvr.it/FJYy6 [...]

  3. Aaron says:

    What about tables that are so large, the indexes won’t fit in memory? We used partitioning to help with performance. What would be the alternative?

  4. Martin Farach-Colton says:

    @Giuseppe,

    I agree with you, but the devil’s in the details. You say:

    “Indexes without RAM equates to horrible performance.” I’d make the clarification of “Indexes without RAM equates to horrible performance, if you are using InnoDB.” For TokuDB, secondary indexes that don’t fit in memory are not a problem. In fact, maintaining the indexes you really want on large tables is a great way to use TokuDB, and I’ve yet to find a case where partitioning helps performance for TokuDB, because we don’t experience the RAM cliff, and you can keep on indexing.

    So it’s not really indexing that requires RAM, it’s B-tree indexing.

    If you have a particular use case in mind, I’d be happy to try out TokuDB on it to see what the performance looks like with indexes turned on.

    @Aaron, my answer to you is pretty much the same. If your bottleneck is in the queries, either you are hitting the same part of the data repeatedly, in which case it gets swapped in, partitions or no partitions, or you are hitting lots of different parts of the data, in which case you spend your time swapping in different parts of the data, partitions or no partitions. Therefore, I assume your problem is with insertions into large tables. In that case, there’s an easy fix, which is to index using a storage engine that does well for disk-resident data, like TokuDB.

  5. [...] Part 1, and Part 2 of this series, I presented some thoughts on partitioning. I heard some great feedback on why [...]

  6. moses wejuli says:

    @Martin, great piece, thanks! personally i wud think you’d need partitions chiefly for circumventing OS file size limits… or does TokuDB handle this differently?
    Case in point: Some linux distros have a file size limit of about 2TB. Now, if i had a table so large that the compressed version was greater than 2TB, then i fear the only workaround wud be partitioning, right?
    Am curious to know how TokuDB handles this case in point…

    Also, what is the theoretical/practical DB size limit TokuDB can handle before it starts to degrade in performance?

    1. leif says:

      Hi Moses, thanks for the comment. You’re right that ext2 and ext3 have a file size limit of 2 terabytes, but other filesystems have larger limits. In particular, xfs, which is what a majority of our customers use, handles files up to 8 exabytes.

      Fractal Trees don’t degrade as the database grows; they’re asymptotically faster than B-trees.

  7. moses wejuli says:

    @leif, many thanks. 8 exabytes will do me just fine — just need to swap filesystems.
    cheers.
