MySQL Partitioning: A Flow Chart

Posted On March 11, 2011 | By Martin Farach-Colton | 8 comments

In Part 1 and Part 2 of this series, I presented some thoughts on partitioning. I heard some great feedback on why people use partitioning. Here, I present a flow chart that summarizes what I’ve learned. In summary: with TokuDB in the picture, there’s almost no reason to use partitioning. Or I should say, there are almost always better (higher-performing, more robust, lower-maintenance) alternatives to partitioning.

Here goes:

[Figure: Partitioning Flowchart (Tokutek)]

  1. Spindle contention? In other words, are you partitioning in order to spread your query workload across many disks? I’ve yet to see a compelling technical case where RAIDing your disks doesn’t do this just as well, with much less setup and maintenance.
  2. Are deletions a problem? Are you partitioning because of issues arising from deletions (i.e., poor performance or index fragmentation)?
  3. Is fragmentation a problem? In InnoDB, deletions cause fragmentation, whereas dropping a partition doesn’t. However, TokuDB doesn’t fragment, and it’s a lot easier to switch the storage engine once than it is to maintain partitions forever.
  4. Do you really need instant block deletes? Dropping a partition is really fast, and InnoDB deletes can be really slow, but TokuDB deletions are very fast (not as fast as dropping a partition, but fast). Is the ability to drop a partition what matters, or is it good enough to do deletions super fast (though not instantaneously)? I’m pretty sure very few people rely on instant partition drops, but this type of use case (if it exists) would be a credible reason to use partitions. The first sketch after this list contrasts the two approaches.
  5. Large-table performance problems? When tables (and especially indexes) get larger than memory, InnoDB fails in a variety of ways (e.g., its insertion rate drops precipitously). But that doesn’t mean all storage engines do. TokuDB, for example, is designed specifically to perform superbly with out-of-core databases.
  6. Slow queries? One failure mode for large tables is for queries to become slow. This can be addressed by adding covering indexes (or, in the case of TokuDB, either covering or clustering indexes); the second sketch after this list shows both.
  7. Slow indexing? Adding more indexes can cause indexing performance for InnoDB to deteriorate badly. But TokuDB is fast at indexing.
  8. You probably don’t need partitioning. I don’t know of any other use cases that argue for partitioning, but I’d love to hear about them if they exist.
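
To make the deletion branch (steps 2–4) concrete, here is a minimal sketch contrasting a partition drop with an ordinary ranged DELETE. The table, column, and partition names are all made up for illustration.

```sql
-- Hypothetical sales-log table, range-partitioned by month on `purchased`.
CREATE TABLE purchase_log (
  id          BIGINT NOT NULL,
  purchased   DATE NOT NULL,
  customer_id INT NOT NULL,
  sku         INT NOT NULL,
  price       DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (id, purchased)  -- every unique key must include the partitioning column
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(purchased)) (
  PARTITION p201101 VALUES LESS THAN (TO_DAYS('2011-02-01')),
  PARTITION p201102 VALUES LESS THAN (TO_DAYS('2011-03-01')),
  PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- The "instant block delete": throw away a whole month at once.
ALTER TABLE purchase_log DROP PARTITION p201101;

-- The unpartitioned alternative: an ordinary ranged DELETE.
-- Slow and fragmentation-prone in InnoDB; fast (though not instant) in TokuDB.
DELETE FROM purchase_log WHERE purchased < '2011-02-01';
```

For the slow-query branch (steps 6–7), a second sketch shows the covering-index alternative on the same hypothetical table. The CLUSTERING keyword is TokuDB-specific syntax, so the last statement assumes the table has been switched to ENGINE=TokuDB.

```sql
-- The query to speed up: line items for one customer.
-- SELECT sku, price FROM purchase_log WHERE customer_id = 42;

-- A covering index: the query is answered entirely from the index,
-- with no partitions needed to narrow the scan.
ALTER TABLE purchase_log ADD INDEX ix_customer (customer_id, sku, price);

-- The TokuDB clustering-index variant: the whole row is stored with the
-- key, so it covers any query that filters on customer_id.
ALTER TABLE purchase_log ADD CLUSTERING KEY cx_customer (customer_id);
```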

8 thoughts

  1. YetAnotherBlogReader says:

    > Slow queries? – This can be addressed by adding covering indexes

    ^^ This one is a failure. Partition pruning lets you use an index on just a small part of the table’s data (say, 1%). So there’s no need to improve indexes if partitioning lets you significantly reduce the amount of data to examine.
    If you don’t mention this, the whole article becomes too biased.

  2. Martin Farach-Colton says:

    @YetAnotherBlogReader,

    Thanks for the comment. This post was meant to summarize what I had learned from feedback on my previous two posts on partitioning. Please see this post for my point about partitioning and indexes. The upshot of that analysis is that partitions only improve non-covering indexes. A well-designed covering index will examine even less data than a table scan on a partition.

    In addition to what I wrote there, I’d like to make a further point here. Partitioning is an inflexible aid to indexing, in the following sense. Suppose, as an example, that you partition on DATE, and you define some indexes, on CUSTOMER_ID, on SKU, and on PRICE. This is, in a way, like defining indexes (DATE, CUSTOMER_ID), (DATE, SKU) and (DATE, PRICE).

    These indexes probably don’t cover your queries. If they did, you wouldn’t be partitioning. Your query speeds will be OK, as long as your partitioning on DATE is fine-grained enough that you only have to do potentially wasteful table scans on small partitions.

    Now suppose your business changes, and you need to start doing queries on SKU and PRICE, independent of DATE. You’d like to be able to define an index on (SKU, PRICE), but you’re stuck with a partition on DATE. These new queries will have to be performed on all partitions, and the more partitions there are, the worse your performance.

    So now you’re forced into a tradeoff between one class of queries — the ones that want a fine partition — and another class of queries — the ones that want as little partitioning by DATE as possible. And repartitioning means tons of downtime. Partitioning reduces your future options. It is inflexible.

    The alternative is to define a good set of covering indexes from the get-go. Then if a new query class comes up, you define a good index for it. There’s no tradeoff between the quality of this index and the quality of previous indexes… as long as you have a storage engine that can keep up with the increased indexing load.
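
    To make that concrete, here is a quick sketch; the table, columns, and partition names are all hypothetical:

    ```sql
    -- The partitioned layout: in effect, every index gets the DATE column
    -- prepended, and a query that ignores the date must visit every partition.
    CREATE TABLE orders (
      order_date  DATE NOT NULL,
      customer_id INT NOT NULL,
      sku         INT NOT NULL,
      price       DECIMAL(10,2) NOT NULL,
      KEY (customer_id), KEY (sku), KEY (price)
    )
    PARTITION BY RANGE (TO_DAYS(order_date)) (
      PARTITION p2010 VALUES LESS THAN (TO_DAYS('2011-01-01')),
      PARTITION p2011 VALUES LESS THAN (TO_DAYS('2012-01-01')),
      PARTITION pmax  VALUES LESS THAN MAXVALUE
    );

    -- The new query class: no date predicate, so no partition pruning,
    -- and the work grows with the number of partitions.
    SELECT sku, price FROM orders WHERE sku = 12345 AND price > 10;

    -- The flexible alternative: skip the partitions and add the covering
    -- index the new queries actually need.
    ALTER TABLE orders ADD INDEX ix_sku_price (sku, price);
    ```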

    That’s where TokuDB comes in. It allows you to define the indexes you really need and to update those indexes under heavy insertion loads, even when you run out of RAM. By comparison, partitioning is cumbersome and restrictive.

  3. [...] at scale (tens of billions of rows), large deployments can easily be handled without resorting to partition-based workarounds that are difficult to maintain. The technology ensures queries take full advantage of available [...]

  4. What about partitioning for the purpose of parallelization? As it stands, MySQL will not parallelize any queries, even ones that are obvious candidates, but you can do it yourself. Say you want to find all products where color=1, and a hash function divides the data four ways.

    You could, in theory, pull the data in pre-sorted via FEDERATED tables from data replicated and partitioned across a pool of N servers, then merge the sorted streams, which is linear. You could also do the same thing locally across 4 threads. I think you’d currently have to use a TYPE=MEMORY table to do so. A native solution might be able to avoid that or use a cheaper data structure.
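
    Roughly, the do-it-yourself version might look like this (the table and column names are invented):

    ```sql
    -- Hypothetical products table that we want to scan in parallel.
    CREATE TABLE products (
      id    BIGINT PRIMARY KEY,
      color INT NOT NULL,
      price DECIMAL(10,2) NOT NULL
    );

    -- Four-way hash split; each piece could be a local table or a FEDERATED
    -- table pointing at one of N servers.
    CREATE TABLE products_0 LIKE products;  -- holds rows where MOD(id, 4) = 0
    CREATE TABLE products_1 LIKE products;  -- holds rows where MOD(id, 4) = 1
    CREATE TABLE products_2 LIKE products;  -- holds rows where MOD(id, 4) = 2
    CREATE TABLE products_3 LIKE products;  -- holds rows where MOD(id, 4) = 3

    -- The application runs these four queries on four connections in parallel
    -- and then merges the four already-sorted streams, which is linear time:
    SELECT id, price FROM products_0 WHERE color = 1 ORDER BY price;
    SELECT id, price FROM products_1 WHERE color = 1 ORDER BY price;
    SELECT id, price FROM products_2 WHERE color = 1 ORDER BY price;
    SELECT id, price FROM products_3 WHERE color = 1 ORDER BY price;
    ```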

    Are you saying a covering index will suffice if you want to do this? I don’t think so. There are many environments where you could end up creating N! indices to cover queries against lots of dimensions. In that case, you have to parallelize and/or pray that a bitmap-index-merge will allow you to use a few indices.

    Partitioning is more related to parallel computation than indexing to me.

    Am I wrong?

  5. [...] recently been blogging about how partitioning is a poor man’s answer to covering indexes. I got the following [...]

  6. Martin Farach-Colton says:

    @Jaime,

    There is a nomenclature issue here. MySQL’s partitioning infrastructure is basically for breaking a table up within a single server. There are performance reasons for doing this on InnoDB, but not on TokuDB.

    What you refer to — breaking up the table as part of scale-out — I usually call sharding, rather than partitioning, though of course there is no uniformity of the term’s usage. And I agree with you that for scale-out purposes, sharding can help in some cases.

    As for the N! comment, please see my post.

  7. [...] Partitions are a poor replacement for covering indexes. Define the indexes that actually solve your problem, and TokuDB will keep up with the insertion rate. [...]
