<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Improving TPC-H-like queries &#8211; Q17</title>
	<atom:link href="http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/</link>
	<description></description>
	<lastBuildDate>Thu, 02 Feb 2012 04:10:04 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Another look at improving TPC-H-like queries &#8211; Q17&#160;&#124;&#160;Tokutek</title>
		<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/#comment-120</link>
		<dc:creator>Another look at improving TPC-H-like queries &#8211; Q17&#160;&#124;&#160;Tokutek</dc:creator>
		<pubDate>Thu, 19 Aug 2010 19:40:12 +0000</pubDate>
		<guid isPermaLink="false">http://improving_tpc_h_like_queries_q17#comment-120</guid>
		<description>[...] An alternate approach, offered in response to our original post, provides excellent improvements for smaller databases, but clustered indexes offer better [...] </description>
		<content:encoded><![CDATA[<p>[...] An alternate approach, offered in response to our original post, provides excellent improvements for smaller databases, but clustered indexes offer better [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin Swanhart</title>
		<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/#comment-114</link>
		<dc:creator>Justin Swanhart</dc:creator>
		<pubDate>Fri, 26 Jun 2009 15:43:55 +0000</pubDate>
		<guid isPermaLink="false">http://improving_tpc_h_like_queries_q17#comment-114</guid>
		<description>These settings may make a difference:
key_buffer_size=4G
read_buffer_size=16M
tmp_table_size=256M
max_heap_table_size=256M
sort_buffer_size=16M

This query plan doesn&#039;t use a filesort so it shouldn&#039;t generate a temporary table.  It should scale pretty well as long as key_buffer_size is &gt;= the size of the new composite index on the lineitem table.  You can check the size of that index with the information schema.

Setting PACK_KEYS=1 on lineitem will probably improve results further, since repeated integer key values in the composite key get compressed.</description>
		<content:encoded><![CDATA[<p>These settings may make a difference:<br />
key_buffer_size=4G<br />
read_buffer_size=16M<br />
tmp_table_size=256M<br />
max_heap_table_size=256M<br />
sort_buffer_size=16M</p>
<p>This query plan doesn&#8217;t use a filesort so it shouldn&#8217;t generate a temporary table.  It should scale pretty well as long as key_buffer_size is >= the size of the new composite index on the lineitem table.  You can check the size of that index with the information schema.</p>
<p>Setting PACK_KEYS=1 on lineitem will probably improve results further, since repeated integer key values in the composite key get compressed.</p>
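<p>A minimal sketch of the index-size check mentioned above, assuming the index size is read from the INDEX_LENGTH column of information_schema.TABLES. That column reports the combined size of all indexes on the table, so it is an upper bound for the composite index alone. The helper names parse_size and index_fits are hypothetical, and the %s placeholder assumes a MySQL driver that uses that parameter style.</p>

```python
# Sketch: does the lineitem index data fit in key_buffer_size?
# Assumption: INDEX_LENGTH covers every index on the table, not just
# the new composite index, so the check is conservative.

# Query to run against MySQL; the schema name is passed as a parameter.
INDEX_SIZE_SQL = """
SELECT index_length
  FROM information_schema.tables
 WHERE table_schema = %s
   AND table_name = 'lineitem'
"""

# Binary multipliers for the K/M/G suffixes used in MySQL option values.
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a MySQL-style size such as '4G' or '16M' to bytes."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def index_fits(key_buffer_size, index_length_bytes):
    """Return True when the key buffer is at least as large as the index."""
    return parse_size(key_buffer_size) >= index_length_bytes
```

<p>For example, index_fits("4G", n) takes the key_buffer_size setting from the list above and the byte count n returned by the query.</p>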
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin Swanhart</title>
		<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/#comment-113</link>
		<dc:creator>Justin Swanhart</dc:creator>
		<pubDate>Fri, 26 Jun 2009 15:15:25 +0000</pubDate>
		<guid isPermaLink="false">http://improving_tpc_h_like_queries_q17#comment-113</guid>
		<description>I added a key to:
lineitem(l_partkey, l_quantity)
and to
part(p_brand, p_container)

and then I analyzed both tables.  Without any hints MySQL comes up with this:

mysql&gt; explain
    -&gt; select sum(l_extendedprice) / 7.0 as avg_yearly
    -&gt; from lineitem  , part
    -&gt; where p_partkey = l_partkey
    -&gt;   and p_brand = &#039;Brand#33&#039;
    -&gt;   and (p_container = &#039;WRAP PACK&#039; )
    -&gt;   and l_quantity &lt; (
    -&gt;     select 0.2 * avg(l_quantity)
    -&gt;       from lineitem
    -&gt;      where l_partkey = p_partkey
    -&gt;    );
+----+--------------------+----------+------+---------------------------+--------------+---------+------------------------------+------+-------------+
&#124; id &#124; select_type        &#124; table    &#124; type &#124; possible_keys             &#124; key          &#124; key_len &#124; ref                          &#124; rows &#124; Extra       &#124;
+----+--------------------+----------+------+---------------------------+--------------+---------+------------------------------+------+-------------+
&#124;  1 &#124; PRIMARY            &#124; part     &#124; ref  &#124; PRIMARY,p_brand           &#124; p_brand      &#124; 20      &#124; const,const                  &#124;  157 &#124; Using where &#124;
&#124;  1 &#124; PRIMARY            &#124; lineitem &#124; ref  &#124; lineitem_fk3,part_qty_idx &#124; lineitem_fk3 &#124; 4       &#124; tpch1g_myisam.part.p_partkey &#124; 2376 &#124; Using where &#124;
&#124;  2 &#124; DEPENDENT SUBQUERY &#124; lineitem &#124; ref  &#124; lineitem_fk3,part_qty_idx &#124; lineitem_fk3 &#124; 4       &#124; tpch1g_myisam.part.p_partkey &#124; 2376 &#124;             &#124;
+----+--------------------+----------+------+---------------------------+--------------+---------+------------------------------+------+-------------+
3 rows in set (0.00 sec)

mysql&gt; select sum(l_extendedprice) / 7.0 as avg_yearly       from lineitem  , part     where p_partkey = l_partkey                 and p_brand = &#039;Brand#33&#039;                 and (p_container = &#039;WRAP PACK&#039; )                 and l_quantity &lt; (                           select 0.2 * avg(l_quantity)                             from lineitem                            where l_partkey = p_partkey    );
+---------------+
&#124; avg_yearly    &#124;
+---------------+
&#124; 312967.325714 &#124;
+---------------+
1 row in set (0.23 sec)

I don&#039;t have a SF10 myisam lying around, but you might want to give it a whirl if you do.</description>
		<content:encoded><![CDATA[<p>I added a key to:<br />
lineitem(l_partkey, l_quantity)<br />
and to<br />
part(p_brand, p_container)</p>
<p>and then I analyzed both tables.  Without any hints MySQL comes up with this:</p>
<p>mysql> explain<br />
    -> select sum(l_extendedprice) / 7.0 as avg_yearly<br />
    -> from lineitem  , part<br />
    -> where p_partkey = l_partkey<br />
    ->   and p_brand = 'Brand#33'<br />
    ->   and (p_container = 'WRAP PACK' )<br />
    ->   and l_quantity < (<br />
    ->     select 0.2 * avg(l_quantity)<br />
    ->       from lineitem<br />
    ->      where l_partkey = p_partkey<br />
    ->    );<br />
+----+--------------------+----------+------+---------------------------+--------------+---------+------------------------------+------+-------------+<br />
| id | select_type        | table    | type | possible_keys             | key          | key_len | ref                          | rows | Extra       |<br />
+----+--------------------+----------+------+---------------------------+--------------+---------+------------------------------+------+-------------+<br />
|  1 | PRIMARY            | part     | ref  | PRIMARY,p_brand           | p_brand      | 20      | const,const                  |  157 | Using where |<br />
|  1 | PRIMARY            | lineitem | ref  | lineitem_fk3,part_qty_idx | lineitem_fk3 | 4       | tpch1g_myisam.part.p_partkey | 2376 | Using where |<br />
|  2 | DEPENDENT SUBQUERY | lineitem | ref  | lineitem_fk3,part_qty_idx | lineitem_fk3 | 4       | tpch1g_myisam.part.p_partkey | 2376 |             |<br />
+----+--------------------+----------+------+---------------------------+--------------+---------+------------------------------+------+-------------+<br />
3 rows in set (0.00 sec)</p>
<p>mysql> select sum(l_extendedprice) / 7.0 as avg_yearly       from lineitem  , part     where p_partkey = l_partkey                 and p_brand = 'Brand#33'                 and (p_container = 'WRAP PACK' )                 and l_quantity < (                           select 0.2 * avg(l_quantity)                             from lineitem                            where l_partkey = p_partkey    );<br />
+---------------+<br />
| avg_yearly    |<br />
+---------------+<br />
| 312967.325714 |<br />
+---------------+<br />
1 row in set (0.23 sec)</p>
<p>I don&#8217;t have a SF10 myisam lying around, but you might want to give it a whirl if you do.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin Swanhart</title>
		<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/#comment-112</link>
		<dc:creator>Justin Swanhart</dc:creator>
		<pubDate>Fri, 26 Jun 2009 12:04:48 +0000</pubDate>
		<guid isPermaLink="false">http://improving_tpc_h_like_queries_q17#comment-112</guid>
		<description>In the future, call it DBT-3 query #17.  DBT-3 is the open source OSDL version of TPC-H(tm).

I take it this was SF1?  Looks like you have about 6M rows in the fact table.</description>
		<content:encoded><![CDATA[<p>In the future, call it DBT-3 query #17.  DBT-3 is the open source OSDL version of TPC-H(tm).</p>
<p>I take it this was SF1?  Looks like you have about 6M rows in the fact table.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay Pipes</title>
		<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/#comment-119</link>
		<dc:creator>Jay Pipes</dc:creator>
		<pubDate>Wed, 24 Jun 2009 03:43:52 +0000</pubDate>
		<guid isPermaLink="false">http://improving_tpc_h_like_queries_q17#comment-119</guid>
		<description>Awesome, Dave!  Looking forward to your post.  And, yeah, as I understand it, TPC doesn&#039;t allow for a number of things.  This is one of the reasons I strongly support an open source, openly discussed, and vendor-neutral set of benchmarks :)

Cheers!

Jay</description>
		<content:encoded><![CDATA[<p>Awesome, Dave!  Looking forward to your post.  And, yeah, as I understand it, TPC doesn&#8217;t allow for a number of things.  This is one of the reasons I strongly support an open source, openly discussed, and vendor-neutral set of benchmarks :)</p>
<p>Cheers!</p>
<p>Jay</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave</title>
		<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/#comment-118</link>
		<dc:creator>Dave</dc:creator>
		<pubDate>Wed, 24 Jun 2009 03:17:43 +0000</pubDate>
		<guid isPermaLink="false">http://improving_tpc_h_like_queries_q17#comment-118</guid>
		<description>Jay,

I spent some time on your recommendations, and ran enough experiments that I thought it would be worth another post.  It should be up shortly.

Regarding the question of TPC-H rules for defining indexes - my point for the post was to show how clustering indexes can help, not how to get a valid TPC-H performance measurement. &quot;TPC-H-like&quot; was used deliberately to avoid contending with the TPC rules.  All good ideas are welcome here.

Dave</description>
		<content:encoded><![CDATA[<p>Jay,</p>
<p>I spent some time on your recommendations, and ran enough experiments that I thought it would be worth another post.  It should be up shortly.</p>
<p>Regarding the question of TPC-H rules for defining indexes &#8211; my point for the post was to show how clustering indexes can help, not how to get a valid TPC-H performance measurement. &#8220;TPC-H-like&#8221; was used deliberately to avoid contending with the TPC rules.  All good ideas are welcome here.</p>
<p>Dave</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay Pipes</title>
		<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/#comment-117</link>
		<dc:creator>Jay Pipes</dc:creator>
		<pubDate>Tue, 23 Jun 2009 22:34:35 +0000</pubDate>
		<guid isPermaLink="false">http://improving_tpc_h_like_queries_q17#comment-117</guid>
		<description>Hi!  Any feedback on the query above?

Cheers!

Jay</description>
		<content:encoded><![CDATA[<p>Hi!  Any feedback on the query above?</p>
<p>Cheers!</p>
<p>Jay</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave</title>
		<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/#comment-116</link>
		<dc:creator>Dave</dc:creator>
		<pubDate>Wed, 17 Jun 2009 19:57:10 +0000</pubDate>
		<guid isPermaLink="false">http://improving_tpc_h_like_queries_q17#comment-116</guid>
		<description>Jay,

Thanks for the ideas.  Will try them in the next couple days and let you know how it goes.

Dave</description>
		<content:encoded><![CDATA[<p>Jay,</p>
<p>Thanks for the ideas.  Will try them in the next couple days and let you know how it goes.</p>
<p>Dave</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jay Pipes</title>
		<link>http://www.tokutek.com/2009/06/improving_tpc_h_like_queries_q17/#comment-115</link>
		<dc:creator>Jay Pipes</dc:creator>
		<pubDate>Tue, 16 Jun 2009 07:28:31 +0000</pubDate>
		<guid isPermaLink="false">http://improving_tpc_h_like_queries_q17#comment-115</guid>
		<description>You should get much better performance if you rewrite the query as follows:

select
  sum(li.l_extendedprice) / 7.0 as avg_yearly
from lineitem li
inner join part p
on li.l_partkey = p.p_partkey
inner join (
 select l_partkey, 0.2 * avg(l_quantity) as quantity
 from lineitem
 group by l_partkey
) as quantities
on li.l_partkey = quantities.l_partkey
and li.l_quantity &lt; quantities.quantity
where
p.p_brand = &#039;Brand#33&#039;
and p.p_container = &#039;WRAP PACK&#039;;

and put an index on lineitem (l_partkey, l_quantity)

Not sure if TPC-H allows you to optimize the terrible (for MySQL) queries it produces, but the above would likely cut the run time by a factor of 100 or more.

I&#039;d be interested to see what EXPLAIN comes up with for the above rewritten query + additional index.

Cheers,

Jay</description>
		<content:encoded><![CDATA[<p>You should get much better performance if you rewrite the query as follows:</p>
<p>select<br />
  sum(li.l_extendedprice) / 7.0 as avg_yearly<br />
from lineitem li<br />
inner join part p<br />
on li.l_partkey = p.p_partkey<br />
inner join (<br />
 select l_partkey, 0.2 * avg(l_quantity) as quantity<br />
 from lineitem<br />
 group by l_partkey<br />
) as quantities<br />
on li.l_partkey = quantities.l_partkey<br />
and li.l_quantity < quantities.quantity<br />
where<br />
p.p_brand = 'Brand#33'<br />
and p.p_container = 'WRAP PACK';</p>
<p>and put an index on lineitem (l_partkey, l_quantity)</p>
<p>Not sure if TPC-H allows you to optimize the terrible (for MySQL) queries it produces, but the above would likely cut the run time by a factor of 100 or more.</p>
<p>I&#8217;d be interested to see what EXPLAIN comes up with for the above rewritten query + additional index.</p>
<p>Cheers,</p>
<p>Jay</p>
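<p>A small sketch for checking that the join-based rewrite returns the same answer as the original correlated subquery. It is only an illustration under stated assumptions: SQLite stands in for MySQL, the five lineitem rows are made up, and the rewrite is written with li.l_quantity and with AND between the two filters. It says nothing about run time on a real SF1 or SF10 data set.</p>

```python
import sqlite3

# Assumption: an in-memory SQLite database with a tiny synthetic data set
# stands in for the MyISAM TPC-H-like schema discussed in this thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part (p_partkey INTEGER PRIMARY KEY, p_brand TEXT, p_container TEXT);
CREATE TABLE lineitem (l_partkey INTEGER, l_quantity REAL, l_extendedprice REAL);
CREATE INDEX part_qty_idx ON lineitem (l_partkey, l_quantity);
INSERT INTO part VALUES (1, 'Brand#33', 'WRAP PACK'), (2, 'Brand#11', 'SM BOX');
INSERT INTO lineitem VALUES
  (1, 1, 70.0), (1, 10, 700.0), (1, 19, 1330.0),
  (2, 1, 50.0), (2, 30, 1500.0);
""")

# Original form: correlated subquery evaluated per candidate row.
CORRELATED_SQL = """
SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
  FROM lineitem, part
 WHERE p_partkey = l_partkey
   AND p_brand = 'Brand#33'
   AND p_container = 'WRAP PACK'
   AND l_quantity < (SELECT 0.2 * AVG(l_quantity)
                       FROM lineitem
                      WHERE l_partkey = p_partkey)
"""

# Rewritten form: per-part thresholds computed once in a derived table.
REWRITTEN_SQL = """
SELECT SUM(li.l_extendedprice) / 7.0 AS avg_yearly
  FROM lineitem li
 INNER JOIN part p ON li.l_partkey = p.p_partkey
 INNER JOIN (SELECT l_partkey, 0.2 * AVG(l_quantity) AS quantity
               FROM lineitem
              GROUP BY l_partkey) AS quantities
    ON li.l_partkey = quantities.l_partkey
   AND li.l_quantity < quantities.quantity
 WHERE p.p_brand = 'Brand#33'
   AND p.p_container = 'WRAP PACK'
"""

correlated = conn.execute(CORRELATED_SQL).fetchone()[0]
rewritten = conn.execute(REWRITTEN_SQL).fetchone()[0]
print(correlated, rewritten)
```

<p>With this data, part 1 has an average quantity of 10, so only the row with quantity 1 falls under the 0.2 * avg threshold and both queries sum the same single price.</p>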
]]></content:encoded>
	</item>
</channel>
</rss>

