<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tokutek &#187; dave</title>
	<atom:link href="http://www.tokutek.com/author/dwells/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.tokutek.com</link>
	<description></description>
	<lastBuildDate>Thu, 02 Feb 2012 15:28:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Loading Tables with TokuDB 4.0</title>
		<link>http://www.tokutek.com/2010/09/loading-tables-with-tokudb-4-0/</link>
		<comments>http://www.tokutek.com/2010/09/loading-tables-with-tokudb-4-0/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 17:47:27 +0000</pubDate>
		<dc:creator>dave</dc:creator>
				<category><![CDATA[TokuView]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[TokuDB]]></category>
		<category><![CDATA[TokuDB loader]]></category>

		<guid isPermaLink="false">http://tokutek.com/?p=1692</guid>
		<description><![CDATA[Often, the first step in evaluating and deploying a database is to load an existing dataset into the database.  In the latest version, TokuDB makes use of multi-core parallelism to speed up loading (and new index creation).  Using the loader,&#8230;]]></description>
			<content:encoded><![CDATA[<p>Often, the first step in evaluating and deploying a database is to load an existing dataset into the database.  In the latest version, TokuDB makes use of multi-core parallelism to speed up loading (and new index creation).  Using the loader, MySQL tables using TokuDB load 5x-8x faster than with previous versions of TokuDB.</p>
<h2>Measuring Load Performance</h2>
<p>We generated several different datasets to measure the performance of TokuDB when running a LOAD DATA INFILE &#8230; command.  To characterize performance, we varied</p>
<ul type="disc">
<li>rows to load</li>
<li>keys per row</li>
<li>row length (including keys)</li>
</ul>
<p>All generated keys, including the primary key, are random 8-byte values.  The rest of each row is text that pads the row out to the specified length.</p>
<p>Two files are produced as part of data generation:</p>
<ol>
<li>a data file, containing &#8216;|&#8217;-separated fields</li>
<li>an SQL file, containing the CREATE TABLE command corresponding to the generated data</li>
</ol>
<p>For instance, if the number of keys is 3 and the row length is 256 bytes, the following SQL statement is produced:</p>
<pre>     CREATE TABLE load_table (
         val0 BIGINT UNSIGNED NOT NULL,
         val1 BIGINT UNSIGNED NOT NULL,
         val2 BIGINT UNSIGNED NOT NULL,
         pad VARCHAR(232) NOT NULL,
         PRIMARY KEY (val0),
         KEY valkey1 (val1),
         KEY valkey2 (val2)
         ) ENGINE=tokudb</pre>
<p>We can make the data generation program available if anyone is interested.</p>
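<p>Until then, here is a minimal Python sketch of what such a generator could look like. It is not the program we used: the function names, the fixed seed, and the use of a repeated &#8216;x&#8217; as padding text are assumptions made for illustration. It derives the pad width as the row length minus 8 bytes per key, which reproduces the VARCHAR(232) in the example above. Keys are written as decimal text, so a line in the data file can be longer than the stored row.</p>

```python
import random


def create_table_sql(num_keys, row_len, table="load_table"):
    """Build a CREATE TABLE statement with num_keys 8-byte keys plus text padding."""
    pad_len = row_len - 8 * num_keys  # bytes left over once the keys are counted
    if 1 > pad_len:
        raise ValueError("row_len must leave room for padding after the keys")
    lines = ["val%d BIGINT UNSIGNED NOT NULL" % i for i in range(num_keys)]
    lines.append("pad VARCHAR(%d) NOT NULL" % pad_len)
    lines.append("PRIMARY KEY (val0)")
    # Every key after the first becomes a secondary index.
    lines += ["KEY valkey%d (val%d)" % (i, i) for i in range(1, num_keys)]
    return "CREATE TABLE %s (\n    %s\n) ENGINE=tokudb" % (table, ",\n    ".join(lines))


def generate_rows(num_rows, num_keys, row_len, seed=0):
    """Yield '|'-separated rows: random 64-bit keys followed by padding text."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps runs repeatable
    pad = "x" * (row_len - 8 * num_keys)
    for _ in range(num_rows):
        keys = [str(rng.getrandbits(64)) for _ in range(num_keys)]
        yield "|".join(keys + [pad])


if __name__ == "__main__":
    print(create_table_sql(3, 256))
    for row in generate_rows(2, 3, 256):
        print(row[:70] + "...")
```

<p>Random 64-bit values can collide in principle, so a real generator for very large datasets might want to check primary keys for duplicates. This sketch does not.</p>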
<h3>Load Test</h3>
<p>A simple shell script</p>
<ul>
<li>creates the test table</li>
<li>performs a LOAD DATA INFILE &lt;datafile&gt; INTO TABLE load_table FIELDS TERMINATED BY &#8216;|&#8217;</li>
<li>returns execution time</li>
</ul>
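<p>The steps above can be sketched in Python as follows. This is an illustration, not our script: <code>run_sql</code> is a hypothetical callable that sends one statement to the server (for example a thin wrapper around a database driver or the mysql client). The sketch also assumes that only the LOAD DATA statement is timed, not the table creation.</p>

```python
import time


def load_statement(datafile, table="load_table", sep="|"):
    """Build the LOAD DATA INFILE statement used by the test."""
    return ("LOAD DATA INFILE '%s' INTO TABLE %s FIELDS TERMINATED BY '%s'"
            % (datafile, table, sep))


def run_load_test(run_sql, create_sql, datafile):
    """Create the test table, load the data file, and return the load time in seconds.

    run_sql is a hypothetical callable that executes a single SQL statement.
    """
    run_sql(create_sql)                  # step 1: create the test table
    start = time.perf_counter()
    run_sql(load_statement(datafile))    # step 2: bulk load the data file
    return time.perf_counter() - start   # step 3: report execution time


if __name__ == "__main__":
    # Dry run: print the statements instead of sending them to a server.
    elapsed = run_load_test(print, "CREATE TABLE load_table (...)", "/tmp/load.dat")
    print("elapsed seconds:", elapsed)
```

<p>Dividing the number of rows in the data file by the returned time gives a rows-per-second figure.</p>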
<p>For the experiments to be meaningful, we created datasets that do not fit in memory.</p>
<h2>Results</h2>
<p>We ran our benchmark on an Amazon Web Services c1.xlarge node with 8 cores and 7 GB of memory. The test loads 100M rows (NOT pre-sorted).  The data file was on a 2-disk RAID-0, and the MySQL DB files were on a different 2-disk RAID-0.</p>
<h4>TokuDB Version 3 (~single-threaded) v. TokuDB Version 4 (multi-threaded)</h4>
<table border="1">
<tbody>
<tr>
<td width="50" align="center">Keys</td>
<td width="75" align="center">Row Len</td>
<td width="110" align="center">v3 rows/s</td>
<td width="110" align="center">v4 rows/s</td>
<td width="75" align="center">Speedup</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">64</td>
<td align="center">27K</td>
<td align="center">142K</td>
<td align="center">5.1</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">64</td>
<td align="center">13K</td>
<td align="center">82K</td>
<td align="center">6.2</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">256</td>
<td align="center">7K</td>
<td align="center">54K</td>
<td align="center">7.2</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">256</td>
<td align="center">5K</td>
<td align="center">43K</td>
<td align="center">8.2</td>
</tr>
</tbody>
</table>
<h4>Other metrics</h4>
<p>Several metrics can be used to measure performance:</p>
<ul>
<li>rows per second : data insert rate</li>
<li>key-value pairs per second : indicates how fast the primary and secondary indexes are being created</li>
<li>MB/s : how much raw data is being added to the database</li>
</ul>
<p>Metrics for TokuDB v4:</p>
<table border="1">
<tbody>
<tr>
<td width="50" align="center">Keys</td>
<td width="75" align="center">Row Len</td>
<td width="75" align="center">Rows/sec</td>
<td width="100" align="center">KV-pairs/sec</td>
<td width="50" align="center">MB/sec</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">64</td>
<td align="center">142K</td>
<td align="center">142K</td>
<td align="center">9.1</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">64</td>
<td align="center">82K</td>
<td align="center">330K</td>
<td align="center">5.3</td>
</tr>
<tr>
<td align="center">1</td>
<td align="center">256</td>
<td align="center">54K</td>
<td align="center">54K</td>
<td align="center">13.9</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">256</td>
<td align="center">43K</td>
<td align="center">173K</td>
<td align="center">11.1</td>
</tr>
</tbody>
</table>
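<p>The three metrics are related by simple arithmetic, sketched below in Python. The function name is hypothetical, and the sketch assumes that each row contributes one key-value pair per index and that 1 MB means 10<sup>6</sup> bytes. Because it starts from the rounded rows/sec figures in the table, its output comes out close to, but not always identical to, the published KV-pairs/sec and MB/sec columns.</p>

```python
def derived_metrics(rows_per_sec, num_keys, row_len_bytes):
    """Derive KV-pairs/sec and MB/sec from the row insert rate.

    Assumes one key-value pair per index per row and 1 MB = 10**6 bytes.
    """
    kv_pairs_per_sec = rows_per_sec * num_keys
    mb_per_sec = rows_per_sec * row_len_bytes / 1e6
    return kv_pairs_per_sec, mb_per_sec


if __name__ == "__main__":
    # (keys, row length in bytes, rounded v4 rows/sec) taken from the table
    for keys, row_len, rows in [(1, 64, 142000), (4, 64, 82000),
                                (1, 256, 54000), (4, 256, 43000)]:
        kv, mb = derived_metrics(rows, keys, row_len)
        print(keys, row_len, kv, round(mb, 1))
```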
<p>These results show</p>
<ol>
<li>significant parallelization (we believe larger CPU core count machines will see even larger benefits)</li>
<li>a significant jump in absolute load performance</li>
<li>speed-ups are not limited to tables with many keys &#8211; even the 1 key tables are 5-7x faster</li>
</ol>
<p>We will report further results, especially speedups on larger CPU count machines, as they become available.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.tokutek.com/2010/09/loading-tables-with-tokudb-4-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Recovery Times &#8211; Part Deux</title>
		<link>http://www.tokutek.com/2010/01/recovery-times-part-deux/</link>
		<comments>http://www.tokutek.com/2010/01/recovery-times-part-deux/#comments</comments>
		<pubDate>Wed, 13 Jan 2010 20:13:15 +0000</pubDate>
		<dc:creator>dave</dc:creator>
				<category><![CDATA[TokuView]]></category>

		<guid isPermaLink="false">http://tokutek.com/?p=986</guid>
		<description><![CDATA[In a follow-up experiment to an <a href="http://tokutek.com/2009/12/recovery-time-for-tokudb/">earlier post</a> on <a href="http://tokutek.com/products/tokudb-for-mysql-v3-0-beta/">TokuDB</a> recovery times, I tried to create a better apples-to-apples comparison to InnoDB recovery time.  If I measure recovery times when both DBs are doing the same amount&#8230;]]></description>
			<content:encoded><![CDATA[<p>
In a follow-up experiment to an <a href="http://tokutek.com/2009/12/recovery-time-for-tokudb/">earlier post</a> on <a href="http://tokutek.com/products/tokudb-for-mysql-v3-0-beta/">TokuDB</a> recovery times, I tried to create a better apples-to-apples comparison to InnoDB recovery time.  If I measure recovery times when both DBs are doing the same amount of work, TokuDB requires only 2s to recover from a crash, compared to 1020s for InnoDB.
</p>
<h3>Background</h3>
<p>
In the <a href="http://tokutek.com/2009/12/recovery-time-for-tokudb/">first experiment</a>, I compared recovery times when both storage engines (TokuDB and InnoDB) were inserting at maximum rates.  In that experiment, following a power cord pull and server restart, TokuDB recovered in 501s, InnoDB in 18505s.  <a href="http://tokutek.com/2009/12/recovery-time-for-tokudb/comment-page-1/#comment-249">In response</a>, it was suggested that I run a <a href="http://noc.wikimedia.org/~midom/mysql.tar">patched version</a> of InnoDB that vastly improves InnoDB recovery times.  The patched version recovered in 1020s.  (Quite an improvement!)
</p>
<p>
The first experiment ignored the fact that TokuDB was inserting &gt;16,000 rows/s into the table, while InnoDB was inserting less than 200 rows/s.  Given that TokuDB was doing so much more work, you would expect a longer recovery time.  This post discusses an experiment where recovery times were measured when both DBs were doing the same amount of work.</p>
<h3>The experiment</h3>
<p>
As before, I ran on the following server:
</p>
<pre>
Sunfire x4150
Dual socket quad core Xeon 3.1GHz
16GiB memory
2 SAS 146GB
6 SAS 146GB hardware RAID 0 256KiB stripe
CentOS 5.1
Ext3 file system
</pre>
<p>
The disks and RAID controller were configured with the write-caches OFF.
</p>
<p>
I modified  <a href="https://code.launchpad.net/~wells-tokutek/mysql-patch/mytools">iiBench</a> to generate rows at a rate of ~200 rows per second for insertion into a MySQL database so that both storage engines would perform insertions at the same rate.  After inserting 250M rows, I pulled the power cords, then measured the time to restart mysqld after the machine was rebooted.  I repeated this experiment inserting ~2000 rows per second.  The results:
</p>
<table>
<tr>
<td>Storage Engine</td>
<td>Insert Rate</td>
<td>Recovery Time</td>
</tr>
<tr>
<td>TokuDB</td>
<td>16,000 rows/s</td>
<td>501s</td>
</tr>
<tr>
<td>TokuDB</td>
<td>2,000 rows/s</td>
<td>13s</td>
</tr>
<tr>
<td>TokuDB</td>
<td>200 rows/s</td>
<td>2s</td>
</tr>
<tr>
<td>Stock InnoDB</td>
<td>200 rows/s</td>
<td>18505s</td>
</tr>
<tr>
<td>Patched InnoDB</td>
<td>200 rows/s</td>
<td>1020s</td>
</tr>
</table>
<p>
The reduced workload meant there was far less log data to scan and recover.  These results also match our findings when we recover from power plug pulls on typical applications, which don&#8217;t necessarily use all of TokuDB&#8217;s insertion bandwidth at all times.
</p>
<p>
A final note: it takes mysqld 1s to start when no recovery is required.
</p>
<p>
<a href="http://www.imdb.com/title/tt0107144/quotes">Looks like the upper hand, is on the other foot!</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.tokutek.com/2010/01/recovery-times-part-deux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Recovery Time for TokuDB</title>
		<link>http://www.tokutek.com/2009/12/recovery-time-for-tokudb/</link>
		<comments>http://www.tokutek.com/2009/12/recovery-time-for-tokudb/#comments</comments>
		<pubDate>Wed, 16 Dec 2009 16:56:53 +0000</pubDate>
		<dc:creator>dave</dc:creator>
				<category><![CDATA[TokuView]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[TokuDB]]></category>

		<guid isPermaLink="false">http://tokutek.com/?p=943</guid>
		<description><![CDATA[Last week Tokutek released version 3.0.0 of <a href="http://tokutek.com/products/tokudb-for-mysql-v3-0-beta/">TokuDB</a>, adding ACID transactions to its list of features.  This post discusses an experiment we ran to measure recovery time following a system crash.
In summary, while actively inserting records into a&#8230;]]></description>
			<content:encoded><![CDATA[<p>Last week Tokutek released version 3.0.0 of <a href="http://tokutek.com/products/tokudb-for-mysql-v3-0-beta/">TokuDB</a>, adding ACID transactions to its list of features.  This post discusses an experiment we ran to measure recovery time following a system crash.</p>
<p>In summary, while actively inserting records into a MySQL database using <a href="https://code.launchpad.net/~wells-tokutek/mysql-patch/mytools">iiBench</a>, we compared the time to recover from a power-cord pull for both InnoDB and TokuDB.</p>
<table>
<tr>
<td>
   Storage Engine</td>
<td>       Recovery Time</td>
</tr>
<tr>
<td>      TokuDB</td>
<td>               501s (8 min, 21 sec)</td>
</tr>
<tr>
<td>      InnoDB</td>
<td>             18505s (5 hours, 8 min, 25 sec)</td>
</tr>
</table>
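<p>For readers who want to check the conversions in the table, here is a small Python sketch. The function name is made up for this example. The last line computes the ratio between the two recovery times.</p>

```python
def split_duration(total_seconds):
    """Split a duration in seconds into (hours, minutes, seconds)."""
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return hours, minutes, seconds


if __name__ == "__main__":
    print(split_duration(501))     # (0, 8, 21)
    print(split_duration(18505))   # (5, 8, 25)
    print(round(18505 / 501, 1))   # 36.9
```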
<p>This is by no means an exhaustive look at recovery performance, but it does illustrate the benefits of Tokutek&#8217;s approach.</p>
<h3>The experiment</h3>
<p>We ran iiBench on the following server:</p>
<pre>
Sunfire x4150
Dual socket quad core Xeon 3.1GHz
16GiB memory
2 SAS 146GB
6 SAS 146GB hardware RAID 0 256KiB stripe
CentOS 5.1
Ext3 file system
</pre>
<p>The disks and RAID controller were configured with the write-caches OFF using <a href="http://downloadcenter.intel.com/SearchResult.aspx?lang=eng&#038;ProductFamily=Server+Products&#038;ProductLine=RAID+Controllers+for+OEMs&#038;ProductProduct=Sun+StorageTek*+SAS+RAID+HBA,+Internal">arcconf</a>. (Note: an experiment with the RAID controller write-cache ON resulted in a non-recoverable database, as you should expect.)</p>
<p>We ran an insert-only iiBench test.  After inserting about 250M rows and while actively inserting more, we manually pulled the plugs (2 on a Sunfire x4150).  We then powered up the server and timed how long it took to start mysqld.</p>
<p>The results for InnoDB are not surprising &#8211; they are in line with results reported elsewhere.  During recovery, InnoDB reported 1 transaction to be rolled back or cleaned up, and 68 row operations to undo.</p>
<p>The results for TokuDB are encouraging, in that they indicate recovery can be performed in minutes instead of hours.  In this experiment, rebooting the server took almost as long as recovering the TokuDB database.</p>
<p>The primary reason for TokuDB&#8217;s quick recovery time is the use of regular, automatic checkpoints.  This feature (introduced in version 2.0.0) ensures that work is completed and written to disk (including the necessary fsync) on a regular basis.  Recovery is limited to processing uncommitted transactions and work done since the last checkpoint.  When not pushing performance to the absolute maximum, we find that recovery typically takes less than a minute.</p>
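<p>To make the idea concrete, here is a toy Python model of checkpointing. It is not TokuDB&#8217;s implementation: the class, its fields, and the checkpoint-every-N-writes policy are assumptions chosen to keep the example short, and it ignores uncommitted transactions entirely. It only shows why recovery work is bounded by what was logged since the last checkpoint, not by the size of the database.</p>

```python
class ToyStore:
    """Toy key-value store with an operation log and periodic checkpoints."""

    def __init__(self, checkpoint_every=3):
        self.checkpoint_every = checkpoint_every
        self.memory = {}     # in-memory state, lost in a crash
        self.snapshot = {}   # durable state as of the last checkpoint
        self.log = []        # durable log of writes since the last checkpoint

    def put(self, key, value):
        self.log.append((key, value))  # log the write first, then apply it
        self.memory[key] = value
        if len(self.log) == self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self):
        """Persist the current state and discard log entries it already covers."""
        self.snapshot = dict(self.memory)
        self.log = []

    def recover(self):
        """Rebuild in-memory state after a crash; return the number of entries replayed."""
        self.memory = dict(self.snapshot)
        for key, value in self.log:
            self.memory[key] = value
        return len(self.log)


if __name__ == "__main__":
    store = ToyStore(checkpoint_every=3)
    for i in range(7):
        store.put(i, i * i)
    store.memory = {}        # simulate a crash: in-memory state is gone
    print(store.recover())   # entries replayed since the last checkpoint
```

<p>In this model, seven writes with a checkpoint every three writes leave a single log entry to replay after the crash, no matter how many rows were written before the last checkpoint.</p>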
]]></content:encoded>
			<wfw:commentRss>http://www.tokutek.com/2009/12/recovery-time-for-tokudb/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

