- Process over 10 million log entries per day without partitions or other workarounds.
MySQL + TokuDB Tames Huge Logfile Processing Workload
The Company: Founded in 2005, Jawa develops software and media solutions enabling people to stay connected to information, entertainment and communities while on the go. Jawa is a privately held company headquartered in Scottsdale, Arizona.
The Challenge: Jawa is experiencing tremendous growth as it provides critical billing infrastructure to mobile gaming and payment applications for third party solutions. Part of this requires the maintenance and analysis of hundreds of gigabytes of logfiles that need to hold data for anywhere from 14 to 120 days.
“We’re dealing with millions of log entries per day, and both the number of logs and the rate of growth are increasing,” according to Ernie Souhrada, Chief IT Architect at Jawa. Managing logfiles with timestamps is a common challenge in MySQL implementations. InnoDB stores table data in primary key order, and a primary key must be unique. Because Jawa’s timestamp key is not unique, it cannot serve as the InnoDB primary key.
“Our in-house expertise is primarily with MySQL, so we were looking for a product which would allow us to leverage those skill sets as opposed to going in a completely different direction, such as Oracle, but not suffer from some of the inherent limitations present in InnoDB,” stated Souhrada.
The Solution: Jawa uses TokuDB for both data warehousing and log analysis
“We found that TokuDB was able to maintain a more constant insertion rate over a large number of rows compared to InnoDB,” said Souhrada. For logfile analysis, Jawa took advantage of TokuDB’s unique Clustering Key feature. “TokuDB has allowed us to efficiently manage our logfiles by enabling us to cluster by timestamp, which, for our use case, is a highly effective alternative to partitioning.” By using a clustering key based on timestamp with TokuDB, Jawa was able to get around InnoDB’s requirement that the primary key, and therefore the key the data is clustered on, be unique. This approach also avoided having to resort to partitioning, which simplified the implementation. With Clustering Key support on TokuDB, all it takes is adding a single clustering keyword to a table definition instead of managing separate partitions.
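As a rough illustration of what "adding a single clustering keyword" might look like, the sketch below builds a table definition that clusters log rows on a non-unique timestamp column, plus a time-range query that such a key would serve. The table name, column names, and index name are hypothetical and are not Jawa's actual schema; the `CLUSTERING KEY` clause follows the syntax described in TokuDB's documentation and is not available in stock MySQL.

```python
# Hypothetical schema for illustration only; not Jawa's actual tables.
# The CLUSTERING KEY clause is TokuDB-specific syntax.

def tokudb_log_table_ddl(table: str = "app_log") -> str:
    """Return a CREATE TABLE statement that clusters rows by timestamp."""
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,\n"
        "  logged_at DATETIME NOT NULL,\n"
        "  source VARCHAR(64) NOT NULL,\n"
        "  message TEXT,\n"
        "  PRIMARY KEY (id),\n"
        # A clustering key may be non-unique and carries all columns of
        # the row, so range scans by time avoid extra lookups.
        "  CLUSTERING KEY idx_logged_at (logged_at)\n"
        ") ENGINE=TokuDB"
    )


def time_range_query(table: str = "app_log") -> str:
    """Return a parameterized range query that the clustering key can serve."""
    return (
        f"SELECT logged_at, source, message FROM {table} "
        "WHERE logged_at >= %s AND logged_at < %s"
    )


ddl = tokudb_log_table_ddl()
query = time_range_query()
print(ddl)
print(query)
```

The surrogate `id` column keeps the primary key unique, while the timestamp is free to repeat in the clustering key.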
In addition, Jawa benefited from other TokuDB features such as high data compression that saved roughly 85% of disk space, and from TokuDB’s zero index fragmentation that eliminated dump / reload (a.k.a. index rebuilding) downtime. Souhrada also noted the unique flexibility of the offering. “TokuDB is capable of doing a very good job with both OLAP and OLTP – it’s much more general purpose than other solutions in the market.”
Logfile performance: Without TokuDB’s clustering by timestamp, the only options are to contend with fragmentation or to partition. TokuDB’s Clustering Key support ensures that logfiles can be both inserted at speed and easily dropped at the end of a full cycle.
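The end-of-cycle drop described above amounts to a range delete on the timestamp column. The sketch below shows only that logic, using Python's built-in `sqlite3` module as a stand-in so that it runs without a MySQL server; it says nothing about TokuDB's performance. The table name, the sample rows, the fixed "now" date, and the 14-day window (the low end of the retention range mentioned earlier) are all assumptions chosen for illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def purge_expired(conn: sqlite3.Connection, now: datetime, retention_days: int) -> int:
    """Delete log rows older than the retention window; return rows removed."""
    cutoff = (now - timedelta(days=retention_days)).isoformat(sep=" ")
    cur = conn.execute("DELETE FROM app_log WHERE logged_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# In-memory stand-in database with a plain index on the timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_log (logged_at TEXT NOT NULL, message TEXT)")
conn.execute("CREATE INDEX idx_logged_at ON app_log (logged_at)")

# Arbitrary fixed "now" so the example is reproducible.
now = datetime(2011, 6, 1, 12, 0, 0)
rows = [
    ((now - timedelta(days=age)).isoformat(sep=" "), f"entry aged {age} days")
    for age in (1, 10, 20, 130)
]
conn.executemany("INSERT INTO app_log VALUES (?, ?)", rows)

deleted = purge_expired(conn, now, retention_days=14)
remaining = conn.execute("SELECT COUNT(*) FROM app_log").fetchone()[0]
print(deleted, remaining)
```

Because the timestamps are stored in ISO format, string comparison matches chronological order, which is what lets the single `WHERE logged_at < ?` condition select the expired range.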
Lack of Fragmentation: As with many companies focused on mobile applications, Jawa must keep system downtime as close to zero as possible. By avoiding having to perform offline operations (like dump / reload) to defragment the database, Jawa has been able to continuously service customers without interruptions caused by storage engine maintenance.
Ease of Implementation: Souhrada noted that “the migration path from InnoDB to TokuDB is probably easier than to any other third-party storage engine. Other solutions would have required a dump/reload to get up and running. In addition, alternatives would have invoked a query language different from standard MySQL, resulting in additional work at the application level. With TokuDB we were up and running in under a day instead of a week or more.”
Compression: In addition to fast insertion rates, TokuDB provides data compression levels that are much higher than InnoDB’s. TokuDB’s advanced compression technology reduced Jawa’s disk space requirements by roughly 7x, from about 2 TB down to about 300 GB.
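The compression figures quoted in this case study can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB) and treats the rounded figures of 2 TB and 300 GB as exact, so the results are approximate.

```python
# Rounded figures from the case study, assuming 1 TB = 1000 GB.
original_gb = 2000.0
compressed_gb = 300.0

# Compression ratio: how many times smaller the data became.
ratio = original_gb / compressed_gb

# Fraction of disk space saved.
savings = 1 - compressed_gb / original_gb

print(f"compression ratio: about {ratio:.1f}x")
print(f"disk space saved: about {savings:.0%}")
```

A ratio of about 6.7x rounds to the quoted 7x, and it corresponds to the 85% disk space saving cited elsewhere in the article.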
Pricing: “Tokutek’s pricing model also made TokuDB a substantially cheaper alternative to other database technologies that Jawa was also considering,” claimed Souhrada. “Tokutek’s pricing is based on increments of 100 GB, so the cost is easier to control than it is with some other products. For example, we looked at Infobright and their less granular (per-TB) pricing model would have made no sense for us, and their free/community edition didn’t have all the features we wanted.”
Moving ahead, Souhrada looks forward to the new features in TokuDB v5.0. “With the introduction of Hot Schema Changes in TokuDB v5.0, Tokutek makes deriving value out of a large MySQL database simple, giving us the option to much more easily analyze our data and generate value for our business.”