Ensuring sufficient disk I/O to catch copyright violations at network speed.
- Storage growth, including maxed-out disk I/O utilization
- Performance issues and business impact due to slow selects
- Inability to revise the data schema on the fly
The Company: Evidenzia GmbH & Co. KG is one of the leading partners of the software, movie and music industries in tracing copyright infringements and illegal file-sharing activity in peer-to-peer networks. Evidenzia helps copyright owners protect their intellectual property. Its powerful technologies enable copyright owners to reliably trace and document illegal file-sharing activity in P2P networks. All data and documentation can then be used as evidence in court.
The Challenge: Evidenzia ingests a large amount of logging information every hour. The data must not only be processed in parallel for instant reporting but also be stored in case it is ever needed as evidence in a legal case. To meet these needs, Evidenzia logs IP addresses while also connecting to each peer. In the process, the software fetches data from the peer and matches it against the copyrighted material to prove a copyright violation.
“Prior to TokuDB, we were using InnoDB for storing all the data. We found that as the tables grew bigger, the selects were becoming slower, taking as much as an hour or more, and the disk I/O was growing higher,” said Bastian Axter, Director of Operations.
To keep up with the workload, Evidenzia had considered several other options, but each failed to meet its performance or price goals. These included:
Flash memory (SSD cache) - Storing all the data on SSDs was far too expensive, so Evidenzia considered using an SSD cache inside the RAID controller. Testing showed that this approach would not help: the reads were spread randomly across far more data than the cache could hold, so the cache could not speed them up.
Partitioning - “Partitioning was one option that was reviewed to divide up the load,” Axter said. “However, the management overhead that would have been required for all the tables and partitions was excessive. This approach would clearly have introduced more problems than it could have solved and would have resulted in additional management headaches.”
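Why an SSD cache in front of the disks could not help with random reads can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a model of Evidenzia's actual workload: it assumes reads are spread uniformly at random over all data blocks, that the cache uses least-recently-used (LRU) eviction, and it uses hypothetical sizes in which the cache holds 1% of the data. The function name `lru_hit_rate` is likewise hypothetical.

```python
import random
from collections import OrderedDict


def lru_hit_rate(num_blocks: int, cache_blocks: int, num_reads: int, seed: int = 0) -> float:
    """Simulate uniformly random block reads through an LRU cache; return the hit rate.

    Assumption: every block is equally likely to be read (purely random access).
    """
    rng = random.Random(seed)
    cache: OrderedDict[int, None] = OrderedDict()  # ordered least -> most recently used
    hits = 0
    for _ in range(num_reads):
        block = rng.randrange(num_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = None  # miss: the read goes to disk, then the block is cached
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / num_reads


if __name__ == "__main__":
    # Hypothetical sizes: the cache holds 1% of the data's blocks.
    rate = lru_hit_rate(num_blocks=100_000, cache_blocks=1_000, num_reads=200_000)
    print(f"hit rate: {rate:.4f}")  # close to 1_000 / 100_000 = 0.01
```

Because every block is equally likely to be read next, keeping recently used blocks offers no advantage, and the hit rate settles near the ratio of cache size to data size. Only a cache comparable in size to the data itself would help, which brings back the cost problem that ruled out storing everything on SSD.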
The Solution: With TokuDB 5.2, Evidenzia can run all inserts and selects in parallel and also delete outdated data from the same table, without running “optimize table” or slowing down the concurrent inserts and selects. In addition, TokuDB's table compression proved invaluable in keeping disk space requirements low.
“The fast indexes and the ability to delete without having to optimize the table, as well as the unique ability of Hot Index addition, really brought home how powerful TokuDB is,” according to Axter. “For these reasons, we were able to convert other tables to TokuDB as well.”
The graph below shows disk I/O utilization (maximum 100%) on the primary database, with a dramatic drop in week 46, when Evidenzia deployed TokuDB:
Disk utilization before and after TokuDB
“Most of the I/O came from the long-running selects; they have been gone since we introduced TokuDB into production,” according to Axter. “The overall impact on disk I/O was impressive, dropping from near 80-100% down to 5-10%.”
Cost Savings: As the InnoDB tables grew, selects slowed down and disk I/O rose. Evidenzia would have had to buy additional drives just to keep up with the I/O. In addition, InnoDB compression could not shrink the tables on disk significantly. “With TokuDB, we saved over 70% on storage,” according to Axter.
Performance: “There was an immediate impact on selects with TokuDB. These went from taking over an hour down to taking just minutes,” noted Axter. “Not only did TokuDB assist us with the select slowdown from large tables, but it also addressed our problems with deletes. Prior to TokuDB, deletes of already processed and archived data were far too slow because of the huge, slow, fragmented indexes.”
Flexibility of Operations: With InnoDB, running “optimize table” to rebuild the indexes was too disruptive to the business because it blocked the entire logging process. With TokuDB, however, indexes do not fragment, so the database never has to be taken offline to rebuild them.