- Performance issues and business impact due to slow insertion rates.
- System crashes and downtime significantly disrupting business service levels.
- Need to contain expansion of storage requirements.
Radically accelerated web crawler performance improved the accuracy and timeliness of the “Advanced Search” application for Facebook users.
The Company: Profile Technology Ltd. is a UK-based provider of social media applications. Founded in 2007, Profile develops applications for Facebook, Myspace, Bebo, Orkut, Hi5 and Friendster, and provides social network application consultancy services. Profile offers the only independent advanced search engine for Facebook.
The Challenge: Profile needed to overcome performance issues and scalability limits to expand its capabilities and to ensure effective delivery of its Facebook services.
“We offer Facebook users advanced search capabilities and a variety of other applications,” explains Chris Claydon, Managing Director at Profile Technology Ltd. “To keep records up to date and expand our coverage, our web crawler needs to regularly update multi-table records for more than 300 million people. We were hitting the wall with MyISAM and InnoDB.”
The advanced search engine stores public information available from Facebook on RAID10 storage attached to an 8-core database server. The database totaled 500GB, and its largest table held more than 2 billion rows. The goal of ramping up data collection and expanding this table to 8 billion rows was impeded by slow InnoDB insertion speeds.
“Our crawlers were crippled because we couldn’t get new data into the database fast enough. I estimated it would take 3 to 6 months to do just one crawl of the public Facebook pages with InnoDB, which is limited by the number of random inserts it can do.”
The slow pace of database updates was restricting the expansion of services Profile wanted to develop and support, holding back the growth of its business.
Further, “a series of slow InnoDB recoveries after server crashes caused by heavy load took our service offline for up to 30 hours each. Our reliability as a service provider was at risk – something we are simply not going to accept.”
The Solution: Profile installed TokuDB v3.0 in about an hour. Over the next few days they converted their largest tables to TokuDB one at a time. Crawler performance improved immediately.
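The table-by-table conversion described above could be scripted along the following lines. This is only a sketch: the case study does not show the commands Profile actually ran, and the table names used here (`profiles`, `friend_links`) are hypothetical. It relies on the standard MySQL `ALTER TABLE ... ENGINE=` syntax and simply builds one statement per table so that each conversion can be run and checked separately.

```python
def engine_conversion_statements(tables, engine="TokuDB"):
    """Build one ALTER TABLE statement per table, in the order given.

    Nothing is executed here; the statements are returned as strings so
    they can be reviewed and then run one at a time.
    """
    statements = []
    for name in tables:
        # Backtick-quote the identifier, doubling any embedded backticks.
        quoted = "`" + name.replace("`", "``") + "`"
        statements.append(f"ALTER TABLE {quoted} ENGINE={engine};")
    return statements


if __name__ == "__main__":
    # Hypothetical table names, used for illustration only.
    for statement in engine_conversion_statements(["profiles", "friend_links"]):
        print(statement)
```

Generating the statements separately from running them keeps each conversion an independent step, which matches the one-table-at-a-time approach in the text.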
Figure: Crawler performance, as measured by bandwidth consumption.
“The overall performance improvement of the crawler was about 20x. The bottleneck moved from the database to the crawl script. In addition, we could now increase speed even further just by running more crawler threads in parallel. Even with extra crawlers running, the database is not heavily loaded.”
The Benefits: Profile realized rapid technology benefits that in turn delivered business advantage to the firm.
“I measured an 80x improvement on our crawler’s insertion rate, boosting our overall performance by 20x, and we expect our 2 billion row database will now quickly grow to 8 billion or more. Insertion speed is no longer the bottleneck and we can now complete a full crawl in just 1-2 weeks.”
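As a rough cross-check of the figures quoted in this case study, the short calculation below compares the estimated full-crawl time on InnoDB (3 to 6 months) with the time reported after the switch (1 to 2 weeks). It assumes a 30-day month, which is a simplification introduced here and not a figure from the source. Under that assumption the implied end-to-end speedup falls roughly between 6x and 26x, a range that contains the quoted overall improvement of about 20x.

```python
DAYS_PER_MONTH = 30  # assumption for this sketch, not stated in the source
DAYS_PER_WEEK = 7

# Crawl-time ranges quoted in the case study, converted to days.
innodb_crawl_days = (3 * DAYS_PER_MONTH, 6 * DAYS_PER_MONTH)  # 3 to 6 months
tokudb_crawl_days = (1 * DAYS_PER_WEEK, 2 * DAYS_PER_WEEK)    # 1 to 2 weeks

# Most conservative case: shortest old crawl against longest new crawl.
lowest_speedup = innodb_crawl_days[0] / tokudb_crawl_days[1]
# Most favorable case: longest old crawl against shortest new crawl.
highest_speedup = innodb_crawl_days[1] / tokudb_crawl_days[0]

# Planned growth of the largest table: 2 billion to 8 billion rows.
table_growth_factor = 8_000_000_000 / 2_000_000_000

if __name__ == "__main__":
    print(f"Implied crawl speedup: {lowest_speedup:.1f}x to {highest_speedup:.1f}x")
    print(f"Planned table growth: {table_growth_factor:.0f}x")
```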
“There are other advantages too. For one thing, TokuDB reduced our storage requirements significantly, saving the cost of upgrading to larger RAID storage. For another, recovery is much faster – minutes instead of more than a day – and that keeps our services online and providing value to our customers.”
“A speed improvement of this size is a big deal for our business – it lets us offer new services and capabilities that were previously infeasible.”