What’s the Big Idea?
Big Data + Big Ideas = Big Advantage
“Big Data” is the new buzzword in IT. A Google search for the term yields over 1M hits. In Boston, there were two back-to-back Big Data events last month, and on Wednesday there was another one in New York.
Of course, everyone has an opinion on Big Data. And while the size of the data is always top of mind, there were some interesting discussions around the time component of Big Data – specifically, how fast data comes in and how fast one can respond to it.
Here’s a sampling of the discussions, which we shared in our Twitter stream during the show:
The Need for Speed: A Sample of Real-World Data Rates
- Communication – according to Braxton Woodham at Tap11, the Twitter Firehose grew from 90 million to 160 million tweets per day in one year.
- Location – Jeff Jonas of IBM noted there are about 600B geo-locational transactions per day in the US alone.
- Consumption – Marc Parrish of Barnes and Noble pointed out that the typical consumer’s data diet is 34 GB a day and is growing at 6% per year.
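These daily figures are easier to compare once they are converted to per-second rates. The short Python sketch below does that conversion and compounds the 6% annual growth forward. The function names are illustrative, and the inputs are the figures quoted above. Note that these are averages, so peak rates will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(events_per_day: float) -> float:
    """Convert a daily event count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


def growth(old: float, new: float) -> float:
    """Fractional growth from old to new (0.5 means +50%)."""
    return (new - old) / old


def project(value: float, annual_rate: float, years: int) -> float:
    """Compound a value forward at a fixed annual growth rate."""
    return value * (1 + annual_rate) ** years


if __name__ == "__main__":
    print(f"Tweets per second:        {per_second(160e6):,.0f}")
    print(f"Tweet volume growth:      {growth(90e6, 160e6):.0%}")
    print(f"US geo events per second: {per_second(600e9):,.0f}")
    print(f"Data diet in 5 years:     {project(34, 0.06, 5):.1f} GB/day")
```

Run as a script, this prints roughly 1,852 tweets per second, 78% year-over-year growth, 6,944,444 geo-locational transactions per second, and a data diet of 45.5 GB a day five years out.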
The Need for Business Agility: Acting in Near Real Time
o Erick Swan of Splunk noted that customers expect an answer within hours, or sooner.
o Hilary Mason of bit.ly had a similar view, commenting that it was important to do Big Data analysis in “relatively recent time.”
- Data is no longer orderly
o Jeff Jonas of IBM made the point that data used to be much more orderly, e.g., the payroll databases of the past. Now the data in question is far more varied, such as all the searches that are done today.
Given this, how can businesses, especially those running MySQL, respond?
As Mike Vizard wrote yesterday at CTO Edge, “A lot of IT organizations that adopted MySQL as a way to save money are now running up against performance issues. The dilemma they now face is coping with the costs associated with porting those applications under the database environment, which is a scenario favored by Oracle, or employing a variety of unnatural database management tasks known as ‘sharding’ to make the MySQL environment scale.” He also noted that “some IT organizations may try to rely on dedicated appliances to improve the performance of MySQL. But a lot of organizations find that approach to solving the MySQL performance challenge to be cost prohibitive in terms of both acquiring those appliances and then managing them on an ongoing basis.”
But neither approach fixes the fundamental problem. According to Lee Edlefsen of Revolution Analytics, who was at the Big Data Conference, most algorithms for data analysis are 30 to 50 years old. Indeed, this is true, and it applies to the data structures underneath as well: the B-tree, which still indexes most databases, dates from the early 1970s. One way to address this is to replace the common but outdated B-tree with something newer, such as Fractal Tree indexes. (That, of course, is what Tokutek does.)
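Why would a different index structure speed up inserts? When a B-tree is much larger than memory, it pays roughly one random leaf write per inserted row. Write-optimized structures instead batch many rows into each write. The Python sketch below models that batching idea with a single in-memory insert buffer. It is a deliberate simplification, not TokuDB's algorithm. The key range, leaf count, and buffer size are hypothetical, and the model counts leaf writes only, ignoring reads, caching, and queries.

```python
import random

KEY_SPACE = 10**9  # hypothetical key range; leaves split it into equal slices


def leaf_of(key: int, num_leaves: int) -> int:
    """Return the index of the leaf page whose key range contains `key`."""
    return key * num_leaves // KEY_SPACE


def btree_leaf_writes(keys: list[int]) -> int:
    """Uncached B-tree model: every insert rewrites the leaf holding its key."""
    return len(keys)


def buffered_leaf_writes(keys: list[int], num_leaves: int, buffer_size: int) -> int:
    """One-level insert buffer model: rows collect in memory, and each flush
    writes every touched leaf once, however many rows that leaf receives."""
    writes = 0
    batch: list[int] = []
    for key in keys:
        batch.append(key)
        if len(batch) == buffer_size:
            writes += len({leaf_of(k, num_leaves) for k in batch})
            batch.clear()
    if batch:  # flush whatever is left over at the end
        writes += len({leaf_of(k, num_leaves) for k in batch})
    return writes


if __name__ == "__main__":
    rng = random.Random(42)
    keys = [rng.randrange(KEY_SPACE) for _ in range(100_000)]
    print("B-tree leaf writes:  ", btree_leaf_writes(keys))
    print("Buffered leaf writes:", buffered_leaf_writes(keys, num_leaves=1_000, buffer_size=10_000))
```

With these parameters, the B-tree model performs 100,000 leaf writes. The buffered model needs only 10 flushes, each touching at most 1,000 leaves, so it performs at most 10,000 leaf writes. Published descriptions of Fractal Tree indexes keep a buffer in each internal node rather than one large buffer in RAM, which lets this batching repeat at every level of the tree.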
How does this help with the issues around speed and agility? First, speed: Fractal Tree indexes improve insertion rates, where TokuDB outperforms InnoDB by 20x or more. That seems pretty incredible (in fact, one news agency assumed the 20x improvement must have been 20%; see the correction note and updated article). But it’s true; in fact, we have a customer seeing an 80x speedup.
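The asymptotic reason for such gains can be stated with the standard I/O-model bounds for B^ε-trees. This textbook write-optimized structure closely resembles published descriptions of Fractal Tree indexes, though treating the two as equivalent is an assumption here. In the bounds below, N is the number of rows, B is the number of entries per node, and 0 < ε ≤ 1 is a tuning parameter. The bounds ignore constant factors, caching, and workload, so they explain the direction of the effect rather than predict the 20x or 80x figures above. The values ε = 1/2 and B = 1024 are hypothetical.

```latex
\begin{aligned}
\text{B-tree insert:} \quad & O\!\left(\log_B N\right) \text{ I/Os} \\
B^{\varepsilon}\text{-tree insert (amortized):} \quad & O\!\left(\frac{\log_B N}{\varepsilon\, B^{1-\varepsilon}}\right) \text{ I/Os} \\
B^{\varepsilon}\text{-tree point query:} \quad & O\!\left(\frac{\log_B N}{\varepsilon}\right) \text{ I/Os} \\[4pt]
\text{Insert speedup:} \quad & \approx \varepsilon\, B^{1-\varepsilon}
  = \tfrac{1}{2}\sqrt{1024} = 16 \qquad (\varepsilon = \tfrac{1}{2},\ B = 1024)
\end{aligned}
```

In this model with ε = 1/2, point queries cost about twice as many I/Os as in a B-tree. That is the price paid for inserts that are cheaper by a factor of roughly √B / 2.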
Second, agility. TokuDB v5.0 gives you more agile control over your data model by adding features such as hot column addition and hot indexing. According to Ernie Souhrada, Chief IT Architect at Jawa, “With the introduction of Hot Schema Changes in TokuDB v5.0, Tokutek makes deriving value out of a large MySQL database simple, giving us the option to much more easily analyze our data and generate value for our business.”
Stuart Miniman of Wikibon summarized the challenge and the opportunity well. “The growth of data, especially in real-time Web 2.0 environments, can be a burden or an opportunity for new products and new revenue. MySQL users should consider TokuDB to increase agility, speed and scalability.”