New England’s Victory (for Big Data)

Posted On February 6, 2012 | By Tokutek

While it might not have been New England’s weekend on the Big Gridiron, it was certainly New England’s day for Big Data at the New England Database Summit on Friday at MIT.

The summit was well attended, with 350 registrants and keynotes from prominent MySQL users such as Mark Callaghan. The coverage was quite broad, with presentations running the gamut from grad students (complete with bodyguards and intimidating academic advisors) to established companies such as StreamBase. The sponsor list was an A-list this year as well, with EMC and Microsoft being the two biggest backers.

The topics were too many and too diverse to cover in full. That said, here are a few of the notable ones.

Keynote #1: Johannes Gehrke (Cornell): Declarative Data-Driven Coordination

Johannes Gehrke of Cornell kicked off with the first keynote on Declarative Data-Driven Coordination. His methodology shed light on an alternative to out-of-band communication. The presentation focused on how to successfully handle entangled queries.

More Sleep for Tom and Meg if They Can Just Coordinate

In brief, he showed a way for someone to check whether a friend is on a flight and have the database satisfy their mutual constraints. His main theorem, with a proof outlined in his SIGMOD paper, is that any schedule that is entangled-isolated is also oracle-serializable. It’s a clever approach, as long as the set of entangled friends remains small.
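To make the idea of mutual constraints a bit more concrete, here is a toy sketch in Python. It is not the algorithm from the talk or the paper. It only illustrates the flavor of the problem: two requests that each depend on the other are answered together or not at all. The `Request` class, the `coordinate` function, and the flight numbers are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical entangled request: 'book me on a flight my partner is also on'."""
    user: str
    partner: str
    acceptable_flights: frozenset


def coordinate(requests):
    """Answer pairs of mutually entangled requests together.

    Returns a dict mapping each coordinated user to the chosen flight.
    A request stays unanswered if its partner has not asked for it in
    return, or if the two share no acceptable flight.
    """
    by_user = {r.user: r for r in requests}
    bookings = {}
    for r in requests:
        if r.user in bookings:
            continue  # already answered together with its partner
        other = by_user.get(r.partner)
        if other is None or other.partner != r.user:
            continue  # constraint is not mutual, so nothing to coordinate
        common = r.acceptable_flights & other.acceptable_flights
        if common:
            flight = min(common)  # deterministic pick among shared flights
            bookings[r.user] = flight
            bookings[other.user] = flight
    return bookings


# Invented example data: Tom and Meg name each other, Ann's partner never asks.
requests = [
    Request("Tom", "Meg", frozenset({"F100", "F200"})),
    Request("Meg", "Tom", frozenset({"F200", "F300"})),
    Request("Ann", "Bob", frozenset({"F100"})),
]
bookings = coordinate(requests)
print(bookings)
```

In this sketch Tom and Meg end up on the one flight they both accept, while Ann's request waits because Bob never submitted a matching one. A matcher this simple also hints at why the approach is easiest when the group of entangled friends stays small.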

Keynote #2: Mark Callaghan (Facebook): Performance is Overrated

The room got a little quiet when Mark took the stage. Some people were expecting a possible rehash of last summer’s brouhaha between Mike Stonebraker and Facebook on the fate of MySQL. Instead, Mark jumped into some very practical discussions about managing MySQL at scale.

First, he noted that manageability needs more attention since…

    • The cost of extra hardware can be predicted
    • The cost of downtime cannot
    • Downtime comes in many forms (server down and server too busy)

For Mark, manageability has a number of meanings. One is the rate of interrupts per server for the operations team. Mark finds that while the server count grows quickly, his operations team grows slowly. Hence, it is imperative that quality of service improve over time (i.e., does work get done, and does it get done on time?).

Mark and his team use MySQL for a number of reasons. First, it was there when Mark arrived. Second, Mark and his team made it scale 10x. Finally, Mark likes MySQL for OLTP.

As Facebook has grown, though, so has the number of servers. This is due to “Big Data” x high QPS. Hence, they have had to add servers to add IOPS. To address this, Mark noted that flash memory (SSD) is very interesting, as are (we blush) write-optimized databases.
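A back-of-the-envelope sketch may help show why “Big Data” x high QPS turns into a server count. The Python below assumes a deliberately simple model in which a fleet must satisfy both a storage constraint and an I/O constraint, and the larger of the two wins. The function name, the model, and every number are invented for illustration. None of them come from Mark's talk or from Facebook.

```python
import math


def servers_needed(data_tb, qps, ios_per_query, tb_per_server, iops_per_server):
    """Servers required to satisfy both the storage and the I/O constraint."""
    for_capacity = math.ceil(data_tb / tb_per_server)
    for_iops = math.ceil(qps * ios_per_query / iops_per_server)
    return max(for_capacity, for_iops)


# All numbers below are made up purely to show the shape of the trade-off.
disk = servers_needed(data_tb=100, qps=200_000, ios_per_query=2,
                      tb_per_server=2, iops_per_server=1_000)
flash = servers_needed(data_tb=100, qps=200_000, ios_per_query=2,
                       tb_per_server=2, iops_per_server=20_000)
print(disk, flash)
```

With these made-up inputs the disk-based fleet is sized by IOPS (400 servers), while the flash-based fleet is sized by capacity (50 servers). Under the same simplified model, lowering `ios_per_query` has a similar effect to raising `iops_per_server`.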

The last part of his presentation focused on advice for scaling: More Data, More QPS. His tips were quite straightforward:

    • Fix stalls to make use of capacity
      • Don’t make MySQL faster, make it less slow
    • Improve efficiency to use less
    • Repeat

Additional details can be found in Sheeri’s excellent live blog of the presentation.

New Tools and Systems Session: Willis Lang (University of Wisconsin): Energy-Conscious Data Management Systems

Just as Mark stressed that performance isn’t everything when he spoke about manageability, Willis Lang pointed out another key concern. His slides noted that “three decades of database research has optimized for the highest possible performance possible regardless of energy consumption.” (We agree and have written about this topic as well.)

Willis and his team have been looking at various techniques for addressing this, such as using variable-speed disks. He has been systematically studying the power/performance trade-offs of hardware components. The preliminary memory-based results showed that interesting trade-off opportunities exist if one rethinks database design principles. His presentation focused on the improvements that can be seen with memory parking. Additional details on his research can be found here.

As mentioned previously, there were many good talks, and much more could be written about the event. Other interesting speakers included David Karger, who introduced Dido, which seeks to make database manipulation as easy as document editing, and Alvin Cheung, whose Pyxis project eases application development with automatic code partitioning based on application and server characteristics.

Kudos to Samuel Madden (MIT) and Ugur Cetintemel (Brown University) for organizing the event. Additional details can also be found via the Twitter hashtag #nedb12 and the event homepage.

