From Under the Desk to the Cloud

Posted On September 26, 2011 | By Tokutek | 1 comment

 

Review of the O’Reilly Strata Making Data Work Conference

(reprinted from my guest blog for the Cloud Council of 7)

Monica Rogati of LinkedIn told a story from the early days at the firm, when the reporting system consisted of a single server under someone’s desk. One day, someone needed an Ethernet cable and unplugged the machine from the data outlet in the wall. LinkedIn’s data reporting, its lifeblood, came to a screeching halt.

The Push to the Cloud

LinkedIn, like many other social networking sites, eventually faced enormous growth and had to develop new processes and procedures that would allow it to be an effective cloud repository for people’s work contacts and resumes. The quantity of data that social sites have to contend with is staggering. Monica summed it up well in the title of her talk: “1M. 10M. 100M. Data!” And LinkedIn is far from alone; other speakers described similar increases. Peter Sirota of Amazon Web Services noted in his talk that Yelp generates close to 400 GB of compressed logs per day and that Foursquare has to track over 1M members and 15M venues.

Big Data, Even Bigger Hair

So why has big data become so big of late (and spawned so many conferences)? Richard McDougall of VMware summed up some of the driving forces:

  • The enterprise is experiencing data growth rates of 60% year over year
  • People are starting to see real value out of the data given its breadth and new supporting tools
  • The value from data exceeds hardware cost
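
To put the 60% figure in perspective, here is a rough back-of-the-envelope sketch of what compounding at that rate looks like. The 10 TB starting volume and the function names are my own assumptions for illustration; only the 60% growth rate comes from Richard's talk.

```python
import math


def projected_volume(initial_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting data volume by a fixed annual growth rate."""
    return initial_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the volume to double at a fixed annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 10.0   # hypothetical starting volume, not a figure from the talk
    growth = 0.60     # 60% year over year, as quoted above

    for year in range(6):
        print(f"year {year}: {projected_volume(start_tb, growth, year):.1f} TB")

    print(f"doubling time: {doubling_time_years(growth):.2f} years")
```

Under these assumptions the volume doubles roughly every year and a half and grows about tenfold in five years, which is why capacity planning that looks fine today can be badly out of date soon after.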

Richard went on to explain why the cloud is performing so well here. The cloud:

  • Reduces complexity
  • Dramatically lowers costs
  • Enables flexible, agile IT service delivery

Of course, big-name vendors are rushing to fill in offerings across the stack. Peter claimed that Amazon’s EC2 lowers the cost of operating a distributed system for data processing. Chris Schalk of Google, pointing to the release of the company’s Google Apps toolset, urged customers to “focus on building your apps and let us wear the pagers.”

Implications and Benefits

So when companies get it right, what are the implications and benefits of big data in the cloud? According to Peter, success in the cloud is leading to better analysis and better recommendations, to name just two key areas. And it’s not just the commercial space that benefits. The conference also did a great job of showcasing how the availability of big data is shaping areas outside of traditional consumer tech. NYC is making its data publicly available for people to explore and work on. Nonprofits are following suit. Data Without Borders spoke of an upcoming Datadive weekend for nonprofits that can’t afford data scientists. At the event, volunteer data scientists and enthusiasts will be given access to the data for a crowd-sourced approach to finding new insights. Even the biggest names among foundations are seeing the value in big data. Alastair Dant of The Guardian noted that the Bill and Melinda Gates Foundation is teaming up with the newspaper to make a public store of world development statistics available.

Don’t Let Your Kitten Crash

So, how well is your business prepared for growth? Hilary Mason of bit.ly noted that much of big data comes from either “secret US government scale” or “kittens on the internet scale.” With the former, there is often a great deal of advance planning. With the latter, just like the surprise someone gets when Fluffy goes viral, people are often caught off guard when their business volume grows dramatically. That means plan early. Design ahead. Make sure that your infrastructure can take you to the next level of growth. Just as important, consider whether the agility enabled by the cloud makes sense for you, and make sure you are monitoring the right growth parameters in your business. In other words, don’t let your kitten crash.

One Comment

  1. Leo Leung says:

    Great summary. I attended the first Strata in Santa Clara and was similarly impressed with the growth and value of data, particularly data to profile user behavior (pretty creepy actually). It’s a great time to be a data scientist.
