Managing metadata at exabyte scale
Delivering Agile Storage in the Cloud with Billions of Assets
Founded in 2001, Limelight Networks, Inc. (NASDAQ: LLNW) is an Internet platform and services company that integrates the most business-critical parts of the online content value chain. Limelight’s cloud-based services enable customers to profit from the shift of content and advertising to the online world, from the explosive growth of mobile and connected devices, and from the migration of IT applications and services to the cloud. More than 1,800 customers worldwide use Limelight’s massively scalable services to better engage audiences, optimize advertising, manage and monetize digital assets, and build stronger customer relationships.
Limelight designed a unique high-availability Agile Storage cloud service that gives users control over how and where their content is stored. It offers massive storage capacity, extreme flexibility for setting business rules and replication policies, and localized ingest and content access around the globe. The service provides vast storage volumes for large libraries of any type of digital asset.
The system was designed for a total capacity on the order of exabytes worldwide and is presently capable of supporting over 100 billion assets. To succeed with the platform, Limelight needed a storage engine that could sustain insertion and query performance on large tables, scale as the database grew, and do so cost-effectively. “This vast amount of information brings with it a rich and large amount of metadata around policies, file names, storage pointers, asset registries, users, and groups,” according to Wylie Swanson, VP Technology, Cloud Services at Limelight. “Ensuring the metadata could be managed in an efficient and flexible way was critical to the design of the offering.”
Limelight considered several options that proved insufficient:
InnoDB – Despite its familiarity with InnoDB, Limelight found that it didn’t meet the project’s requirements. According to Swanson, “the minute you run out of RAM for indexing, InnoDB performance starts to fall apart. We were seeing this occur at 50M – 100M rows. You can shard content, of course, but that feeds back into application and management complexity. Moreover, not all of our database schema is amenable to simple sharding methods.”
RAM Expansion – “While high powered servers and more RAM can somewhat extend the size of a database that InnoDB can handle, doing so is ultimately cost prohibitive,” according to Swanson. “To support our system using more traditional database technology, we would have had to purchase terabytes of RAM for our servers.”
Schooner – “Schooner offered performance improvements, but was too expensive. In addition, it didn’t look like it could achieve the performance levels of our commodity servers using TokuDB in our application,” according to Swanson.
Limelight Agile Storage uses TokuDB for metadata management
Limelight needed a system that could access the database remotely with high availability, flexibility, performance, and capacity. Limelight chose MariaDB for components of the platform. To satisfy the need for high availability, the Agile Storage service manages the metadata on a high-availability Linux cluster.
To meet the requirements for flexibility, performance, and capacity, Limelight chose TokuDB. “TokuDB provides incredible scaling, keeping a high insert rate throughout as the metadata repository continues to grow,” noted Swanson. “This is crucial to keeping up with high-ingest points that are spread all around the world. TokuDB also provides the underpinning for a system that supports arbitrary queries – for example which policies are expired on which assets.”
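To make the "arbitrary queries" example concrete, here is a minimal sketch of a query for expired policies on assets. Everything in it is an assumption made for illustration: the case study does not describe Limelight's schema, so the table names (assets, policies, asset_policies), the expires_at column, and the sample rows are hypothetical, and Python's built-in sqlite3 module stands in for MariaDB with TokuDB so the example runs on its own.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical metadata schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE assets (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE policies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE asset_policies (
            asset_id INTEGER NOT NULL REFERENCES assets(id),
            policy_id INTEGER NOT NULL REFERENCES policies(id),
            expires_at TEXT NOT NULL  -- ISO 8601 date, so text order is date order
        );
        -- Secondary index so the expiry filter does not have to scan every row.
        CREATE INDEX idx_asset_policies_expiry ON asset_policies (expires_at);

        INSERT INTO assets VALUES (1, 'video-a.mp4'), (2, 'image-b.png');
        INSERT INTO policies VALUES (1, 'replicate-eu'), (2, 'retain-1y');
        INSERT INTO asset_policies VALUES
            (1, 1, '2012-01-31'),
            (1, 2, '2030-01-01'),
            (2, 2, '2011-06-30');
        """
    )
    return conn


def expired_policies(conn, as_of):
    """Return (asset_name, policy_name) pairs whose policy expired before as_of."""
    rows = conn.execute(
        """
        SELECT a.name, p.name
        FROM asset_policies AS ap
        JOIN assets AS a ON a.id = ap.asset_id
        JOIN policies AS p ON p.id = ap.policy_id
        WHERE ap.expires_at < ?
        ORDER BY a.name, p.name
        """,
        (as_of,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for asset, policy in expired_policies(demo, "2013-01-01"):
        print(asset, policy)
```

Queries of this kind depend on secondary indexes staying usable as the tables grow, which is the behavior Swanson attributes to TokuDB above.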
In addition, Limelight benefited from other TokuDB features, such as high data compression, which saved 65% of disk capacity for the meta-directory components.
Scale: The Agile Storage platform was designed to scale to exabytes of data. Scaling compute power, storage, and software cost-effectively was critical to the design. “We don’t know how we could have gotten to our required scale and price points for our meta-directory components without TokuDB,” according to Swanson.
Ease of Implementation: Swanson noted that “TokuDB worked seamlessly from the start with MariaDB. Installing it was quick and simple, and we were up and running in a few hours and it worked out-of-the-box with default settings, so that we could focus on maximizing the performance of our platform, not our databases.”
Compression: In addition to fast insertion rates, TokuDB provides data compression levels that are much higher than InnoDB’s. TokuDB’s advanced compression technology reduced Limelight’s disk space requirements by roughly 3x, from over 1 TB down to about 350 GB.
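The compression figures quoted in this case study (roughly 3x, 65% savings) can be checked with a little arithmetic. The sketch below assumes "over 1 TB" means about 1,000 GB and "about 350 GB" means exactly 350 GB; both inputs are approximations, so the results are only indicative.

```python
def compression_summary(before_gb, after_gb):
    """Return (compression ratio, percent of disk space saved)."""
    ratio = before_gb / after_gb
    savings_pct = (1 - after_gb / before_gb) * 100
    return ratio, savings_pct


# Assumed inputs: ~1,000 GB uncompressed, ~350 GB with TokuDB compression.
ratio, savings_pct = compression_summary(1000.0, 350.0)
print(f"{ratio:.2f}x smaller, {savings_pct:.0f}% saved")
```

Under these assumptions the ratio comes out a little under 3x and the savings at about 65%, so the two figures describe the same reduction.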