Rockset's RocksDB-Cloud Library - Enabling the Next Generation of Cloud Native Databases
November 7, 2018
Rockset and I began collaborating in 2016 due to my interest in their RocksDB-Cloud open-source key-value store. This post is primarily about the RocksDB-Cloud software, which Rockset open-sourced in 2016, rather than Rockset's newly launched cloud service. In it, I will explore how RocksDB-Cloud can be used to build an open-source cloud-friendly storage system.
Rockset's emergence from stealth mode deserves some reflection on a key observation underlying their platform: there are a core set of services common to offerings across the largest public Cloud Service Providers (CSPs). Two in particular, REST-based Object Storage (e.g. Amazon S3) and Event Streams (e.g. Amazon Kinesis), are used to compose other services, serving as a shared storage service for these caching systems. Rockset's open-source RocksDB-Cloud library provides an interesting illustration on how existing caching systems can be adapted for the cloud.
What is meant by a "caching system?" This is a system that manages its state across main memory and primary storage. RocksDB employs an implementation of the Log Structured Merge Tree (LSM-Tree) to achieve this goal. Underlying the LSM-Tree is a rule of thumb that has held for over 30 years. The "Five-Minute Rule" captures succinctly the inherent economic trade-off between memory and storage in data store design (Appuswamy/ADMS@VLDB 2017). What's unique about RocksDB-Cloud's vision is how this trade-off is adapted to the cloud.
A Cloud Native Database (CNDB) is a local database system built specifically for the cloud era. Such a database is designed to take full advantage of the abundance of networking, processing, and storage resources available in the cloud. To this end, a CNDB maintains a consistent image of the database--data, indexes, and transaction log--across cloud storage volumes to meet user objectives, and harnesses remote CPU workers to perform critical background work such as compaction and migration.
How does a database system designed to operate on a local host get refactored so that a consistent image of its state resides on Cloud Storage? The answer is twofold. First, the database must be a caching system as opposed to a memory system. Caching systems maintain the full image of the database on local persistent storage while only the active state is in memory. Once identified, the transformation of such a caching system to a CNDB requires that this persistent state be mapped on to Cloud Storage constructs so that it can be accessed by remote workers.
The persistent state of a caching system consists of point-in-time snapshots, metadata/primarily DDL, and the transaction log. One stipulation, however, is that the caching system must do "blind updates." That is to say all mutations applied to the database must be appended to the log. Indexing schemes are then employed to manifest the current image of the database from within the local host's memory.
The RocksDB library provides the means of building such caching systems (Lomet/DaMoN18). For example, Facebook deploys MySQL configured to use the MyRocks storage engine, which internally uses the RocksDB library. RocksDB implements the Log Structured Merge Tree (LSM-Tree). This means all mutations are appended to the Write-Ahead Log (WAL) and then applied to in-memory mem-tables. Over time these mem-tables become immutable, are flushed to sstable files, and subsequently, the associated resources freed. Transforming MySQL w/MyRocks into a CNDB is therefore a function of mapping the WAL and sstable files on to the appropriate Cloud Storage constructs.
Enter Rockset's RocksDB-Cloud library, an extension of RocksDB that maps local sstable files on to an S3-style bucket and WAL entries on to a Cloud Native Log such as a Kafka or Kinesis Partition. Intel has been collaborating with several end-customers and the Rockset team to enable this deployment scenario. Thus far we've successfully enabled the database to operate with MariaDB configured to maintain its sstable files locally and in a Minio-based S3 bucket. The WAL is configured to be local. We rely on the MariaDB instances binlog for maintaining the current a database's change-state.
Rockset also uses RocksDB-Cloud as the foundation for its own cloud service. The Rockset service is a serverless search and analytics CNDB that indexes semi-structured documents using RocksDB-Cloud. RocksDB-Cloud is a great addition to the arsenal of data tools that the open-source community can leverage to build other CNDBs as well.
What is Intel's interest in enabling Next-Gen CNDBs? Intel's anticipated introduction of Optane DC Persistent Memory will afford the database ecosystem the opportunity to revisit trade-off orthodoxy that has held for generations. One trade-off that will remain, however, is the Five-Minute Rule (Appuswamy/ADMS@VLDB 2017). The cache system model of CNDBs is the embodiment of this trade-off. We therefore believe CNDBs provide the foundation for wide-spread adoption of Intel's Persistent Memory over time. Storage engines that use the RocksDB-Cloud library to incorporate Intel's Optane DC Persistent Memory into RocksDB's cache substrate is a big step in this direction.
References
(1) Barr, "New - Parallel Query for Amazon Aurora," 2018 https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/
(2) Lomet, "Cost Performance in Modern Data Stores: How Data Caching Systems Succeed," 2018 https://dl.acm.org/citation.cfm?id=3211927
(3) Wu et al, "Eliminating Boundaries in Cloud Storage with Anna," 2018 https://arxiv.org/abs/1809.00089
(4) Arpaci-Dusseau, "Cloud-Native File Systems," 2018 https://www.usenix.org/system/files/conference/hotcloud18/hotcloud18-paper-arpaci-dusseau.pdf
(5) Borthakur, "The birth of RocksDB-Cloud," 2017 http://rocksdb.blogspot.com/2017/05/the-birth-of-rocksdb-cloud.html
(6) Appuswamy et al, "The five-minute rule thirty years later," 2017 http://www.adms-conf.org/2017/camera-ready/5minute-rule.pdf