Adding an index to a database is one of those little joys in life. A query takes 10 seconds, you add a good index, and boom...10 milliseconds! Customers are happy, manager is happy, database is happy (according to its CPU graph at least). However, managing indexes gets old quickly. More indexes means writes are slower. There is always another query creeping up on the latency graph. Imagine the sum total of human time spent playing whack-a-mole with database indexes. Even worse, imagine how much of our daily interaction with technology is impacted by slow, unindexed queries.
Rockset is approaching this problem with a radical solution: build indexes on all columns. One of the design goals of Rockset is to absolutely minimize the amount of configuration the user needs to do. Creating indexes is a configuration; it has to go. We call our approach Converged Indexing. Before we dive into the technical details, let me share some background on two types of indexing we build upon: columnar indexing and search indexing.
In the beginning, there was row-oriented storage, where a single row is stored contiguously on the storage media. Fetching a single row is fast — a single IO. However, in some cases a database table might contain a huge number of columns, while a query only touches a small subset. For those kind of queries, column-oriented storage works better.
In column-oriented storage, we store all values for a particular column contiguously on storage. A query can efficiently fetch exactly the columns that it needs, which makes it great for analytical queries over wide datasets. Additionally, column-oriented storage has better compression ratios. Values within one column are usually similar to each other, and similar values compress really well when stored together. There are some advanced techniques that make compression even better, like dictionary compression or run-length encoding. It should be no surprise that column-oriented storage is used by some of the most successful data warehousing solutions, such as Vertica, Amazon Redshift, or Google's BigQuery.
Search indexing is a technique that makes search-like queries fast. In search indexing for each
(column, value) pair, we store the list of documents for which
column = value, called posting lists. Any query with a simple predicate can quickly fetch a list of documents satisfying that predicate. By keeping the posting lists sorted, we can intersect the lists or merge them to satisfy conjunction or disjunction of predicates, respectively. Search indexing is used in systems like Elasticsearch and Apache Solr, both based on the Apache Lucene library.
Converged Indexing: Row + Column + Search
At Rockset, we store every column of every document in a row-based store, column-based store, AND a search index.
Yes, we make three copies of the dataset. How is that sane? Two main reasons:
- Converged indexing requires more space on disk, but our queries are faster. In simple terms, we trade off storage for CPU. However, more importantly, we trade off hardware for human time. Humans no longer need to configure indexes, and humans no longer need to wait on slow queries.
- As any experienced database user knows, as you add more indexes, writes become heavier. A single document update now needs to update many indexes, causing many random database writes. In traditional storage based on B-trees, random writes to database translate to random writes on storage. At Rockset, we use LSM trees instead of B-trees. LSM trees are optimized for writes because they turn random writes to database into sequential writes on storage. You can learn more in this great article: Algorithms Behind Modern Storage Systems. We use RocksDB's LSM tree implementation and we have internally benchmarked hundreds of MB per second writes in a distributed setting.
We have all these indexes, but how do we pick the best one for our query? We built a custom SQL query optimizer that analyzes every query and decides on the execution plan. For example, consider the following queries:
SELECT * FROM search_logs WHERE keyword = ‘rockset’ AND locale = ‘en’
The optimizer will use the database statistics to determine that query needs to fetch a tiny fraction of the database. It will decide to answer the query with the search index.
SELECT keyword, count(*) c FROM search_logs GROUP BY keyword ORDER BY c DESC
There are no filters in this query; the optimizer will choose to use the column store. Because the column store keeps columns separate, this query only needs to scan values for column
keyword, yielding a much faster performance than a traditional row store.
With converged indexing, it is especially satisfying to see delighted customers, who are not used to fast queries out of the box with zero configuration. However, our work is not done. We continue to improve our indexing and query performance, and have some exciting ideas on using custom compression for both columnar store and search indexing. If you are curious about Rockset's performance on your workload, you can sign up for a free Rockset account here. We are also hiring.
P.S. If you want to learn more about how we built converged indexing, check out our presentation from Strata San Francisco 2019: