Storage Architecture

Rockset uses RocksDB, an open-source key-value store, to store your data. RocksDB is widely used in storage systems that require high-performance, low-latency access, and it has become the storage engine of choice for many data management systems, including MySQL (via MyRocks), Apache Kafka Streams, and CockroachDB.

Because Rockset uses a cloud-native, auto-scaling architecture, index management and cluster provisioning are automated. This significantly reduces the operational overhead of provisioning capacity and managing servers.

The size of your data (after it has been compressed and indexed in RocksDB) is the value used for usage tracking and billing purposes.

Understanding Storage Size

Storage for Converged Indexing

The total size of your data in Rockset is determined by the cumulative size of the indices Rockset builds on top of your data. Rockset’s Converged Indexing technology indexes each document and stores the indices as a set of key-value pairs inside a RocksDB storage engine. This format is optimized for fast query serving. The Converged Indexing process indexes every field of your document, including nested objects and array entries. Every field is indexed at least three ways:

  • An inverted index useful for point lookups
  • A columnar index useful for aggregations
  • A row index useful for data retrieval
  • A range index useful for range scans that have low selectivity

You also have the option of creating specialized geo-indexes by configuring Rockset’s Ingest Transformation.
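
To make the key-value layout more concrete, here is a minimal sketch in Python of how a single document might fan out into inverted, columnar, and row index entries. The key encodings and index prefixes below are invented for illustration and are not Rockset's actual on-disk format.

```python
# Illustrative sketch only: how one document could fan out into entries of an
# inverted, a columnar, and a row index. The key layouts are invented for this
# example and are NOT Rockset's actual encoding.

doc_id = 42
doc = {"name": "air conditioner", "price": 249, "tags": ["cooling", "home"]}

def flatten(value, path=()):
    """Yield (field_path, scalar_value) pairs, descending into maps and arrays."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from flatten(v, path + (k,))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from flatten(v, path + (i,))
    else:
        yield path, value

kv_pairs = {}
for path, value in flatten(doc):
    field = ".".join(str(p) for p in path)
    # Inverted index: (field, value) -> list of documents containing that value.
    kv_pairs.setdefault(("I", field, value), []).append(doc_id)
    # Columnar index: (field, doc id) -> value, so a single field can be scanned alone.
    kv_pairs[("C", field, doc_id)] = value
    # Row index: (doc id, field) -> value, so a whole document can be re-assembled.
    kv_pairs[("R", doc_id, field)] = value

for key in sorted(kv_pairs, key=str):
    print(key, "->", kv_pairs[key])
```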

The advantage of building all these indexes is obvious: it makes your queries extremely fast. And because they are created automatically, you do not have to manually create and tune them.

Unlike other systems, which typically associate a type with every field, Rockset stores a type with every individual value of a field. The Converged Indexing format indexes each field of your dataset along with its type. The same field in different records can contain data of different types, and your queries can use the TYPEOF function to filter values of a particular type. Because the type is stored with each value, you can avoid unnecessary data cleaning and filtering before ingesting data into Rockset.
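
As a rough illustration of per-value typing, the sketch below tags every stored value with its own type, so the same field can hold different types across documents, and a type filter (what TYPEOF expresses in SQL) reduces to a comparison on the tag. The layout is hypothetical, not Rockset's internal encoding.

```python
# Hypothetical sketch: store a type tag next to every value, so the same field
# can hold different types in different documents.
docs = [
    {"_id": 1, "zipcode": 94107},         # number
    {"_id": 2, "zipcode": "94107-1234"},  # string
    {"_id": 3, "zipcode": None},          # null
]

def type_of(value):
    """Very small analogue of a SQL TYPEOF function, for this example only."""
    return {int: "int", float: "float", str: "string", bool: "bool",
            type(None): "null_type"}[type(value)]

# Each stored cell carries (type_tag, value) instead of relying on a column type.
stored = {(d["_id"], "zipcode"): (type_of(d["zipcode"]), d["zipcode"]) for d in docs}

# A query like "give me documents whose zipcode is a string" becomes a tag check.
string_ids = [doc_id for (doc_id, field), (tag, _) in stored.items()
              if field == "zipcode" and tag == "string"]
print(string_ids)  # [2]
```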

So, what is the downside of creating these indexes and storing individual types? One common perception is that every new index you create will bloat your storage size. That perception no longer holds when your data is stored in RocksDB:

Data Compaction in RocksDB

Given that each document field value is stored in several indexes, you might expect significant size amplification between your data in its native format and your data in Rockset. This is often not the case. RocksDB takes a set of key-value pairs, organizes them in its BlockBasedTable format, and compresses them before storing them on disk. RocksDB also allows different portions of the data to be compressed with different algorithms, e.g. frequently accessed data can be compressed using LZ4 while less frequently accessed data can be compressed using Zstandard or gzip. In addition, RocksDB delta-encodes keys, so keys that share a common prefix do not pay the cost of duplicating that prefix on storage. Another way RocksDB reduces storage bloat is by supporting bloom filters on key prefixes rather than storing a bloom filter entry for every full key.
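
The snippet below illustrates the idea behind delta (prefix) encoding of sorted keys: each key stores only the length of the prefix it shares with the previous key plus the new suffix. It is a simplified sketch of the technique, not RocksDB's actual block format.

```python
# Simplified illustration of prefix (delta) encoding of sorted keys, similar in
# spirit to what RocksDB does inside its data blocks.
keys = [b"user.1001.email", b"user.1001.name", b"user.1002.email", b"user.1002.name"]

def prefix_encode(sorted_keys):
    encoded, prev = [], b""
    for key in sorted_keys:
        # Length of the prefix shared with the previous key.
        shared = 0
        while shared < min(len(prev), len(key)) and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))  # store only the new suffix
        prev = key
    return encoded

def prefix_decode(encoded):
    decoded, prev = [], b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        decoded.append(key)
        prev = key
    return decoded

enc = prefix_encode(keys)
assert prefix_decode(enc) == keys
raw = sum(len(k) for k in keys)
compressed = sum(len(suffix) for _, suffix in enc)
print(f"raw key bytes: {raw}, suffix bytes after prefix encoding: {compressed}")
```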

The inverted index is organized as key-value pairs of posting lists, and Rockset uses prefix compression and Elias-Fano encoding to reduce the size of these lists. The columnar index is chunked into groups, where each group stores the values of a single field in sorted order. This columnar grouping reduces storage size by delta-encoding the values.
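
For intuition, here is a simplified Elias-Fano encoder and decoder for a sorted posting list: the low bits of each value are stored verbatim, and the high bits are stored as set-bit positions in a bit vector. This is a textbook sketch of the encoding, not Rockset's implementation.

```python
from math import floor, log2

def elias_fano_encode(values, universe):
    """Encode a sorted list of non-negative ints < universe with Elias-Fano."""
    n = len(values)
    # Number of low bits kept verbatim per element.
    low_bits = max(0, floor(log2(universe / n))) if universe > n else 0
    lows = [v & ((1 << low_bits) - 1) for v in values]
    # Upper parts stored as a bit vector: the i-th element sets bit (high_i + i).
    upper = 0
    for i, v in enumerate(values):
        upper |= 1 << ((v >> low_bits) + i)
    return low_bits, lows, upper, n

def elias_fano_decode(low_bits, lows, upper, n):
    values, i, pos = [], 0, 0
    while i < n:
        if (upper >> pos) & 1:  # the i-th set bit found at position pos
            high = pos - i
            values.append((high << low_bits) | lows[i])
            i += 1
        pos += 1
    return values

postings = [3, 4, 7, 13, 14, 15, 21, 25, 36, 38, 54, 62]  # sorted doc ids
enc = elias_fano_encode(postings, universe=64)
assert elias_fano_decode(*enc) == postings
```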

FAQs

Do datasets with sparse fields increase the size of my index?

Rockset is designed to support sparse fields without any storage size amplification. In other database systems, an index on a dataset with hundreds of sparse fields may inflate the total size of the database, because those indexes have to record metadata about each field even when many of the fields do not exist on a specific record. Rockset's data format is designed to incur no overhead, not even a single byte, for fields that are absent from a record.
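
A small sketch of why this is possible: in a key-value layout, only fields that actually exist on a document emit index entries, whereas a wide-table layout materializes a cell for every column of every row. The representation below is illustrative only.

```python
# Sketch: with one key-value entry per present field, absent (sparse) fields cost nothing.
docs = [
    {"_id": 1, "a": 10},
    {"_id": 2, "b": "x"},
    {"_id": 3, "c": 3.5},
]
all_fields = {"a", "b", "c"}

# Wide-table style: every row materializes every column, even when it is NULL.
wide_rows = [{f: d.get(f) for f in all_fields} for d in docs]
wide_cells = sum(len(row) for row in wide_rows)

# Key-value style: only fields that actually exist on a document emit entries.
kv_entries = [((d["_id"], f), v) for d in docs for f, v in d.items() if f != "_id"]

print(f"wide-table cells: {wide_cells}, key-value entries: {len(kv_entries)}")
# wide-table cells: 9, key-value entries: 3
```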

Why is my data larger in Rockset than when stored in Parquet?

The RocksDB data format is optimized for indexing, whereas Parquet and other warehousing technologies are optimized for scanning. If you want low-latency queries on a large dataset, the RocksDB format can find the relevant records very quickly because it uses indexes to narrow down the search space. If you store data in Parquet, on the other hand, your query processing software has to scan large sections of your dataset to find the matching records. So, as your dataset grows, storing data in Parquet means you either have to use a lot of compute to scan the dataset in parallel for every query, or you have to live with slower queries.

Parquet is also a read-only format, which means records cannot be updated in place. This fits nicely with warehousing use cases, where most of your data is read-only and new data is written to new partitions, and it allows Parquet to use a very compact layout. The RocksDB data format, by contrast, is mutable: you can update, overwrite, or delete individual fields in a record at will.
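
A toy comparison of the two access patterns, using an in-memory list for the scan and a precomputed posting list for the index lookup (illustrative only; neither Parquet nor RocksDB is involved here):

```python
import random

# Toy comparison: a full scan touches every record, while an inverted index
# narrows the search down to just the matching ones.
records = [{"id": i, "color": random.choice(["red", "green", "blue"])}
           for i in range(100_000)]

# Scan-oriented access (warehouse-style): read every record and test the predicate.
scan_hits = [r["id"] for r in records if r["color"] == "red"]

# Index-oriented access (converged-index-style): look up a precomputed posting list.
inverted = {}
for r in records:
    inverted.setdefault(r["color"], []).append(r["id"])
index_hits = inverted["red"]

assert scan_hits == index_hits
print(f"scan examined {len(records)} records, index touched {len(index_hits)} ids")
```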

In short, if you want interactive queries on large datasets, then the RocksDB-based data format is your optimal choice.

Can I reduce my Rockset storage size by doing lifecycle management of my data?

If you are interested in reducing the size of your dataset, a good option is to set a retention duration for your data. Data that falls outside the retention window is automatically removed.

How can I reduce my total Rockset storage size?

Ingest Transformations can be used to limit both the number of entries (rows) and the number of fields (columns) stored in Rockset. You can drop specific fields from being indexed, extract only the necessary information from each entry, omit entries that do not meet specific criteria, and/or aggregate multiple entries into a single one to avoid unnecessary storage costs. Both whitelisting and blacklisting of fields are supported via FieldMappings.

Setting a retention duration for your datasets is another method for minimizing storage size. Data that falls outside the retention window is automatically removed.

Why is my data size in Rockset fluctuating? Am I losing data?

Rockset is designed to minimize data latency. If you write data to Rockset, it is visible to queries within a few seconds. If these writes were updates to existing records, RocksDB's background compaction merges the different versions of a field and retains only the most recent value. The merging happens asynchronously in the background because the system is designed to make newly written data visible to queries even before it is merged with existing records. The upside of this approach is that it reduces the data latency of the system. A minor downside is that you may see your data size grow and shrink lazily if you have bursty update patterns. This is minimized by RocksDB's leveled compaction, which keeps the data size fluctuations within a margin of 10%.
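
The sketch below shows the merging idea: updates accumulate as newer versions of the same key, and a background compaction pass keeps only the newest version of each key (dropping the key entirely if the newest version is a delete). The key and sequence-number scheme here is hypothetical.

```python
# Sketch: multiple versions of the same key accumulate from updates, and a
# background compaction keeps only the newest one (or drops it if deleted).
# seq is a monotonically increasing write sequence number.
writes = [
    ("user:1.plan", "free", 1),
    ("user:2.plan", "free", 2),
    ("user:1.plan", "pro",  3),   # update: a newer version of an existing key
    ("user:1.plan", "team", 5),   # another update
    ("user:2.plan", None,   6),   # tombstone: the field was deleted
]

def compact(entries):
    """Keep only the newest version of each key; drop keys whose newest version is a delete."""
    newest = {}
    for key, value, seq in entries:
        if key not in newest or seq > newest[key][1]:
            newest[key] = (value, seq)
    return {k: v for k, (v, _) in newest.items() if v is not None}

print("entries before compaction:", len(writes))      # 5
print("entries after compaction:", compact(writes))   # {'user:1.plan': 'team'}
```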

What is my data size when bulk-loading data?

When you create a collection from a data source that has a non-trivial amount of data, the Rockset system employs a bulk-load mechanism. The bulk-load mechanism is optimized to load the data set in minimum time, and during this time the RocksDB Bytes could be much larger than your expected data size. Rest assured that you are not billed for storage for the duration when the collection is in BulkMode.

How many replicas are used for the stored data?

Rockset stores all data in S3, so all data in Rockset gets S3's 99.999999999% durability. Rockset internally caches data on SSDs and in RAM, and multiple copies may be made as part of serving queries.

What is the backup mechanism for the data?

All data in Rockset is continuously backed up into S3 in real-time. Rockset eliminates the need for users to manually manage storage capacity and backup schedules.


📘

Want to learn more?

Check out our blog posts on How We Use RocksDB at Rockset and Remote Compactions in RocksDB-Cloud.