Creating and Restoring from Snapshots in Rockset

October 17, 2023


Data integrity matters, and changes can be intimidating because they may disrupt data in unexpected ways. To make modifications less worrisome, Rockset now provides the ability to snapshot and restore collections. Users can create a snapshot of a collection and later restore the collection from it if the collection receives an unexpected modification.

Why Use Snapshots?

  1. Rollback in time

Data coming into Rockset goes through optional ingest transformations and indexing, operations that add overhead in terms of time and cost if you need to reingest that data for any reason. Rather than reingesting your collection, you can easily use snapshots to recover from bad writes, updates, deletes or other downstream changes that result in undesired behavior.

  2. Experiment and test with production data

By creating a new collection from a snapshot of a production collection, you can safely and quickly experiment on production data without impacting the collections used in your applications, which accelerates the development of new features and functionality. The same benefits apply to tests that require real production data to validate that no breaking changes have been introduced.

  3. Data audit

Data compliance and debugging data changes over time can get complex without the ability to compare your collections’ data across two distinct points in time. With snapshots, you can restore a copy of the collection from a desired point in time into a new collection and run SQL to compare the original and restored versions.
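As a rough illustration of that kind of audit, here is a minimal Python sketch that compares two sets of documents outside of SQL. It assumes both the live collection and the restored collection have been exported as lists of dictionaries, and that each document carries a unique `_id` field; the function name `diff_collections` and the sample data are hypothetical.

```python
from typing import Any

Doc = dict[str, Any]


def diff_collections(
    current: list[Doc], restored: list[Doc], key: str = "_id"
) -> dict[str, list[Any]]:
    """Compare two exported document lists by a shared key field."""
    cur = {doc[key]: doc for doc in current}
    old = {doc[key]: doc for doc in restored}
    return {
        # Present now, absent at snapshot time.
        "added": sorted(cur.keys() - old.keys()),
        # Present at snapshot time, absent now.
        "removed": sorted(old.keys() - cur.keys()),
        # Present in both, but with different contents.
        "changed": sorted(k for k in cur.keys() & old.keys() if cur[k] != old[k]),
    }


# Hypothetical sample data: the restored copy reflects an earlier point in time.
restored_docs = [{"_id": "a", "qty": 1}, {"_id": "b", "qty": 2}]
current_docs = [{"_id": "a", "qty": 5}, {"_id": "c", "qty": 3}]

report = diff_collections(current_docs, restored_docs)
print(report)
```

For large collections, a SQL query over the two collections is likely the better fit; the sketch only shows the shape of the comparison.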

How Snapshots Work

Snapshotting a collection creates a low-cost frozen copy of the collection that users can restore from later. A snapshot does not duplicate the data, though; it only tracks the changes made to the collection since the last snapshot. This keeps the cost of snapshots low, so users can create them more often.

Restoring from a snapshot will create a brand new collection with the exact same contents as the original collection at the time of the snapshot but with its own separate copy of all the data. Modifications made to the source collection will not affect the restored collection and vice versa. Once restored, users can then attach streaming sources to the restored collection to continue ingestion.
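The behavior described above can be pictured with a toy in-memory model. This is only a sketch of the general idea of incremental snapshots and independent restores, not Rockset's implementation; the `ToyCollection` class and all of its methods are hypothetical.

```python
import copy


class ToyCollection:
    """In-memory stand-in for a collection (illustrative only)."""

    def __init__(self):
        self.docs = {}        # current contents, keyed by document ID
        self._pending = {}    # changes made since the last snapshot
        self.snapshots = []   # one delta per snapshot, oldest first

    def write(self, doc_id, doc):
        self.docs[doc_id] = doc
        self._pending[doc_id] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)
        self._pending[doc_id] = None  # tombstone marks a deletion

    def snapshot(self):
        """Freeze only the changes made since the previous snapshot."""
        self.snapshots.append(copy.deepcopy(self._pending))
        self._pending = {}
        return len(self.snapshots) - 1

    def restore(self, snapshot_index):
        """Replay deltas up to the snapshot into a brand new collection."""
        restored = ToyCollection()
        for delta in self.snapshots[: snapshot_index + 1]:
            for doc_id, doc in delta.items():
                if doc is None:
                    restored.delete(doc_id)
                else:
                    restored.write(doc_id, copy.deepcopy(doc))
        return restored


source = ToyCollection()
source.write("a", {"qty": 1})
source.write("b", {"qty": 2})
s0 = source.snapshot()             # delta holds "a" and "b"
source.write("c", {"qty": 3})
s1 = source.snapshot()             # delta holds only "c"
source.delete("a")                 # unwanted change after the snapshot

restored = source.restore(s1)      # has "a", "b", and "c" again
restored.write("z", {"qty": 0})    # does not touch the source collection
print(sorted(restored.docs), sorted(source.docs))
```

The second snapshot stores a single changed document rather than the whole collection, and the restored collection holds its own copies, so writes on either side stay isolated.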

Creating a Snapshot

The collection details page in the Rockset Console has a Snapshots tab. The tab lists all existing snapshots, and new ones can be created with its Create Snapshot button.


Clicking the button opens a Create Snapshot pane, where you can set the retention and description of the snapshot. Currently, we support up to seven days of snapshot retention.
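To reason about when a snapshot stops being restorable, the retention rule can be sketched in a few lines of Python. The seven-day limit comes from the text above; the helper names `snapshot_expiry` and `is_restorable` are hypothetical and do not correspond to any Rockset API.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION = timedelta(days=7)  # current limit stated in this post


def snapshot_expiry(created_at: datetime, retention: timedelta) -> datetime:
    """Return the time at which a snapshot with this retention expires."""
    if retention <= timedelta(0) or retention > MAX_RETENTION:
        raise ValueError("retention must be positive and at most seven days")
    return created_at + retention


def is_restorable(created_at: datetime, retention: timedelta, now: datetime) -> bool:
    """A snapshot can be restored only before its expiry time."""
    return now < snapshot_expiry(created_at, retention)


created = datetime(2023, 10, 17, 12, 0, tzinfo=timezone.utc)
expiry = snapshot_expiry(created, timedelta(days=3))
print(expiry.isoformat())
print(is_restorable(created, timedelta(days=3), created + timedelta(days=4)))
```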


Clicking Create triggers snapshot creation at that moment in time and adds a new snapshot to the collection details page for this collection. The snapshot initially has a Creating status but should soon move to Created. Note that snapshot contents might be up to ten minutes behind the current collection contents.


Restoring from a Snapshot

Any non-expired snapshot with a Created status can be restored by clicking the extension button to the right of the snapshot and then clicking Restore.


This opens a pane for selecting the restore options. A restored collection has the same contents and settings as the original collection but with no sources attached. Filling out the options in this pane and clicking Restore creates a new collection from the snapshot. Creating the new collection is not instantaneous: it takes around 15 minutes per TB, which is still faster than reingesting all the data. Once the restored collection moves from Initializing to Connected, it is ready to be queried.
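For planning purposes, the rough figure of 15 minutes per TB translates into a simple back-of-the-envelope estimate. The sketch below assumes restore time scales linearly with collection size, which the post does not guarantee; the function name is hypothetical.

```python
MINUTES_PER_TB = 15  # approximate figure quoted in this post


def estimate_restore_minutes(size_tb: float) -> float:
    """Rough restore-time estimate, assuming linear scaling with size."""
    if size_tb < 0:
        raise ValueError("size_tb must be non-negative")
    return size_tb * MINUTES_PER_TB


estimate = estimate_restore_minutes(2.5)
print(f"A 2.5 TB collection would take roughly {estimate} minutes to restore")
```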


After a collection is restored, users can connect streaming sources to it. Snapshots do not save ingest state, so if repeated writes matter, it is up to the user to configure the streaming source to avoid them.
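One common way to make replayed records harmless is to give every record a deterministic document ID, so that a repeated write overwrites the earlier copy instead of adding a duplicate. The sketch below illustrates the idea with a plain dictionary standing in for the restored collection. It assumes the destination treats a write with an existing ID as an overwrite and that each record exposes a topic, partition, and offset; all names are hypothetical.

```python
import hashlib


def deterministic_id(topic: str, partition: int, offset: int) -> str:
    """Derive a stable document ID from a record's position in the stream."""
    raw = f"{topic}:{partition}:{offset}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def apply_records(collection: dict, records) -> None:
    """Write records keyed by their deterministic ID; replays overwrite."""
    for topic, partition, offset, payload in records:
        collection[deterministic_id(topic, partition, offset)] = payload


# Hypothetical stream contents: (topic, partition, offset, payload).
stream = [
    ("orders", 0, 100, {"order": "A"}),
    ("orders", 0, 101, {"order": "B"}),
    ("orders", 0, 102, {"order": "C"}),
]

restored_collection = {}
apply_records(restored_collection, stream[:2])  # already in the snapshot
apply_records(restored_collection, stream)      # source replays from the start
print(len(restored_collection))
```

The alternative is to start the streaming source from an offset or timestamp at or after the snapshot, which avoids the replay altogether but depends on what the source supports.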

For more information on how to snapshot and restore collections using the REST API, please check out our Collection Snapshot and Restore guide.