Creating and Restoring from Snapshots in Rockset
October 17, 2023
Data integrity matters, and changes can be intimidating because they may disrupt data in unexpected ways. To make modifications less worrisome, Rockset now provides the ability to snapshot and restore collections. Users can create a snapshot of a collection and restore from it if the collection later receives an unexpected modification.
Why Use Snapshots?
- Rollback in time
Data coming into Rockset goes through optional ingest transformations and indexing, operations that add overhead in terms of time and cost if you need to reingest that data for any reason. Rather than reingesting your collection, you can easily use snapshots to recover from bad writes, updates, deletes or other downstream changes that result in undesired behavior.
- Experiment and test with production data
By creating a new collection from a snapshot of a production collection, you can safely and quickly experiment on production data without impacting the collections used in your applications, which accelerates the development of new features and functionality. The same benefits apply to running tests that require real production data to validate that no breaking changes have been introduced.
- Data audit
Data compliance and debugging data changes over time can get complex without the ability to compare your collections’ data across two distinct points in time. With snapshots, you can restore a copy of the collection from the desired point in time into a new collection and run SQL to compare the original and restored versions.
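As a rough illustration of the comparison step, the sketch below diffs two sets of documents in plain Python. It assumes you have already fetched the rows from both collections (for example with SQL) and that each document carries a unique `_id` field. The function name `diff_collections` and the sample rows in the usage note are hypothetical.

```python
def diff_collections(current_rows, restored_rows, key="_id"):
    """Compare the current rows against rows restored from a snapshot.

    Returns the keys that were added, removed, or changed since the snapshot.
    """
    current = {row[key]: row for row in current_rows}
    restored = {row[key]: row for row in restored_rows}

    added = sorted(current.keys() - restored.keys())
    removed = sorted(restored.keys() - current.keys())
    changed = sorted(
        k for k in current.keys() & restored.keys() if current[k] != restored[k]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

For example, calling it with a restored version `[{"_id": "a", "v": 1}, {"_id": "b", "v": 2}]` and a current version in which `b` was updated and `c` was inserted reports `b` as changed and `c` as added. For large collections, the same comparison is better expressed as a SQL join across the two collections.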
How Snapshots Work
Snapshotting a collection creates a low-cost frozen copy of the collection that users can restore from later. A snapshot does not create a full copy of the data; it only tracks the changes made to the collection since the last snapshot. This keeps the cost of snapshots low, enabling users to create snapshots more often.
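To make the idea of change-tracking snapshots concrete, here is a toy in-memory model. It is only a conceptual sketch and not how Rockset implements snapshots. The class `ToyCollection` and all of its methods are hypothetical. Each snapshot stores just the writes and deletes made since the previous snapshot, and a restore replays those deltas in order.

```python
import copy

_DELETED = object()  # marker recorded when a document is deleted


class ToyCollection:
    """Toy model: each snapshot records only the changes since the previous one."""

    def __init__(self):
        self.docs = {}        # live contents of the collection
        self._pending = {}    # changes made since the last snapshot
        self.snapshots = []   # one delta (dict of changes) per snapshot

    def write(self, doc_id, doc):
        stored = copy.deepcopy(doc)
        self.docs[doc_id] = stored
        self._pending[doc_id] = stored

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)
        self._pending[doc_id] = _DELETED

    def snapshot(self):
        """Freeze the pending changes as a new delta and return its index."""
        self.snapshots.append(self._pending)
        self._pending = {}
        return len(self.snapshots) - 1

    def restore(self, snapshot_index):
        """Rebuild the contents at a snapshot by replaying deltas in order."""
        restored = {}
        for delta in self.snapshots[: snapshot_index + 1]:
            for doc_id, doc in delta.items():
                if doc is _DELETED:
                    restored.pop(doc_id, None)
                else:
                    restored[doc_id] = doc
        # Deep copy so the restored data is independent of the source.
        return copy.deepcopy(restored)
```

In this model, a snapshot taken after two changes stores two entries regardless of how large the collection is, which is why frequent snapshots stay cheap. The restored result is a separate copy, so later writes to the source do not affect it.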
Restoring from a snapshot will create a brand new collection with the exact same contents as the original collection at the time of the snapshot but with its own separate copy of all the data. Modifications made to the source collection will not affect the restored collection and vice versa. Once restored, users can then attach streaming sources to the restored collection to continue ingestion.
Creating a Snapshot
The Rockset Console has a Snapshots tab in the collection details page. All existing snapshots are listed in this tab, and new ones can be created with the Create Snapshot button.
Clicking the button opens a Create Snapshot pane, where you can set the retention period and a description for the snapshot. Currently, we support up to seven days of snapshot retention.
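If you track snapshots in your own tooling, a small helper can tell you when a snapshot stops being restorable. The sketch below assumes that a snapshot expires exactly `retention_days` after its creation time, which the text does not spell out. The names `snapshot_expiry` and `is_restorable` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 7  # from the text: up to seven days of retention


def snapshot_expiry(created_at, retention_days):
    """Return the assumed expiry time: creation time plus the retention period."""
    if not 0 < retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention must be between 1 and {MAX_RETENTION_DAYS} days"
        )
    return created_at + timedelta(days=retention_days)


def is_restorable(created_at, retention_days, now):
    """True while the snapshot has not yet reached its assumed expiry time."""
    return now < snapshot_expiry(created_at, retention_days)
```

For instance, a snapshot created at `datetime(2023, 10, 17, tzinfo=timezone.utc)` with three days of retention would be treated as restorable on October 19 but not on October 21.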
Clicking Create triggers snapshot creation at that moment in time and adds a new snapshot to the collection details page for this collection. The snapshot will initially have a Creating status, but should soon move to Created. Note that snapshot contents might be up to ten minutes behind the current collection contents.
Restoring from a Snapshot
Any non-expired snapshot with a Created status can be restored by clicking the extension button to the right of the snapshot and then clicking Restore.
This will open a pane for selecting the restore options. A restored collection will have the same contents and settings as the original collection but with no sources attached. Filling out the options in this pane and clicking Restore will create a new collection from the snapshot. Creating this collection is not instantaneous and takes around 15 minutes per TB, but it is faster than reingesting all the data. Once the restored collection moves from Initializing to Connected, it is ready to be queried.
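For planning purposes, the figure of around 15 minutes per TB can be turned into a back-of-the-envelope estimate. The sketch below assumes restore time scales linearly with collection size, which is a simplification. The function name `estimate_restore_minutes` is hypothetical.

```python
MINUTES_PER_TB = 15  # approximate figure quoted in the text


def estimate_restore_minutes(size_tb):
    """Rough restore-time estimate, assuming linear scaling with size."""
    if size_tb < 0:
        raise ValueError("collection size cannot be negative")
    return size_tb * MINUTES_PER_TB
```

Under this assumption, a 4 TB collection would take roughly an hour to restore. Treat the result as an order-of-magnitude guide, not a guarantee.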
After a collection is restored, users can connect streaming sources to it. Snapshots do not save ingest state, so if repeated writes matter for your use case, it is up to you to configure the streaming source to avoid them.
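One way to think about avoiding repeated writes is to resume the stream from a known position and apply events as upserts keyed by a stable document ID, so that any overlap is harmless. The sketch below models this in plain Python. The event shape (`offset`, `id`, `doc`) and the function `replay_events` are hypothetical and do not describe a Rockset API.

```python
def replay_events(restored_docs, events, resume_from):
    """Apply stream events at or after `resume_from` as upserts keyed by id.

    Returns the number of events applied.
    """
    applied = 0
    for event in events:
        if event["offset"] < resume_from:
            continue  # assumed to be reflected in the snapshot already
        # Upsert: applying the same event twice leaves the same result.
        restored_docs[event["id"]] = event["doc"]
        applied += 1
    return applied
```

Because snapshot contents may lag the live collection by up to ten minutes, picking a `resume_from` position slightly earlier than the snapshot time is the safer choice. With upserts, the overlapping events only rewrite documents with the same values.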
For more information on how to snapshot and restore collections using the REST API, please check out our Collection Snapshot and Restore guide.