Adding a Data Source
To add data to Rockset, you need to:
- Add a new data source so that Rockset has the permissions and settings to connect to your data.
- Create a collection with the data. Collections are like tables in databases and are covered in the next section.
Rockset allows users to connect:
- data streams (Kafka, Kinesis)
- OLTP databases (DynamoDB, MongoDB, MySQL, PostgreSQL)
- data lakes (S3, GCS)
As new data shows up in your data source, it will get indexed within seconds into Rockset. Rockset ingests your data without needing a schema ahead of time, so you can get set up quickly.
Fully-managed integrations are currently supported for the following data sources, meaning that Rockset automatically detects changes to your data source in real time and replicates those changes into Rockset within seconds:
- Amazon DynamoDB
- Amazon Kinesis
- Amazon S3
- Amazon MSK
- Apache Kafka
- Azure Blob Storage
- Azure Event Hubs
- Google Cloud Storage
- Microsoft SQL Server
- MongoDB Atlas
Partially-managed integrations for the following data sources are currently supported, meaning that you must periodically export changes to your data source in batch to a fully-managed data source (such as Amazon S3), where Rockset will then replicate those changes within seconds:
Note: Using an integration is optional. If you prefer to insert and sync your data manually, or if your desired data source is not currently supported, you can always use the Rockset API to create and update your collections. For details, see the Rockset API documentation on creating self-managed data sources.
Integrations can be created by admins in your Rockset organization, either in the Rockset Console or by using the Rockset API directly. Setup generally takes around 10-15 minutes. Step-by-step instructions for each integration can be found under the documentation for each data source.
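As a rough sketch of the manual route, the snippet below builds (but does not send) a request that would add documents to a collection. The API server URL, the workspace and collection names, the API key placeholder, and the exact endpoint path and body shape are assumptions for illustration; check the Rockset API reference for your region before relying on them.

```python
import json
import urllib.request

# Assumed values for illustration only; replace with your own.
API_SERVER = "https://api.rs2.usw2.rockset.com"  # assumed regional API server
API_KEY = "YOUR_API_KEY"                         # placeholder, not a real key
WORKSPACE = "commons"                            # hypothetical workspace
COLLECTION = "events"                            # hypothetical collection


def build_add_documents_request(docs):
    """Build a POST request that would add `docs` to a collection.

    The path and the {"data": [...]} body are assumed here; verify both
    against the API reference.
    """
    url = f"{API_SERVER}/v1/orgs/self/ws/{WORKSPACE}/collections/{COLLECTION}/docs"
    body = json.dumps({"data": docs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {API_KEY}",
            "Content-Type": "application/json",
        },
    )


# The two documents deliberately have different fields: no schema is
# declared ahead of time.
request = build_add_documents_request([
    {"user_id": 1, "action": "login"},
    {"user_id": 2, "action": "purchase", "amount": 19.99},
])
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` would actually send it; the sketch stops short of that so it can be run without credentials.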
You can read about the permissions Rockset requires and why Rockset requires them for each integration type in the Data Sources section. You can also read about these permissions in the Rockset Console during integration creation.
Since many integrations require advanced permissions and multi-step processes, we generally recommend setting these up in the Rockset Console for full context.
Once an integration is set up, it can be used to create any number of collections. For each integration, you can see a list of the collections backed by that integration in the Rockset Console.
We generally recommend mapping each data source (such as a MongoDB collection, DynamoDB table, or Kafka topic) to a single collection, and joining those collections at query time.
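To make the one-source-per-collection recommendation concrete, here is a minimal sketch that composes a SQL join across two collections and wraps it in a query payload. The collection names, field names, and payload shape are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical collections, each mapped to exactly one upstream source.
ORDERS = "commons.orders"        # e.g. backed by a DynamoDB table
CUSTOMERS = "commons.customers"  # e.g. backed by a MongoDB collection


def build_join_query(orders, customers):
    """Return SQL that joins the two collections at query time."""
    return (
        "SELECT c.name, COUNT(*) AS order_count "
        f"FROM {orders} o "
        f"JOIN {customers} c ON o.customer_id = c.customer_id "
        "GROUP BY c.name"
    )


sql = build_join_query(ORDERS, CUSTOMERS)

# Assumed request body shape for a SQL query endpoint; verify before use.
payload = json.dumps({"sql": {"query": sql}})
print(payload)
```

Keeping each source in its own collection means the join logic lives in the query, so either source can be re-synced or replaced without reshaping the other.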
Additional Syncing Costs
Depending on the data source, your data source provider may charge additional costs for the frequent read requests Rockset sends to keep your data current in real time. For example, AWS charges for DynamoDB stream read requests. This cost generally remains very small (no more than a few US dollars per month) and does not grow exponentially as your data size scales.
Rockset Public IPs
If your data source is configured with a network access policy, you may need to whitelist Rockset's public IP addresses for your region.
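If your network policy is expressed as CIDR rules, a quick check along the following lines can confirm that every required address is covered. This is only a sketch: the addresses below are placeholders from the documentation-only ranges, not Rockset's real IPs, and the rule list is hypothetical.

```python
import ipaddress

# Placeholder addresses from documentation-only ranges (RFC 5737),
# NOT Rockset's real IPs; use the published list for your region.
ROCKSET_IPS = ["203.0.113.10", "203.0.113.11"]

# Hypothetical CIDR rules currently allowed by your network policy.
ALLOWED_CIDRS = ["203.0.113.10/32", "198.51.100.0/24"]


def missing_from_allowlist(ips, cidrs):
    """Return the IPs that are not covered by any allowed CIDR block."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [
        ip for ip in ips
        if not any(ipaddress.ip_address(ip) in net for net in networks)
    ]


missing = missing_from_allowlist(ROCKSET_IPS, ALLOWED_CIDRS)
print(missing)
```

Any address the function returns still needs a rule before the data source will accept connections from it.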
If you have any questions, please contact Support.