MongoDB Atlas

This page covers how to use a MongoDB Atlas collection as a data source in Rockset. This includes:

  • Creating a MongoDB integration to securely connect collections in your MongoDB account with Rockset.
  • Creating a collection that continuously syncs your data from a MongoDB collection into Rockset in real time.

For the following steps, you must have access to a MongoDB Atlas account and be able to manage Custom Roles and Database Users within it. If you do not have access, please invite your MongoDB Atlas administrator to Rockset.

Create a MongoDB Atlas Integration

The steps below show how to set up a MongoDB Atlas integration using the MongoDB SCRAM authentication mechanism. An integration can provide access to one or more MongoDB collections across different databases in the same MongoDB Atlas cluster. You can use an integration to create Rockset collections that continuously sync data from your MongoDB collections.

Step 1: Configure MongoDB Atlas Custom Role

  1. Navigate to "Database Access" (from left navigation) in your MongoDB Atlas account for the cluster you want to connect to Rockset.
  2. Create a new custom role by navigating to "Custom Roles" and clicking "Add New Custom Role". If you already have a role set up for Rockset, you may update that existing role. For more details, refer to MongoDB Atlas documentation on Custom Roles.

MongoDB Custom Roles

  3. Set up read-only access to your MongoDB collection. Add the following Actions or roles: find, changeStream, and collStats, and enter the names of the databases and collections for each of these actions or roles. You can update access to databases and collections in the MongoDB Atlas UI at any time without any changes to the Rockset integration. The same integration can be used to create more Rockset collections, subject to the permissions the role grants.

MongoDB Actions or Roles

  4. Save the newly created or updated custom role and give it a descriptive name. You will attach this custom role to a new or existing Atlas user.

Why these permissions?

  • find - Required for initial collection scan when reading data.
  • changeStream - Required for retrieving records from MongoDB Atlas Change Streams.
  • collStats - Required for metadata about MongoDB Atlas collections.
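
The read-only access described above can also be written out as data. The sketch below builds a role definition in the shape of MongoDB's privilege documents (a resource plus a list of actions). The role name `rockset_read` and the database and collection names are hypothetical, and in Atlas the role itself is still created through the Custom Roles UI as described in Step 1; this is only an illustration of what the role grants.

```python
# Actions the Rockset integration needs, as listed above.
ROCKSET_ACTIONS = ["find", "changeStream", "collStats"]


def build_role(role_name, namespaces):
    """Return a role definition granting read-only access to each namespace.

    `namespaces` is a list of (database, collection) pairs.
    """
    privileges = [
        {
            "resource": {"db": db, "collection": coll},
            "actions": list(ROCKSET_ACTIONS),
        }
        for db, coll in namespaces
    ]
    # No inherited roles: the privileges above are all the role grants.
    return {"role": role_name, "privileges": privileges, "roles": []}


# Hypothetical names, for illustration only.
role = build_role("rockset_read", [("shop", "orders"), ("shop", "customers")])
for privilege in role["privileges"]:
    print(privilege["resource"], privilege["actions"])
```

Listing each collection explicitly keeps the role scoped to exactly what Rockset should read, which matches the per-database and per-collection entries made in the Atlas UI.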

Step 2: Configure MongoDB Atlas User

You'll need to create a MongoDB Atlas user to grant Rockset permissions to access your MongoDB resources.

  1. Navigate to "Database Access" (from left navigation) in MongoDB Atlas UI.
  2. Set up a new user by navigating to "Database Users" and clicking "Add New Database User". Note: If you already have a user for Rockset set up, you may re-use it or update the custom role directly. For more details, refer to MongoDB Atlas documentation on Database Users.

MongoDB Database Users

  3. Using SCRAM password authentication, enter a username and password for the database user and select the custom role created in Step 1 under "Database User Privileges".

MongoDB Add New Database User

  4. Finish by clicking "Add User", then record both the username and password in the Rockset Console within a new MongoDB integration. Note that if you change the password later, you will need to drop and recreate the integration in Rockset.

Step 3: Access Connection String

You'll need to provide the connection string for your MongoDB Atlas cluster so that Rockset can connect to it.

  1. Navigate to the cluster (from left navigation) you want to connect to Rockset and click on "Connect".

MongoDB connect cluster

  2. Select "Connect your application" as the connection method.

MongoDB connect application

  3. Copy the "Connection String" and record it in the Rockset Console for the integration. Also provide the name of the database that connections will use by default. The connection string looks like mongodb+srv://<username>:<password>@cluster0.mongodb.net/<dbname>. You do not need to replace the <username>, <password>, and <dbname> placeholders in the connection string.

MongoDB connection string
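
Before pasting the connection string into the Rockset Console, it can help to confirm it has the expected shape. The following is a minimal sketch using only Python's standard library; the helper name `describe_connection_string` is hypothetical and not part of any Rockset or MongoDB tooling. It checks the scheme and pulls out the cluster host and default database.

```python
from urllib.parse import urlsplit


def describe_connection_string(uri):
    """Split a MongoDB connection string into scheme, host, and default database."""
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # The path component holds the default database, if one is given.
        "default_db": parts.path.lstrip("/") or None,
    }


# The example string from this page, with its placeholders left in place.
info = describe_connection_string(
    "mongodb+srv://<username>:<password>@cluster0.mongodb.net/<dbname>"
)
print(info)
```

The placeholders pass through untouched, which is consistent with the note above that they do not need to be replaced.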

Step 4: Add Rockset IPs to IP Access List

To ensure connectivity with Atlas, you must allow inbound network access to MongoDB Atlas by adding Rockset's public IP addresses to your IP Access List. For more details, refer to the MongoDB Atlas documentation on IP Access List Entries. This is the most secure and the recommended way to allow Rockset to access your MongoDB cluster. If you choose not to add Rockset's IP addresses, make sure you select "Allow Access From Anywhere", which opens the cluster to connections from any IP address.

  1. Navigate to "Network Access" (from left navigation) in MongoDB Atlas UI.
  2. Click on "Add IP Address" and create IP Access List entries for Rockset's public IP addresses.

MongoDB IP Access List Entry
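
When adding several addresses, a quick validation pass can catch typos before they reach the Atlas UI. The sketch below uses Python's standard `ipaddress` module to normalize a list into CIDR notation and drop duplicates. The addresses shown are placeholders from the RFC 5737 documentation range, not Rockset's actual IP addresses, and the helper name `to_access_list_entries` is hypothetical.

```python
import ipaddress


def to_access_list_entries(addresses):
    """Validate addresses and return them in CIDR notation, without duplicates."""
    entries = []
    for raw in addresses:
        # Raises ValueError if the text is not a valid address or network.
        network = ipaddress.ip_network(raw.strip(), strict=True)
        cidr = str(network)
        if cidr not in entries:
            entries.append(cidr)
    return entries


# Placeholder addresses only; substitute Rockset's published public IPs.
entries = to_access_list_entries(
    ["203.0.113.10", "203.0.113.11/32", "203.0.113.10"]
)
print(entries)
```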

Create a Collection

Once you create a collection backed by MongoDB Atlas, Rockset first scans the MongoDB collection to ingest the existing data, and then uses MongoDB Change Streams to keep the Rockset collection updated as the MongoDB collection changes.

If your MongoDB collection is a capped collection, MongoDB change streams do not emit delete events for the old documents that the cap removes, so the Rockset collection can go out of sync. In this case, we recommend setting a retention period on the Rockset collection when you create it.

You can create a collection from a MongoDB source in the Collections tab of the Rockset Console.

Create MongoDB Collection

Note that these operations can also be performed using any of the Rockset client libraries, the Rockset API, or the Rockset CLI.

How it works

When a MongoDB Atlas backed collection is created, indexing in Rockset occurs in two stages:

  1. A one-time full scan of the MongoDB collection in which all records are indexed and stored in the Rockset collection.
  2. Following that, continuous monitoring and sync of changes from the MongoDB collection (inserts, deletes and updates) to the Rockset collection in real-time using MongoDB Change Streams.

Once a MongoDB backed collection is set up, it will be a replica of the MongoDB collection, up-to-date to within a few seconds.
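
The two stages can be illustrated with a small in-memory simulation. This is a sketch of the general scan-then-stream pattern, not Rockset's implementation: the replica is a plain dictionary, and the events are hand-written dictionaries that follow the field names of MongoDB change events (`operationType`, `documentKey`, `fullDocument`, `updateDescription`). The sample documents are made up for illustration.

```python
def initial_scan(source_docs):
    """Stage 1: copy every document into the replica, keyed by _id."""
    return {doc["_id"]: dict(doc) for doc in source_docs}


def apply_change(replica, event):
    """Stage 2: apply one change event to the replica in place."""
    doc_id = event["documentKey"]["_id"]
    op = event["operationType"]
    if op in ("insert", "replace"):
        replica[doc_id] = dict(event["fullDocument"])
    elif op == "update":
        # Simplification: treats updated fields as top-level keys and
        # does not handle dotted paths into nested documents.
        description = event["updateDescription"]
        replica[doc_id].update(description.get("updatedFields", {}))
        for field in description.get("removedFields", []):
            replica[doc_id].pop(field, None)
    elif op == "delete":
        replica.pop(doc_id, None)
    return replica


replica = initial_scan([{"_id": 1, "status": "new"}, {"_id": 2, "status": "new"}])
events = [
    {"operationType": "insert", "documentKey": {"_id": 3},
     "fullDocument": {"_id": 3, "status": "new"}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"status": "shipped"}, "removedFields": []}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]
for event in events:
    apply_change(replica, event)
print(replica)
```

Because each event carries the document key, applying events in order is enough to keep the replica aligned with the source after the initial scan.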

MongoDB Best Practices

When the MongoDB Atlas database is under heavy load, the speed at which Rockset can read updates is reduced. Below are some best practices for connecting MongoDB as a source with Rockset:

  • Start bulk ingest when your MongoDB database is under light load
    • This allows Rockset to do the one-time full scan of MongoDB without any read throttling
  • Increase the oplog size
    • See the MongoDB recommendation for workloads that might require a larger oplog size
    • If the source MongoDB collection has a high rate of write and update operations, it is recommended to increase the oplog size.
      • MongoDB recommends that the oplog for a cluster be large enough to provide a 24-hour Replication Oplog Window. For example, if you generate 1 GB of oplog per hour on average, the recommended oplog size is 24 GB.
    • Set up alerts on your MongoDB project to trigger if the oplog churn (GB/hour) exceeds a specified threshold.
  • Increase the read-throughput on the MongoDB cluster
    • Use common techniques to increase read performance for the initial scan. See some recommended techniques in a blog from our solution engineering team.
    • Prefer connecting Rockset to a read replica as the source. Refer to this MongoDB doc for details on how to set up a connection string with a readPreference flag.
    • Rockset uses majority read concern. Read concern "majority" guarantees that the data read has been acknowledged by a majority of the replica set members (i.e. the documents read are durable and guaranteed not to roll back).
      • Make sure that majority read concern is enabled by following the instructions in this link.
      • Majority read concern is also a requirement for Change Streams in MongoDB 4.0 and earlier.
  • Monitor streaming ingest metrics in Rockset
    • If your org’s virtual instance is nearing its peak streaming ingest rate, consider increasing its size to avoid increased data latency and slow queries
      • Once the streaming ingest rate drops, you can scale the virtual instance back down to control cost
      • Using the metrics endpoint you can set alerts with your preferred monitoring tool
    • If the ingest keeps getting rate limited for a prolonged period of time, depending on your oplog size and churn rate, Rockset might not be able to catch up with all the updates coming from MongoDB, and the collection will enter an unrecoverable error state that will require re-creating it.
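
The oplog sizing guidance above reduces to simple arithmetic, sketched below. The function names and the alert threshold are hypothetical; only the 24-hour window and the 1 GB/hour example come from the recommendation above. The second function gives a rough estimate of how many hours of changes an oplog of a given size holds at a given churn rate, which bounds how long ingest can lag before updates are lost.

```python
def recommended_oplog_gb(churn_gb_per_hour, window_hours=24):
    """Oplog size needed to hold `window_hours` of changes at the given churn."""
    return churn_gb_per_hour * window_hours


def oplog_window_hours(oplog_size_gb, churn_gb_per_hour):
    """Approximate hours of history the oplog holds at the given churn."""
    return oplog_size_gb / churn_gb_per_hour


def churn_exceeds(churn_gb_per_hour, threshold_gb_per_hour):
    """True when oplog churn is above the alert threshold."""
    return churn_gb_per_hour > threshold_gb_per_hour


# The example from the text: 1 GB/hour needs a 24 GB oplog.
print(recommended_oplog_gb(1))

# If churn doubles to 2 GB/hour, the same 24 GB oplog covers about half the time.
print(oplog_window_hours(24, 2))

# Hypothetical threshold of 1.5 GB/hour for an alert.
print(churn_exceeds(2, 1.5))
```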