• Loading Data
  • Adding a Data Source


This page documents how to use a MySQL table as a data source in Rockset.

These step-by-step instructions will allow you to migrate your data from your MySQL database table(s) into Rockset collection(s) by using AWS’s Data Migration Service (DMS) to export your data to a target AWS Kinesis stream. Those Rockset collection(s) will stay in sync with your MySQL databases table(s) as they are updated in real-time by reading MySQL CDC streams. This procedure includes:

  • Setting up your MySQL Server
  • Setting up a target AWS Kinesis Stream
  • Setting up an AWS DMS Replication Instance and Migration Task
  • Creating an Amazon Kinesis integration

These instructions will work with all of the following MySQL and MySQL-compatible databases:

  • On-Premise MySQL (versions 5.6, 5.7, and 8.0)
  • On-Premise MariaDB (versions 10.0.24 to 10.0.28, 10.1, 10.2, 10.3, 10.4, and 10.5)
  • Amazon RDS MySQL (versions 5.6, 5.7, and 8.0)
  • Amazon RDS MariaDB (versions 10.0.24 to 10.0.28, 10.1, 10.2, 10.3, 10.4, and 10.5)
  • Amazon Aurora MySQL (all versions)

Step 1: MySQL Server Configuration

Step 1.1 - Create a MySQL user to allow DMS to access your data

For DMS to export your data (both the initial export and when reading CDC streams), you will create a new MySQL user with read access to your MySQL data. Depending on what MySQL database you are using, you may have several options for completing this task – in this example, we will use the MySQL Command-Line Client, for a user named aws-dms:

mysql> CREATE USER 'aws-dms' IDENTIFIED BY 'password';
mysql> GRANT SELECT ON *.* TO 'aws-dms';
mysql> GRANT REPLICATION SLAVE ON *.* TO  'aws-dms';
mysql> GRANT REPLICATION CLIENT ON *.* TO  'aws-dms';

Step 1.2 - Configure your MySQL server to emit a CDC stream

The MySQL server emits binlogs which form the basis of the CDC stream. A MySQL server’s configuration is encapsulated in the form of a Parameter-Group. You have to first enable MySQL binlogs to be exposed in the row format by updating the binlog_format parameter, if not yet set, in the corresponding Parameter-Group. The value of this parameter should be set to ROW.

Use the AWS Console to access your MySQL database instance. With each database, you access the associated Parameter group details as shown below:


If your MySQL server is using the default parameter group, then you will not be able to change that specific parameter in that parameter group because the default parameter group is read-only. In that case, create a new parameter group and then update the binlog_format as shown below:


Step 1.3 - Enable Periodic Backups of MySQL

MySQL binlogs are enabled only if you enable the Periodic Backup feature of MySQL. Select the MySQL instance in the AWS Console and navigate to the Maintenance and Backups tab. If automatic backups are not enabled, modify the MySQL instance to enable the feature. After you are done, the Maintenance and Backups tab should look something like this:


Step 2: Create an AWS Kinesis Stream

The AWS Kinesis Stream is the destination that DMS uses as the target of a migration job. We will create an individual migration task for each MySQL table you wish to connect with Rockset. Use the AWS Console as shown below to create an AWS Kinesis Data Stream.

A separate Kinesis Data Stream is required for every database table you wish to connect with Rockset.

Check this out to create an AWS Kinesis Stream.


If you select the Provisioned data stream capacity mode, choose a shard count that aligns with the maximum ingest rate of your Virtual Instance size. See Virtual Instances for information on Virtual Instance ingest rates and sizing. AWS Kinesis Shard limits can be found here. If you are not able to estimate the number of shards to configure here, please contact your Rockset solutions architect or customer support.

If you select the On-demand data stream capacity mode, Rockset will dynamically adjust ingest rate to match the number of shards, up to your Virtual Instance ingest rate limit.

Step 3: Setup your AWS Data Migration (DMS)

AWS DMS setup is a multi-step process. First, you have to create one or more DMS Replication Instances. Each Replication Instance comes with a set of resources that power the data migration from MySQL to Kinesis. Then you will create a DMS Source Endpoint for each MySQL server. You will then create one DMS Destination Endpoint for each of the MySQL database table that you want to connect with Rockset. The Source Endpoint is used to configure the name of the MySQL server and the Destination Endpoint specifies the name of the corresponding Kinesis stream. Finally, you will configure a Database Migration Task that specifies the MySQL table names that you want to connect with Rockset. The creation of a Database Migration Task actually starts the migration of data from a specific MySQL table to the corresponding Kinesis stream.

Step 3.1 Create a DMS Replication Instance

AWS DMS uses a replication instance to connect to your source data store, read the source data, and format the data for consumption by the target data store. A replication instance also loads the data into the target data store. Most of this processing happens in memory; however, large transactions might require some buffering on disk. Cached transactions and log files are also written to disk. You can create a replication instance by using the AWS Console as follows:


The Instance class specifies the amount of resources provisioned for this replication instance. You can pick the Instance class based on the total amount of data to be migrated and how quickly you want the replication process to occur, although we recommend an Instance class of dms.c5.2xlarge or larger to ensure optimal performance during data ingest. While all versions of DMS Replication Instances should theoretically work for this process, we recommend using versions 3.4.2 or earlier as some recent versions may be occassionally unstable when tailing CDC streams. Name the replication instance appropriately, especially if you are creating multiple replication instances.

Step 3.2 Create a DMS Source Endpoint for your MySQL server

A Source Endpoint specifies the location and access information of your MySQL server. You use the AWS Console to create a DMS Source Endpoint as follows:

  • Pick the RDS instance name for the MySQL server.
  • Pick an appropriate name for this Endpoint. You can specify access parameters by clicking on Provide access information manually, and entering the name server name and port of the MySQL server. You can use the MySQL user aws-dms and password that you created in Step 1.1.


If you want multiple MySQL database tables in a single MySQL server to be connected to Rockset, you only need to create a single Source endpoint for that server. The same Source endpoint can then be used by multiple Migration Tasks in a later step. You can click on the Test endpoint connection (optional) tab to verify that the access configurations to access the MySQL server are configured correctly.

Step 3.3 Create a DMS Target Endpoint for every MySQL database table

The DMS Target Endpoint specifies the name of the Kinesis Stream that is the target of a migration task. Create a DMS Target Endpoint as follows:


Pick Amazon Kinesis as the Target Engine. Enter the ARN of the Kinesis Stream created in Step 2 in the Kinesis Stream ARN field. Additionally, enter a Service access role ARN of a role that has the correct PutRecord, PutRecords, and DescribeStream permissions to access Kinesis. If you do not have this role already created, please follow these instructions to create it.

You can click on the Test endpoint connection (optional) tab to verify that the access configurations to access the Kinesis Stream service are configured correctly.

Step 3.4 Create a DMS Database Migration Task to migrate a MySQL table to a Kinesis Stream

A Migration Task connects one specific MySQL database table on a specific MySQL server to a specific Kinesis Stream. Use the AWS Console to specify the following:


Choose the option for Migration type as Migrate existing data and replicate ongoing changes. This option does a full dump of the MySQL database table to Kinesis first, followed by continuous replication through MySQL CDC using the same Kinesis Stream. Specify the following task settings:


Configure these DMS configuration settings as required if you are working with a large table or if your update rate to your table is high.

"TargetMetadata": {
  "ParallelLoadThreads": 32,
  "ParallelLoadBufferSize": 1000,
  "BatchApplyEnabled": true,
  "ParallelApplyThreads": 32,
  "ParallelApplyBufferSize": 1000,
  "ParallelApplyQueuesPerThread": 128,
  "ParallelLoadQueuesPerThread": 128,
"FullLoadSettings": {
  "CommitRate": 50000,
  "MaxFullLoadSubTasks": 49,

The name of the MySQL table that you want to migrate is specified in the section titled Table Mappings. Note that if you are migrating data from a partitioned table in MySQL, you must specify the partition table name(s) in your selection rule rather than the parent table name. You must create a selection rule with the appropriate Table name field for each partition in the partitioned table.

Here is an exmaple where a table named customers in a database named rockset is being configured:


You can switch on Pre-migration assessment to verify that the settings are accurate. If you switch on the assessment, remember to manally start the task once the pre-migration assessment is complete.

After you have completed steps 1-3, you are ready to create the Rockset Kinesis integration.

Creating a Rockset Collection

  1. Click Collection from the left pane and select the MySQL tile under Managed Integration. This will guide you through a series of steps to follow to set up the Kinesis Intergration needed to create the collection.


  1. Populate the fields as shown below. Note that Format must be set to MYSQL. Also, it is mandatory to specify an immutable primary key from the source MySQL table during collection creation.


Ingesting Complex Types

The following is a table of custom SQL expressions that you can apply to fields as part of your collection's ingest transformation to ingest them with the desired type.

MySQL Data TypeSupported By DMSRockset Data TypeRockset Ingest Transformation SQL Expression
BINARYYbytesFROM_HEX(SUBSTR(binary_column, 3))
BIT(M)YbytesFROM_HEX(SUBSTR(bit_column, 3))
BLOBYbytesFROM_HEX(SUBSTR(blob_column, 3))
LONGBLOBYbytesFROM_HEX(SUBSTR(longblob_column, 3))
MEDIUMBLOBYbytesFROM_HEX(SUBSTR(mediumblob_column, 3))
TINYBLOBYbytesFROM_HEX(SUBSTR(tinyblob_column, 3))
DATEYdatePARSE_DATE('%Y-%m-%d', date_column)
DATETIMEYdatetimePARSE_DATETIME('%Y-%m-%dT%H:%M:%SZ', datetime_column)
TIMEYtimePARSE_TIME('%H:%M:%S', time_column)
TIMESTAMPYtimestampPARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ', timestamp_column)
VARBINARY(N)YbytesFROM_HEX(SUBSTR(varbinary_column, 3))
BOOLYboolCAST(bool_column as bool)

Unsupported Operations

Note: Table level operations such as drop table, truncate table, and alter table are currently unsupported.