Powering Real-Time Analytics at Scale on MySQL and PostgreSQL

April 15, 2021

,
See Rockset
in action

Get a product tour with a Rockset engineer

Relational databases today are widely known to be suboptimal for supporting high-scale analytical use cases, and are all but certain to run into issues as your production data size and query volume grow. This has been by far one of the most well-known weaknesses of relational databases for much of the past decade, and has led to surges in popularity of several new classes of databases such as NoSQL and NewSQL – each with their own sets of tradeoffs and drawbacks. When users run into slow queries on their relational databases like MySQL or PostgreSQL, they are faced with several (often painful) options:

  1. Vertically scale the existing database by paying for more CPU resources
  2. Create direct read replica(s) and send the slow and costly queries to the replica(s), vertically scaling those read replicas as necessary
  3. Use a service like Debezium to read CDCs via Kafka streams, and then:

    • If you need low latency for application use cases, write to a sink like Rockset or Elasticsearch
    • If you can tolerate higher latency, such as in BI use cases, write to a warehouse like Snowflake or Redshift
  4. Give up on relational databases completely and jump on a more horizontally scalable option like NoSQL at the cost of SQL aggregations and joins, if your data and query complexity allows

Today, we’re announcing a new solution to delivering millisecond-latency queries for your MySQL and PostgreSQL databases at scale: using Rockset’s brand new MySQL and PostgresSQL integrations, you can now use Rockset to power real-time, complex analytical queries on your relational databases. With this integration, you can now architect data-powered microservices and products to query Rockset instead of the primary database directly. This can reduce load significantly on your primary OLTP databases, especially since Rockset can handle your heaviest analytical queries which would otherwise cost you significant resources and increased risk to your most sensitive services. On top of this, Rockset automatically indexes every single field in your table using Rockset’s Converged Index™ technology, and so you don’t have to design or define any indexes on your own.

Scale your relational databases with near-zero operational burden by taking your most expensive queries and offloading them from your primary database, with Rockset as a secondary index. Rockset replicates the data in real-time from your primary database, including both the initial full-copy data replication into Rockset and staying in sync by continuously reading your MySQL or PostgreSQL change streams. Rockset also has first-class query performance on a variety of complex queries and, most importantly, is horizontally scalable. Compute and storage are also separately scaled in Rockset, allowing you to cost-optimize for the desired performance of your choice.

Who Should Use It

The MySQL and PostgreSQL integrations with Rockset allow you to power real-time analytics at scale for your respective relational database. Using Rockset as an external index for your MySQL or PostgreSQL database is an ideal solution in the following instances:

  1. You’re trying to scale your MySQL/PostgreSQL database to deal with slow queries or resource constraints as your application grows
  2. You are building real-time data services or running analytics on MySQL/PostgreSQL that you want to offload without impacting load on your primary production database

How It Works

Real-time analytics on MySQL and Postgres

Steps:

  1. In your AWS account:

    • Create a new Kinesis stream to ingest your data into Rockset in real-time
    • Create a new DMS replication instance to export your MySQL/PostgreSQL database to the Kinesis stream
  2. In your Rockset account:

    • Create a MySQL/PostgreSQL integration by simply providing the newly created Kinesis stream name
    • Create a Rockset collection by specifying the MySQL/PostgreSQL table to be indexed in Rockset
    • Optionally apply ingest-time transformations such as type coercion, field masking or search tokenization
  3. Rockset will first do a fast bulk load of your existing data and then continuously tail your MySQL/PostgreSQL change streams to stay in sync with inserts, updates, and deletes

    • Execute fast, complex analytical queries at scale including JOINS with other databases or event streams
    • Send your most expensive analytics queries to Rockset and easily horizontally scale your compute resources
    • Optionally visualize your data using our integrations with dashboarding tools like Tableau, Retool, Redash, Superset and more

Rockset’s Converged Index™

Rockset is the real-time indexing database in the cloud, built by the team behind RocksDB. When connected to a source database—MySQL or PostgreSQL in this case—it builds an external index of the MySQL/PostgreSQL data.

How does Rockset help accelerate analytics and make analytics more efficient? Rockset powers millisecond-latency search, aggregations and joins on any data by automatically building a Converged Index, which combines the power of columnar, row, and inverted indexes. Rockset's Converged Index is the most efficient way to organize your data and enables queries to be available almost instantly and perform incredibly fast.

  1. While building a Converged Index requires more space on disk, the result is that complex queries are much faster and compute costs are much lower. In simple terms, we trade off storage for CPU. However, more importantly, we trade off hardware for human time. Humans no longer need to configure indexes or write custom client-side logic and humans no longer need to wait on slow queries.
  2. As any experienced database user knows, as you add more indexes, writes become heavier. A single document update now needs to update many indexes, causing many random database writes. In traditional storage based on B-trees, random writes to database translate to random writes on storage. At Rockset, we use LSM trees instead of B-trees. LSM trees are optimized for writes because they turn random writes to database into sequential writes on storage. We use RocksDB's LSM tree implementation and we have internally benchmarked hundreds of MB per second writes in a distributed setting.

Want to know how other industry leaders are using Rockset to power their applications? Check out our brand new case study with Command Alkon, a leading provider of cloud-based logistics software, to see how they used Rockset to overcome some of their biggest performance and scaling challenges to date.

Beta Partner Program

Sign up here to join our beta partner program for the MySQL/PostgreSQL integrations with Rockset. Our engineers will then personally reach out to you and guide you through the setup of this connector to ensure everything works well for you. Get a deep dive into how Rockset integrates with MySQL/PostgreSQL and share your feedback directly with our engineering team!