Rockset vs. Apache Druid

A serverless alternative to Apache Druid for real-time analytics at scale

How Rockset Overcomes Challenges with Apache Druid

"Rockset is pure magic. We chose Rockset over Druid, because it requires no planning whatsoever in terms of indexes or scaling. In one hour, we were up and running, serving complex OLAP queries for our live leaderboards and dashboards at very high queries per second. As we grow in traffic, we can just ‘turn a knob’ and Rockset scales with us."

Yaron Levi

Chief Architect at Rumble Wellness

Challenge #1: High operational burden

Druid

Druid clusters are complex distributed systems that involve many different processes and server types. These clusters have to be self-managed by the user, and even with a PaaS offering, the user is still responsible for configuring, scaling and capacity planning for the cluster. Because storage and compute cannot be scaled independently, ongoing administration and adapting to evolving workload demands become an operational challenge.

Rockset

Rockset is a fully managed service offered as SaaS, so all cluster operations are handled by Rockset. With Rockset, you do not need to manage indexes, clusters or shards. Rockset scales hot storage and compute independently, so you can efficiently scale your applications over time and easily handle spikes in your workload.

Challenge #2: Extensive performance engineering

Druid

Getting good query performance in Druid requires time-consuming manual configuration and tuning whenever new data or queries are introduced. Queries outside the narrow set the system has been tuned for will not perform well.

Rockset

Rockset is designed to deliver fast queries out of the box, without requiring any tuning of the data set. Rockset’s Converged Index builds multiple indexes on every field of every document to accelerate a wide range of queries.
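As a rough mental model only, the idea of indexing every field in more than one way can be sketched in a few lines of Python. This is a toy, not Rockset's Converged Index: the class name `ToyMultiIndex` is hypothetical, and the choice of a row store plus an inverted index plus a column store per field is an assumption made for illustration. The point is that a selective lookup and an aggregation can each be answered from a structure suited to it, without per-query tuning.

```python
from collections import defaultdict


class ToyMultiIndex:
    """Hypothetical toy: every field of every (flat) document is indexed two ways."""

    def __init__(self):
        self.docs = []  # row store: position in list = doc id -> full document
        # inverted index: field -> value -> set of doc ids (values must be hashable)
        self.inverted = defaultdict(lambda: defaultdict(set))
        self.columns = defaultdict(dict)  # column store: field -> doc id -> value

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field, value in doc.items():
            # Every field goes into both structures; nothing is chosen up front.
            self.inverted[field][value].add(doc_id)
            self.columns[field][doc_id] = value
        return doc_id

    def lookup(self, field, value):
        # Selective filter: answered from the inverted index, no full scan.
        ids = self.inverted[field].get(value, set())
        return [self.docs[i] for i in sorted(ids)]

    def column_sum(self, field):
        # Aggregation: reads only the one column it needs.
        return sum(v for v in self.columns[field].values() if isinstance(v, (int, float)))


idx = ToyMultiIndex()
idx.add({"user": "ana", "score": 10})
idx.add({"user": "bo", "score": 7})
idx.add({"user": "ana", "score": 5})
print(idx.lookup("user", "ana"))  # both "ana" documents, in insertion order
print(idx.column_sum("score"))    # sum of the score column
```

Building several indexes per field trades extra storage and write work for not having to decide in advance which queries will be fast.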

Challenge #3: Joins are not well supported

Druid

Joins in Druid have been measured to add a 3x query performance overhead. Implementing joins is a significant amount of work, and Druid currently supports only broadcast joins, so you cannot join two large datasets.
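Why a broadcast-only join rules out joining two large datasets becomes clearer with a simplified, single-process simulation of a broadcast hash join. This is a generic sketch of the technique, not Druid's implementation; the function name and the list-of-partitions layout are assumptions. The whole smaller side is turned into a hash table that, in a real cluster, is copied to every worker and must fit in each worker's memory, while each partition of the large side is probed locally.

```python
def broadcast_hash_join(small_table, large_partitions, key):
    """Toy broadcast hash join: the small side is replicated to every worker."""
    # Build phase: one hash table over the entire small side. In a cluster this
    # table is shipped ("broadcast") to every worker, so its size is the limit.
    build = {}
    for row in small_table:
        build.setdefault(row[key], []).append(row)

    results = []
    # Probe phase: each partition of the large side is joined against its local copy.
    for partition in large_partitions:
        for row in partition:
            for match in build.get(row[key], []):
                results.append({**match, **row})
    return results


dims = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
facts = [[{"id": 1, "v": 10}], [{"id": 2, "v": 20}, {"id": 3, "v": 30}]]
print(broadcast_hash_join(dims, facts, "id"))
```

If both inputs are large, the build side no longer fits on each worker, which is exactly the case a broadcast-only engine cannot handle.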

Rockset

Rockset supports full-featured SQL, including joins. Rockset’s join operator is executed by a multi-level aggregator tier. Aggregators are distributed, and joins are executed in parallel across multiple aggregators for both scalability and speed. No data denormalization is required.
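The page does not describe Rockset's exact join algorithm. One common way to execute a join in parallel across several workers is a hash-partitioned join: both inputs are split by a hash of the join key, so matching rows always land on the same worker and each worker joins only its own slice. The sketch below assumes that technique; the function names are hypothetical and threads merely stand in for aggregator nodes (the point is the data layout, not a Python speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n):
    # Rows with equal join keys always hash to the same partition.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def local_join(left, right, key):
    # Ordinary hash join on one worker's slice of both inputs.
    build = {}
    for row in left:
        build.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right for l in build.get(r[key], [])]


def partitioned_join(left, right, key, workers=4):
    left_parts = partition(left, key, workers)
    right_parts = partition(right, key, workers)
    # Each (left, right) partition pair is joined independently, in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(local_join, left_parts, right_parts, [key] * workers)
        return [row for chunk in chunks for row in chunk]


users = [{"uid": i, "name": f"u{i}"} for i in range(6)]
events = [{"uid": i % 3, "event": e} for i, e in enumerate("abcdef")]
print(sorted(partitioned_join(users, events, "uid"), key=lambda r: r["event"]))
```

Unlike a broadcast join, no worker ever needs a full copy of either input, which is what allows two large datasets to be joined.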

Challenge #4: Ingesting nested data

Druid

Druid requires flattening nested data at ingest and maintaining a flattening spec as the schema changes over time. This makes it challenging to manage real-time, constantly changing nested data in Druid.
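To see what a flattening step involves, here is a minimal Python sketch that turns nested objects into dotted column names. It is not Druid's `flattenSpec` (which is declared in the ingestion spec, for example with JSONPath expressions); the function name and the choice to leave arrays untouched are assumptions. It shows why the mapping has to be maintained as the schema changes: every new nested field produces a new flattened column.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted keys; lists are kept as-is (a simplifying assumption)."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column-name prefix.
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


event = {"user": {"id": 7, "address": {"city": "Oslo"}}, "tags": ["a", "b"]}
print(flatten(event))
```

If a later event adds `user.address.zip`, a fixed flattening mapping has to be updated before that field becomes a column.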

Rockset

Rockset is designed to support constantly changing, nested data with schemaless ingest and the automatic generation of a schema based on the exact fields and types present. Run SQL queries over semi-structured data, nested objects and arrays, mixed types and null values without worrying about schema drift.
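Automatic schema generation can be pictured as walking each incoming document and recording, for every field path, the set of types actually observed. The toy sketch below is not Rockset's implementation; the type names and the `[*]` notation for array elements are assumptions. It shows how mixed types and nulls can be captured as they arrive instead of being rejected by a predeclared schema.

```python
from collections import defaultdict


def infer_schema(docs):
    """Map each field path to the sorted list of type names seen for it (toy sketch)."""
    schema = defaultdict(set)

    def visit(value, path):
        if isinstance(value, dict):
            schema[path].add("object")
            for k, v in value.items():
                visit(v, f"{path}.{k}" if path else k)
        elif isinstance(value, list):
            schema[path].add("array")
            for item in value:
                visit(item, f"{path}[*]")  # all array elements share one path
        elif value is None:
            schema[path].add("null")
        elif isinstance(value, bool):  # checked before int: bool subclasses int
            schema[path].add("bool")
        elif isinstance(value, int):
            schema[path].add("int")
        elif isinstance(value, float):
            schema[path].add("float")
        else:
            schema[path].add(type(value).__name__)

    for doc in docs:
        visit(doc, "")
    schema.pop("", None)  # drop the entry for the top-level document itself
    return {path: sorted(types) for path, types in schema.items()}


docs = [
    {"id": 1, "user": {"name": "ana"}, "tags": ["x"]},
    {"id": "2", "user": {"name": None}, "tags": []},  # type drift and a null
]
print(infer_schema(docs))
```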

Why Rockset

  • Build Data Apps Faster

    • Fast queries out-of-the-box. No performance engineering required.
    • Built-in data connectors to always stay in sync with Kafka, MongoDB, DynamoDB, S3 and more.
    • Schemaless ingest: data is immediately queryable. No additional data modeling.
  • Designed for Developers

    • Query Lambdas allow developers to turn SQL into reusable data APIs.
    • Specialized developer workflows with a VS Code plugin and tight integration with CI/CD tools.
    • Write your application using Node.js, Java, Go or Python client libraries.
  • Minimize Ops Overhead

    • Fully managed, cloud-native service. No managing clusters, shards or indexes.
    • Autoscales storage and compute separately for better price-performance.
    • Optionally deploy in VPC for enterprise-grade security.