Apache Druid vs Rockset
Compare Apache Druid and Rockset by architecture, ingestion, performance, queries, and scalability.
Apache Druid vs Rockset Architecture
Druid’s architecture uses nodes called data servers that handle both ingestion and queries. Because the two workloads share the same servers, heavy ingestion or query load can cause CPU and memory contention, a problem that some Druid alternatives avoid. Separating the pre-packaged ingestion and query components requires planning ahead, adds complexity, and cannot be done dynamically.
Rockset is built to be a cloud-only database and does not have a self-managed option. It disaggregates compute from both hot storage and cloud storage, allowing multiple isolated compute clusters to run on the same shared data.
Apache Druid vs Rockset Ingestion
Druid has built-in connectors that manage ingestion from common data sources. Unlike some Druid competitors, it doesn’t support nested data, so data must be flattened at ingest. Denormalization is also required at ingest, increasing operational burden for certain use cases.
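As a rough illustration of what flattening at ingest can involve, the sketch below collapses a nested record into a single level of dotted column names before it is handed to an ingestion pipeline. It is generic Python, not part of Druid; the function name `flatten`, the `.` separator, and the sample event are assumptions made for this example.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Collapse nested dicts into one level, joining key paths with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested event, as it might arrive from a stream.
event = {"user": {"id": 42, "geo": {"country": "DE"}}, "action": "click"}
flat_event = flatten(event)
```

Here `flat_event` holds the keys `user.id`, `user.geo.country`, and `action`. Arrays are left untouched in this sketch; a real pipeline would also have to decide how to handle them.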
Rockset has built-in connectors that manage streaming ingestion from common data sources. It has native support for semi-structured data, so that nested JSON and XML can be ingested and queried as is.
Apache Druid vs Rockset Performance
Druid is designed to make streaming data queryable as quickly as possible. JOINs are either impossible or incur a large performance penalty. Updates are only possible via batch jobs. Druid leverages data denormalization and write-time aggregation at ingestion to reduce query latency.
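Write-time aggregation can be pictured as grouping incoming events by a truncated timestamp plus their dimension values and storing only the aggregates. The sketch below shows the idea in plain Python; it is not Druid code, and the hourly granularity, the field names `timestamp`, `country`, and `bytes`, and the function names are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List, Sequence


def truncate_to_hour(ts: str) -> str:
    """Round an ISO 8601 timestamp down to the start of its hour."""
    return datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0).isoformat()


def rollup(events: List[Dict[str, Any]], dimensions: Sequence[str]) -> List[Dict[str, Any]]:
    """Pre-aggregate events per (hour, dimension values) at write time."""
    totals: Dict[tuple, Dict[str, int]] = defaultdict(lambda: {"count": 0, "bytes_sum": 0})
    for e in events:
        key = (truncate_to_hour(e["timestamp"]),) + tuple(e[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["bytes_sum"] += e["bytes"]
    rows = []
    for key in sorted(totals):
        row: Dict[str, Any] = {"timestamp": key[0]}
        row.update(zip(dimensions, key[1:]))
        row.update(totals[key])
        rows.append(row)
    return rows


# Hypothetical raw events: four rows in, three aggregated rows out.
raw_events = [
    {"timestamp": "2024-01-01T10:05:00", "country": "US", "bytes": 100},
    {"timestamp": "2024-01-01T10:40:00", "country": "US", "bytes": 250},
    {"timestamp": "2024-01-01T11:02:00", "country": "US", "bytes": 50},
    {"timestamp": "2024-01-01T10:59:00", "country": "DE", "bytes": 70},
]
stored_rows = rollup(raw_events, ["country"])
```

The trade-off is visible in the output: queries scan fewer rows, but the individual events within each hour can no longer be recovered from what was stored.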
Rockset is designed to make streaming data queryable as quickly as possible by avoiding the need to batch data. It also updates documents efficiently by only reindexing fields that are part of an update request. Rockset indexes all data by default, which results in storage amplification but also enables low-latency queries that require less compute.
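The idea of reindexing only the fields named in an update can be sketched with a toy inverted index that keys entries by (field, value). This is a simplified illustration, not Rockset's implementation; the class `FieldIndex`, its methods, and the `reindexed_fields` counter are hypothetical names introduced here, and field values are assumed to be hashable.

```python
from collections import defaultdict
from typing import Any, Dict, Set, Tuple


class FieldIndex:
    """Toy document store that indexes every field and updates per field."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}
        self.index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)
        self.reindexed_fields = 0  # illustrative counter of index writes

    def insert(self, doc_id: str, doc: Dict[str, Any]) -> None:
        """Add a new document; every field is indexed by default."""
        self.docs[doc_id] = {}
        self.update(doc_id, doc)

    def update(self, doc_id: str, changes: Dict[str, Any]) -> None:
        """Apply a partial update, touching only index entries for changed fields."""
        doc = self.docs[doc_id]
        for field, new_value in changes.items():
            if field in doc:
                # Drop the stale (field, old value) entry for this document.
                self.index[(field, doc[field])].discard(doc_id)
            doc[field] = new_value
            self.index[(field, new_value)].add(doc_id)
            self.reindexed_fields += 1

    def lookup(self, field: str, value: Any) -> Set[str]:
        """Return the ids of documents whose `field` equals `value`."""
        return set(self.index.get((field, value), set()))


idx = FieldIndex()
idx.insert("d1", {"status": "open", "owner": "ana", "priority": 3})
before = idx.reindexed_fields
idx.update("d1", {"status": "closed"})
fields_touched = idx.reindexed_fields - before
```

In this sketch the update touches one index entry rather than three, while the `owner` and `priority` entries stay as they were. The per-field index is also where the storage amplification mentioned above comes from, since every field of every document gets its own entries.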
Apache Druid vs Rockset Queries
Druid has a native JSON-based query language and provides Druid SQL as an alternative that translates into its native queries. JOINs are not recommended.
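To make the two query styles concrete, the sketch below builds an hourly row count first as a native JSON timeseries query and then as a roughly equivalent Druid SQL request body. The datasource name `events` and the time interval are assumptions for illustration, and the code only constructs the request payloads without sending them. In a typical deployment, native queries are posted to `/druid/v2` and SQL queries to `/druid/v2/sql`.

```python
import json

DATASOURCE = "events"  # hypothetical datasource name

# Native query: a JSON object describing a timeseries aggregation.
native_query = {
    "queryType": "timeseries",
    "dataSource": DATASOURCE,
    "granularity": "hour",
    "intervals": ["2024-01-01T00:00:00Z/2024-01-02T00:00:00Z"],
    "aggregations": [{"type": "count", "name": "row_count"}],
}

# Druid SQL: the statement is wrapped in a JSON body under the "query" key.
sql_query = {
    "query": (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS hour_bucket, COUNT(*) AS row_count "
        f'FROM "{DATASOURCE}" '
        "WHERE __time >= TIMESTAMP '2024-01-01 00:00:00' "
        "AND __time < TIMESTAMP '2024-01-02 00:00:00' "
        "GROUP BY 1"
    )
}

native_body = json.dumps(native_query)
sql_body = json.dumps(sql_query)
```

The two are not guaranteed to return identical results in every case (for example, in how empty hours are reported), so treat them as approximately equivalent.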
Rockset supports SQL as its native query language and can perform SQL joins. Users can create data APIs by storing SQL queries in Rockset that are executed from dedicated REST endpoints. Rockset integrates with some common visualization tools, but BI is not Rockset’s primary use case.
Apache Druid vs Rockset Scalability
Druid users must make complex decisions about the number and size of servers as clusters are scaled.
Rockset Virtual Instances are distributed compute clusters that can be scaled up for faster queries, or scaled out for practically unlimited concurrency or for compute isolation. Rockset has shared storage that scales automatically and independently of compute, so no rebalancing is required.