Serverless Real-Time Indexing: A Low Ops Alternative to Elasticsearch
We compare and contrast Elasticsearch and Rockset indexing strategies, relationship modeling and operational overhead for real-time search and analytics.
Both Rockset and Elasticsearch are queryable datastores that can store data and serve queries. Both index data and use the index to serve queries. Both systems are document-sharded. But that is where their similarities end. Rockset is a serverless real-time indexing database built to exploit cloud elasticity with minimal ops, while Elasticsearch requires special expertise and effort to manage the ELK stack.
In this talk, we will discuss ops considerations for deploying and managing Elasticsearch clusters at scale, and compare them with Rockset’s serverless ops in the cloud:
- Collecting real-time events, managing change data capture and denormalization: With Elasticsearch, you manage separate ingestion pipelines for each input source and denormalize the data yourself. In contrast, Rockset supports click-and-connect integrations for continuously indexing data from MongoDB, DynamoDB, Kafka, S3, etc. It has native support for JOINs, so there is no need to denormalize your data.
- Configuring clusters and managing node types: When deploying Elasticsearch, it is important to configure master nodes, data nodes, ingest nodes, coordinating nodes and alerting nodes in your cluster and optimize them based on the use case and requirements. In contrast, Rockset is a modern serverless system that is highly optimized for fast queries out-of-the-box.
- Scaling writes, sharding and re-indexing: Elasticsearch uses a primary-backup model for replication, so each replica re-indexes the data locally again. As your data size grows, you will typically increase the shard size and re-index your data in Elasticsearch. In contrast, Rockset uses RocksDB remote compaction and micro-sharding to eliminate re-indexing overhead.
- Scaling reads and isolating workloads: The Elastic Cloud offers different types of nodes, each with fixed compute/memory ratios, such as io-optimized and storage-optimized nodes, and moving between these requires a data migration. In contrast, Rockset separates compute from storage to allow seamless scaling of reads by increasing the compute allocation in the form of fully isolated virtual compute for each workload.
- Managing data durability and performance: Elasticsearch assumes a shared-nothing storage architecture where data durability is guaranteed via replication among data nodes, and you manually configure the resiliency of new writes. Rockset uses the cloud’s storage model with automatic S3-backed durable storage already configured in the cloud.
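To make the denormalization point in the first bullet concrete, here is a minimal, engine-agnostic sketch in plain Python. It does not use the Elasticsearch or Rockset APIs; the sample records and the function names (`denormalize_at_ingest`, `join_at_query_time`, `rename_user_denormalized`) are hypothetical and exist only to illustrate the trade-off between copying related fields into each document at ingest time and joining normalized data at query time.

```python
# Hypothetical sample data: two normalized collections.
users = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Lin"},
]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 40.0},
    {"order_id": 12, "user_id": 2, "total": 15.0},
]


def denormalize_at_ingest(users, orders):
    """Copy the user's name into every order document before indexing."""
    by_id = {u["user_id"]: u for u in users}
    return [{**o, "user_name": by_id[o["user_id"]]["name"]} for o in orders]


def join_at_query_time(users, orders, name):
    """Keep the data normalized and join when the query runs."""
    wanted = {u["user_id"] for u in users if u["name"] == name}
    return [o for o in orders if o["user_id"] in wanted]


def rename_user_denormalized(docs, user_id, new_name):
    """A user rename must rewrite every denormalized order for that user.

    Returns the number of documents that had to be rewritten.
    """
    touched = 0
    for doc in docs:
        if doc["user_id"] == user_id:
            doc["user_name"] = new_name
            touched += 1
    return touched


if __name__ == "__main__":
    docs = denormalize_at_ingest(users, orders)
    print([d["order_id"] for d in docs if d["user_name"] == "Ada"])
    print([o["order_id"] for o in join_at_query_time(users, orders, "Ada")])
    print(rename_user_denormalized(docs, 1, "Ada L."))
```

Both approaches return the same orders for a given user. The difference shows up on update: in the denormalized form, a change to one user record fans out to every order document that embeds it, whereas the normalized form changes a single record and resolves the relationship when the query runs.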
This tech talk is sponsored by Intel and Rockset. Rockset achieves 84% faster performance with Intel Xeon Scalable processors for real-time analytics in the cloud.
About the Speaker
Shruti Bhat - Shruti is SVP Product at Rockset. Prior to Rockset, Shruti led Product Management for Oracle Cloud, with a focus on AI, IoT and Blockchain. Previously, Shruti was VP Marketing at Ravello Systems, where she drove the start-up's rapid growth from pre-launch to hundreds of customers and a successful acquisition. Prior to that, she was responsible for launching VMware's vSAN and has led engineering teams at HP and IBM.