Serverless Real-time Indexing: A Low Ops Alternative to Elasticsearch
In this talk, we compare and contrast Elasticsearch and Rockset as indexing data stores for serving low-latency queries.
Both Rockset and Elasticsearch are queryable datastores that store data and serve queries. Both index data and use the index to serve queries, and both are document-sharded. But that is where the similarities end. Rockset is a serverless real-time indexing database built to exploit cloud elasticity with minimal ops, while Elasticsearch requires special expertise and effort to manage the ELK stack.
Ben Hagan, a former solutions architect at Elastic, and Shruti Bhat will go over some of the ops considerations for deploying and managing Elasticsearch clusters at scale, comparing and contrasting them with Rockset’s serverless ops in the cloud:
Collecting real-time events, managing change data capture and denormalization: With Elasticsearch, you manage separate ingestion pipelines for each input source and denormalize the data yourself. In contrast, Rockset supports click-and-connect integrations for continuously indexing data from MongoDB, DynamoDB, Kafka, S3, and other sources. It has native support for JOINs, so there is no need to denormalize your data.
Configuring clusters and managing node types: When deploying Elasticsearch, it is important to configure the master nodes, data nodes, ingest nodes, coordinating nodes, and alerting nodes in your cluster and to optimize them for your use case and requirements. In contrast, Rockset is a modern serverless system that is highly optimized for fast queries out of the box.
Scaling writes, sharding and re-indexing: Elasticsearch uses a primary-backup model for replication, so each replica re-indexes the data locally again. As your data size grows, you will typically increase the shard size and re-index your data in Elasticsearch. In contrast, Rockset uses RocksDB remote compaction and micro-sharding to eliminate re-indexing overhead.
Scaling reads and isolating workloads: Elastic Cloud offers different types of nodes, each with a fixed compute/memory ratio, such as io-optimized and storage-optimized nodes, and moving between them requires a data migration. In contrast, Rockset separates compute from storage, so reads scale seamlessly by increasing the compute allocation, in the form of fully isolated virtual compute for each workload.
Managing data durability and performance: Elasticsearch assumes a shared-nothing storage architecture where data durability is guaranteed via replication among data nodes, and you manually configure the resiliency of new writes. Rockset uses the cloud’s storage model with automatic S3-backed durable storage already configured in the cloud.
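To make the re-indexing point above more concrete, here is a minimal sketch of hash-based document routing, where each document is assigned to a shard by hashing its ID modulo the number of shards. Elasticsearch routes documents to primary shards in broadly this way, which is why the primary shard count of an index is fixed when it is created. The function names (shard_for, fraction_moved) are hypothetical, and MD5 is used only as a convenient stand-in hash; this is an illustration of the general idea, not either product's actual implementation.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Assign a document to a shard by hashing its ID modulo the shard count.

    MD5 is a stand-in here (assumption); real systems use their own hash.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


# Hypothetical document IDs, for illustration only.
doc_ids = [f"doc-{i}" for i in range(10_000)]

# Going from 5 to 6 shards relocates most documents, so the data has to be
# rewritten (re-indexed) into the new layout rather than extended in place.
print(f"{fraction_moved(doc_ids, 5, 6):.0%} of documents map to a different shard")
```

Under this kind of routing, adding even one shard changes the placement of most documents, which is the overhead that re-indexing pays for. Approaches that split data into many small shards up front aim to move whole shards between machines instead of rehashing individual documents.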
About the Speakers
Ben Hagan - Ben is a Solutions Architect specializing in real-time, big data and distributed systems, with over 20 years of industry experience. Prior to Rockset, Ben was a Principal Solutions Architect at Elastic working with Bay Area customers. Prior to Elastic, Ben built and led the Sales Engineering team at the real-time social data startup DataSift.
Shruti Bhat - Shruti is SVP Product at Rockset. Prior to Rockset, Shruti led Product Management for Oracle Cloud, with a focus on AI, IoT and Blockchain. Previously, Shruti was VP Marketing at Ravello Systems, where she drove the start-up's rapid growth from pre-launch to hundreds of customers and a successful acquisition. Prior to that, she was responsible for launching VMware's vSAN and has led engineering teams at HP and IBM.