Elasticsearch vs Rockset

Compare and contrast Elasticsearch and Rockset by architecture, ingestion, queries, performance, and scalability.

Elasticsearch vs Rockset Ingestion

Ingestion

Elasticsearch

Rockset

Streaming and bulk ingestion

Elasticsearch supports updates and bulk ingestion. It recommends bulk ingesting into fewer, larger segments because each segment has its own HNSW graph that must be searched, so a larger number of segments results in higher query latency.

Rockset supports streaming and bulk data ingestion. Bulk ingestion is used for an initial load of data into the system and uses temporary compute to process incoming data. Streaming ingestion can ingest and index high-velocity event and CDC streams within 1-2 seconds.
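As an illustration of bulk ingestion on the Elasticsearch side, the sketch below builds a request body in the newline-delimited JSON format that the Elasticsearch `_bulk` API accepts: one action line followed by one source line per document. The index name `products-v1`, the field names and the helper `build_bulk_body` are assumptions made for the example, and the code only builds the body; it does not send it to a cluster.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk API.

    Each document becomes two lines: an "index" action line and the
    document source. The body must end with a trailing newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Hypothetical documents with a small embedding field.
docs = [
    ("1", {"title": "first", "embedding": [0.1, 0.2, 0.3]}),
    ("2", {"title": "second", "embedding": [0.4, 0.5, 0.6]}),
]
body = build_bulk_body("products-v1", docs)
```

Sending many documents in one request like this, rather than one request per document, is one way to work toward the fewer, larger segments recommended above.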

Index updates

In Elasticsearch, index updates happen through expensive merge operations and reindexing. It's recommended to avoid heavy indexing during approximate kNN search and to reindex new documents into a separate index rather than update them in place.

Rockset supports in-place updates for vectors and metadata with Rockset's Converged Indexing technology built on mutable RocksDB.

Embedding generation

Elasticsearch introduced its own Elastic Sparse Encoder model that can be used for embedding generation. To use a third-party model with Elasticsearch, you must import and deploy the model, and then create an ingest pipeline with an inference processor to perform data transformation.

Rockset can store and process embeddings generated by models from OpenAI, Cohere, Hugging Face and more.
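The Elasticsearch steps above end with an ingest pipeline that contains an inference processor. A minimal sketch of such a pipeline definition follows, assuming a model has already been imported and deployed. The model ID `my-embedding-model`, the field names and the helper `build_inference_pipeline` are placeholders for the example, and the exact processor options can differ between Elasticsearch versions.

```python
def build_inference_pipeline(model_id, source_field, target_field):
    """Build the body for an ingest pipeline with one inference processor.

    field_map maps the document field to the input name the deployed
    model expects ("text_field" is assumed here for a text embedding model).
    """
    return {
        "description": "Generate embeddings at ingest time",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    "target_field": target_field,
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


# Hypothetical model and field names.
pipeline = build_inference_pipeline(
    model_id="my-embedding-model",
    source_field="description",
    target_field="description_embedding",
)
```

The resulting dictionary would be sent as the body of a create-pipeline request, and documents indexed through that pipeline would then carry the generated embedding.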

Size of vectors and metadata

Elasticsearch supports vectors up to 2048 dimensions. For approximate kNN search to be efficient, all vector data must fit in the node's page cache.

Rockset has a document size limit of 40MB and supports a vector dimensionality of up to 200,000.
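To make the page cache requirement concrete, the following back-of-the-envelope sketch estimates the raw size of a set of float32 vectors. It counts only the vector values (4 bytes each) and ignores the HNSW graph and any other index overhead, so it is a lower bound rather than a sizing formula. The corpus size and dimensionality are made-up inputs.

```python
GIB = 1024 ** 3


def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for the vector values alone (float32 by default)."""
    return num_vectors * dims * bytes_per_value


# Hypothetical corpus: 10 million vectors with 768 dimensions each.
size_bytes = raw_vector_bytes(10_000_000, 768)
size_gib = size_bytes / GIB
print(f"{size_bytes} bytes is about {size_gib:.1f} GiB of raw vector data")
```

Under these assumptions the vectors alone occupy roughly 28.6 GiB, which would need to fit in page cache across the nodes holding the data before any graph overhead is counted.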

Versioning

Elasticsearch uses aliases to reindex data with no downtime.

Rockset uses aliases for versioning with no downtime.
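On the Elasticsearch side, reindexing behind an alias can be sketched as two request bodies: one for the `_reindex` API that copies documents into a new index, and one for the `_aliases` API that moves the alias from the old index to the new one in a single call. The index and alias names and the helper functions are assumptions for illustration, and nothing here is sent to a cluster.

```python
def build_reindex_body(source_index, dest_index):
    """Body for the _reindex API: copy documents from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def build_alias_swap_body(alias, old_index, new_index):
    """Body for the _aliases API: remove and add in one request, so the
    alias always points at exactly one index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: clients query the alias "products" throughout.
reindex_body = build_reindex_body("products-v1", "products-v2")
swap_body = build_alias_swap_body("products", "products-v1", "products-v2")
```

Because clients only ever query the alias, they do not need to change when the underlying index is replaced.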

Elasticsearch supports both streaming and bulk ingestion. It recommends using fewer Lucene segments and avoiding updates and reindexing to save on compute costs. Elasticsearch supports searches across large-scale data, including vector embeddings and metadata.

Rockset is built for streaming data and is a mutable database, supporting in-place updates for vectors and metadata. As a real-time search and analytics database, Rockset supports searches across large-scale data, including vector embeddings and metadata.

Elasticsearch vs Rockset Indexing

Indexing

Elasticsearch

Rockset

KNN and ANN

Elasticsearch supports KNN and ANN search. ANN search uses the HNSW algorithm. HNSW is a graph-based algorithm which only works efficiently when most vector data is held in memory.

Rockset supports KNN and ANN search. Rockset is built to be algorithm agnostic and currently builds a distributed FAISS index for scalability. Rockset uses its cost-based optimizer to trade off between KNN and ANN search for greater efficiency. At query time, metadata on indexes is accessed to determine where the ANN index is stored for more efficient retrieval. This architecture avoids the extensive memory overhead found in other solutions and limitations on metadata filtering.
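For readers less familiar with the distinction, exact KNN search compares the query against every stored vector, while ANN indexes such as HNSW or FAISS trade some accuracy for speed by examining only part of the data. The sketch below shows only the exact, brute-force variant using cosine similarity. It is a conceptual illustration, not how either product implements search, and the vectors are toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query, vectors, k):
    """Score every vector against the query and return the k best IDs."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy two-dimensional vectors keyed by document ID.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
nearest = exact_knn([1.0, 0.0], vectors, k=2)
```

The cost of this approach grows linearly with the number of vectors, which is why an optimizer might prefer exact search on small or heavily filtered sets and an ANN index on large ones.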

Additional indexes

Elasticsearch includes an inverted index for text search, BKD trees for geolocation search, and ANN indexes.

Rockset builds a Converged Index, which combines a search, ANN, columnar and row index on the data for efficient retrieval.

Vectorization

Elasticsearch added vectorization in version 8.9.0 to speed up query execution.

All of Rockset's ANN and KNN indexes are vectorized.

Index management

Elasticsearch users are responsible for index maintenance including the number of index segments and the reindexing of data.

Rockset handles all index creation and management.

Elasticsearch supports KNN and ANN search using HNSW indexing algorithms. Elasticsearch provides inverted indexes and vector search indexes and uses vectorization to speed up query execution. Users are responsible for index maintenance.

Rockset supports KNN and ANN search using FAISS indexing algorithms. Rockset consolidates search, vector search, columnar and row indexes into a Converged Index to support a wide range of query patterns out of the box. Vectorization is used to speed up query execution.

Elasticsearch vs Rockset Querying

Querying

Elasticsearch

Rockset

Metadata filtering

Elasticsearch supports metadata filtering and hybrid search. Elasticsearch filters are applied during the approximate kNN search.

Rockset supports metadata filtering and hybrid search. Rockset's cost-based optimizer determines the most efficient path to query execution, either pre-filtering using metadata or applying the filter during the approximate kNN search.
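To show what a filter applied during approximate kNN search looks like on the Elasticsearch side, the sketch below builds a search request body whose `knn` clause carries a `filter`. The field names, the filter value, the query vector and the helper `build_filtered_knn_search` are assumptions for the example. Only the request body is constructed.

```python
def build_filtered_knn_search(field, query_vector, k, num_candidates, filter_clause):
    """Build a search body with a kNN clause restricted by a metadata filter.

    The filter sits inside the knn clause, so it is applied while
    candidates are gathered rather than after the top k are chosen.
    """
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            "filter": filter_clause,
        }
    }


# Hypothetical field names and values.
search_body = build_filtered_knn_search(
    field="embedding",
    query_vector=[0.1, 0.2, 0.3],
    k=5,
    num_candidates=50,
    filter_clause={"term": {"category": "shoes"}},
)
```

Because the filter is part of the kNN clause, the search can still return up to `k` matching documents even when most of the index is excluded by the filter.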

Multi-modal models

Elasticsearch enables searches across multiple kNN fields to support multi-modal models.

Rockset enables searches across multiple ANN fields to support multi-modal models.
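As a sketch of searching across multiple kNN fields in Elasticsearch, the request body below passes a list of kNN clauses, one per vector field, each with an optional `boost` to weight its contribution to the combined score. The field names, vectors, boosts and the helper `build_multi_knn_search` are illustrative assumptions, and support for a list of clauses depends on the Elasticsearch version in use.

```python
def build_multi_knn_search(clauses):
    """Build a search body with one kNN clause per vector field."""
    return {
        "knn": [
            {
                "field": c["field"],
                "query_vector": c["query_vector"],
                "k": c["k"],
                "num_candidates": c["num_candidates"],
                "boost": c.get("boost", 1.0),
            }
            for c in clauses
        ]
    }


# Hypothetical image and text embeddings for the same query.
multi_body = build_multi_knn_search(
    [
        {"field": "image_embedding", "query_vector": [0.3, 0.1, 0.9],
         "k": 5, "num_candidates": 50, "boost": 0.3},
        {"field": "title_embedding", "query_vector": [0.2, 0.7, 0.4],
         "k": 5, "num_candidates": 50, "boost": 0.7},
    ]
)
```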

API (SQL, REST, etc)

Elasticsearch exposes REST APIs that can be called directly to configure and access Elasticsearch features.

Rockset supports SQL and REST APIs. Rockset uses query lambdas to generate unique, parameterized API endpoints based on your SQL query.

Elasticsearch supports REST APIs.

Rockset supports both pre-filtering and applying a filter during an approximate kNN search. Rockset supports SQL and REST APIs.

Elasticsearch vs Rockset Ecosystem

Ecosystem

Elasticsearch

Rockset

Integrations (Hugging Face, LangChain, etc.)

To use a third-party model with Elasticsearch, such as one from Hugging Face, you must import and deploy the model, and then create an ingest pipeline with an inference processor to perform data transformation. Elasticsearch has an integration with LangChain.

Rockset can store and process embeddings generated by models from OpenAI, Cohere, Hugging Face and more. Rockset has integrations with LangChain and LlamaIndex. Rockset also offers built-in connectors to event streaming platforms (Kafka, Kinesis, etc.), OLTP databases (MongoDB, DynamoDB, etc.) and data lakes (S3, GCS, etc.).

Elasticsearch vs Rockset Architecture

Architecture

Elasticsearch

Rockset

Cloud architecture

Elasticsearch is built for on-prem. Indexing and search run on the same instances, which can cause compute contention. Users are responsible for clusters, shards and indexes.

Rockset is built for the cloud. Indexing and queries can run on isolated compute clusters (i.e., Virtual Instances) for predictable performance at scale.

Scalability

Elasticsearch requires deep expertise around servers, clusters, nodes, indexes and shards to operate at scale. For vector search on Elasticsearch, users may face scaling challenges for three reasons: indexing and search run on the same instances, all vector data must fit into the page cache, and each index segment has its own HNSW graph that must be searched, which contributes to latency.

Rockset is the only search and analytics database with compute-storage and compute-compute separation. The bulk and streaming ingestion and indexing of vector embeddings is fully isolated from the compute and RAM used for query serving. This removes resource contention between the two workloads. Furthermore, Rockset separates hot storage from compute, so the size of your vector embeddings does not force you to increase the size of your cluster. Rockset can scale up and down on demand for better price-performance.

Enterprise readiness

Elasticsearch is used by enterprises at scale including Booking.com and Cisco.

Rockset is used by enterprises at scale including Allianz, JetBlue and Whatnot.

Elasticsearch is built for on-prem with a tightly coupled architecture. Scaling Elasticsearch requires data and infrastructure expertise and management. Elasticsearch is used by enterprises including Booking.com and Cisco.

Rockset is built for the cloud and separates compute-storage and compute-compute. The compute used for ingestion and indexing of vector embeddings is isolated from the compute used for query serving. Rockset is used by enterprises including Allianz, JetBlue and Whatnot.