Indexing is commonly used to improve query performance when application speed is critical, and some of the most widely used large-scale systems today, such as Google Search and Facebook News Feed, are built on indexing. When developers look to implement indexing, Elasticsearch is a well-known solution, primarily geared towards text search and logging use cases. However, Elasticsearch is complex to operate at scale.
In this tech talk, we discuss the requirements for real-time analytics and how indexing can be used to meet them. We compare and contrast indexing in Elasticsearch with Rockset’s cloud-based Converged Index and examine how their differences may affect the way you build your applications.
- Operations at scale - Elasticsearch requires expertise and effort to deploy, configure and manage on an ongoing basis. Beyond standing up the initial cluster, scaling Elasticsearch involves managing hardware, sharding and reindexing. Rockset is a serverless system offered as a fully managed service, so all operations, including scaling, index management and upgrades, are handled transparently for users.
- Data flexibility - Elasticsearch is optimized for search use cases but is less suitable for analytics across multiple data sets because it does not support general-purpose joins. One alternative is to denormalize the data at ingest, but this does not scale beyond simple cases: every change to shared data must be rewritten into each document that embeds a copy of it. Rockset ingests data without requiring a predefined schema and builds multiple indexes on the data (a search index, a column-based index and a row-based index). Developers can then build their applications using full SQL, including joins.
- Real-time ingest - Using Elasticsearch for real-time analytics requires building and maintaining ingestion pipelines to sync data from operational databases. Because Elasticsearch documents are immutable, every update requires reindexing the entire document, which consumes additional compute and I/O and degrades performance. Rockset, in contrast, has built-in connectors to common data sources. All documents are mutable and can be updated at the field level, allowing efficient syncing from operational databases.
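To make the cost differences in the data flexibility and real-time ingest points more concrete, here is a toy Python model. It is a sketch under stated assumptions, not Elasticsearch or Rockset code: the names `ImmutableStore`, `MutableStore`, `rename_customer_denormalized` and `join_orders` are hypothetical, and "work" is approximated as the number of field values written to the index, ignoring segment merges, caching and everything else a real engine does.

```python
from dataclasses import dataclass, field

# Hypothetical toy model, not Elasticsearch or Rockset code. "Work" is counted
# as the number of field values written to the index on each write.


@dataclass
class ImmutableStore:
    """Immutable documents: any update rewrites and reindexes the whole document."""
    docs: dict = field(default_factory=dict)
    fields_indexed: int = 0

    def put(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = dict(doc)
        self.fields_indexed += len(doc)

    def update(self, doc_id: str, changes: dict) -> None:
        # Merge the change into a full copy, then reindex every field of it.
        self.put(doc_id, {**self.docs[doc_id], **changes})


@dataclass
class MutableStore:
    """Mutable documents: an update writes only the fields that changed."""
    docs: dict = field(default_factory=dict)
    fields_indexed: int = 0

    def put(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = dict(doc)
        self.fields_indexed += len(doc)

    def update(self, doc_id: str, changes: dict) -> None:
        self.docs[doc_id].update(changes)
        self.fields_indexed += len(changes)


def rename_customer_denormalized(orders: list, customer_id: str, new_name: str) -> int:
    """Denormalized data: the name is copied into each order, so every copy must change."""
    touched = 0
    for order in orders:
        if order["customer_id"] == customer_id:
            order["customer_name"] = new_name
            touched += 1
    return touched


def join_orders(orders: list, customers: dict) -> list:
    """Normalized data: the name is stored once and attached at query time (a join)."""
    return [{**o, "customer_name": customers[o["customer_id"]]} for o in orders]


if __name__ == "__main__":
    doc = {f"field_{i}": i for i in range(50)}  # one 50-field document
    imm, mut = ImmutableStore(), MutableStore()
    for store in (imm, mut):
        store.put("doc1", doc)
        store.update("doc1", {"field_0": -1})  # change a single field
    print(imm.fields_indexed, mut.fields_indexed)  # 100 51

    orders = [{"id": n, "customer_id": "c1", "customer_name": "Acme"} for n in range(3)]
    print(rename_customer_denormalized(orders, "c1", "Acme Corp"))  # 3 orders rewritten
    customers = {"c1": "Acme Corp"}  # normalized: a rename is a single write here
    print(join_orders([{"id": 0, "customer_id": "c1"}], customers))
```

In this model, changing one field of a 50-field document reindexes 50 values when documents are immutable but only 1 when they are mutable. Likewise, renaming a customer touches every denormalized order that copied the name, whereas with normalized data the rename is one write and the join supplies the current name at query time.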