Rockset vs Weaviate

Compare and contrast Rockset and Weaviate by architecture, ingestion, queries, performance, and scalability.

Rockset vs Weaviate Ingestion

Ingestion

Rockset

Weaviate

Streaming and bulk ingestion

Rockset supports both streaming and bulk data ingestion. Bulk ingestion handles the initial load of data into the system and uses temporary compute to process incoming data. Streaming ingestion can ingest and index high-velocity event and CDC streams within 1-2 seconds.

Weaviate recommends ingesting vector embeddings in batches of 100-300. For large files, it recommends breaking them up and ingesting them incrementally using libraries like ijson for JSON files and pandas for CSV files. Some manual flushing of batches may be required.
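The batching guidance above can be sketched in a few lines of Python. This is a minimal illustration rather than Weaviate's client API: `batched`, `ingest` and the `send_batch` callback are hypothetical names, and the callback stands in for whatever call actually writes a batch to the database. The input is consumed lazily, so it could equally be a generator fed by an incremental JSON or CSV reader.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int = 200) -> Iterator[List[T]]:
    """Yield lists of at most `size` items without loading the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def ingest(items: Iterable[T], send_batch: Callable[[List[T]], None], size: int = 200) -> int:
    """Send every batch, including the final partial one, and return the object count."""
    total = 0
    for batch in batched(items, size):
        send_batch(batch)  # hypothetical callback; a real one would call the database client
        total += len(batch)
    return total


# Usage: 450 records are sent as batches of 200, 200 and 50.
sent: List[List[dict]] = []
count = ingest(({"id": i} for i in range(450)), sent.append, size=200)
```

Because the loop always sends the trailing partial batch, no separate manual flush is needed in this sketch; a client that buffers internally may still require one.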

Index updates

Rockset supports in-place updates for vectors and metadata through its Converged Indexing technology, which is built on mutable RocksDB.

Weaviate can update the values of an existing property or an entire object in a schema. Weaviate uses the HNSW index type, which makes adding or updating vectors more costly. Weaviate does not support adding properties to, or deleting properties from, the schema.

Embedding generation

Rockset can store and process embeddings generated from OpenAI, Cohere, HuggingFace and more.

Weaviate supports API calls to OpenAI, Cohere and HuggingFace to index embeddings.

Size of vectors and metadata

Rockset has a maximum document size of 40MB and supports a vector dimensionality of up to 200,000.

Weaviate supports a maximum of 65,535 vector dimensions per embedding.

Versioning

Rockset uses aliases for versioning with no downtime.

There does not appear to be a way to version in Weaviate.
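The alias-based versioning mentioned for Rockset can be pictured as a level of indirection between the name that queries use and the collection that holds the data. The sketch below is a conceptual model only and does not use Rockset's API: `AliasRegistry` and the collection names are hypothetical.

```python
from typing import Dict, Optional


class AliasRegistry:
    """Maps a stable alias name to whichever collection version is live."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def point(self, alias: str, collection: str) -> Optional[str]:
        """Repoint the alias in one step and return the previous target, if any."""
        previous = self._targets.get(alias)
        self._targets[alias] = collection
        return previous

    def resolve(self, alias: str) -> str:
        """Return the collection that queries against the alias should read."""
        return self._targets[alias]


# Usage: queries keep using "products" while a new version is built and swapped in.
registry = AliasRegistry()
registry.point("products", "products_v1")
old_target = registry.point("products", "products_v2")
live_target = registry.resolve("products")
```

Readers never reference a versioned collection directly, so the swap is a single reassignment and the old version can be dropped once it is no longer the target.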

Rockset is built for streaming data and is a mutable database, supporting in-place updates for vectors and metadata. As a real-time search and analytics database, Rockset supports searches across large-scale data, including vector embeddings and metadata.

Weaviate supports batch insertion of vectors as well as in-place updates for vectors and metadata. Weaviate supports searches across high-dimensional vector embeddings.

Rockset vs Weaviate Indexing

Indexing

Rockset

Weaviate

KNN and ANN

Rockset supports KNN and ANN search. Rockset is built to be algorithm agnostic and currently builds a distributed FAISS index for scalability. Rockset uses its cost-based optimizer to trade off between KNN and ANN search for greater efficiency. At query time, metadata on indexes is accessed to determine where the ANN index is stored for more efficient retrieval. This architecture avoids the extensive memory overhead and the limitations on metadata filtering found in other solutions.

Weaviate supports KNN and ANN search using HNSW.
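For readers unfamiliar with the distinction, exact KNN search scores the query against every stored vector, whereas ANN indexes such as FAISS or HNSW trade some recall for speed by visiting only part of the data. The sketch below shows only the exact variant, using Euclidean distance; `exact_knn` and the sample vectors are illustrative and not taken from either product.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def exact_knn(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k nearest (key, distance) pairs by scanning every vector."""
    # Score every stored vector: cost grows linearly with the collection size.
    scored = ((math.dist(query, vec), key) for key, vec in vectors.items())
    # nsmallest keeps only the k best candidates, ordered by ascending distance.
    return [(key, dist) for dist, key in heapq.nsmallest(k, scored)]


# Usage: find the two vectors closest to the query point.
store = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 5.0]}
neighbors = exact_knn([0.9, 0.0], store, k=2)
```

The full scan is what makes exact search impractical at large scale and is the cost that an ANN index is meant to avoid.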

Additional indexes

Rockset builds a Converged Index, which combines search, ANN, columnar and row indexes, on the data for efficient retrieval.

Weaviate has an inverted index that can be used for filters, hybrid search and BM25 search.

Vectorization

All of Rockset's ANN and KNN indexes are vectorized.

Weaviate supports vectorization to speed up query execution.

Index management

Rockset handles all index creation and management.

Weaviate users are responsible for configuring and managing indexes and product quantization.

Rockset supports KNN and ANN search using FAISS indexing algorithms. Rockset consolidates search, vector search, columnar and row indexes into a Converged Index to support a wide range of query patterns out of the box. Vectorization is used to speed up query execution.

Weaviate supports KNN and ANN search using HNSW indexing algorithms. Weaviate provides inverted indexes and vector search indexes and uses vectorization to speed up query execution. Users are responsible for index maintenance.

Rockset vs Weaviate Querying

Querying

Rockset

Weaviate

Metadata filtering

Rockset supports metadata filtering and hybrid search. Rockset's cost-based optimizer determines the most efficient path for query execution, either pre-filtering using metadata or applying the filter during the approximate kNN search.

Weaviate supports metadata filtering and hybrid search. Weaviate pre-filters the data and runs an ANN search only if the filter returns more than a threshold number of records (default: 40,000). Otherwise, it uses a brute-force exact search.
Weaviate uses a strict schema system with all of the fields and their type specified before the data is indexed.
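The pre-filtering behavior described for Weaviate amounts to a size check on the filtered candidate set. The sketch below models only that decision as the text states it; `plan_filtered_search`, the `predicate` callback and the strategy labels are hypothetical, and the 40,000 default is the figure quoted above.

```python
from typing import Callable, Dict, List, Tuple


def plan_filtered_search(
    objects: Dict[str, dict],
    predicate: Callable[[dict], bool],
    cutoff: int = 40_000,
) -> Tuple[str, List[str]]:
    """Apply the metadata filter first, then pick a search strategy by result size."""
    # Pre-filter: keep only the IDs whose properties satisfy the predicate.
    allowed = [oid for oid, props in objects.items() if predicate(props)]
    # Large candidate sets go to the ANN index; small ones are scanned exactly.
    strategy = "ann" if len(allowed) > cutoff else "brute_force"
    return strategy, allowed


# Usage with a tiny cutoff so both branches are easy to see.
catalog = {
    "1": {"color": "red"},
    "2": {"color": "red"},
    "3": {"color": "blue"},
    "4": {"color": "red"},
}
plan_red = plan_filtered_search(catalog, lambda p: p["color"] == "red", cutoff=2)
plan_blue = plan_filtered_search(catalog, lambda p: p["color"] == "blue", cutoff=2)
```

A fixed cutoff like this is a static rule, in contrast to the cost-based choice between pre-filtering and in-search filtering described for Rockset.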

Multi-modal models

Rockset enables searches across multiple ANN fields to support multi-modal models.

Weaviate supports multi-modal modules with CLIP.

API (SQL, REST, etc.)

Rockset supports SQL and REST APIs. Rockset uses query lambdas to generate unique, parameterized API endpoints based on your SQL query.

Weaviate has RESTful APIs for database management and CRUD operations and a GraphQL API for accessing data objects and search.
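To make the GraphQL side concrete, the sketch below builds a vector search query string in the general shape of Weaviate's `Get` API with a `nearVector` argument. Treat it as an approximation to check against the Weaviate documentation: the helper `near_vector_query`, the class name `Article` and the `title` field are hypothetical, and the code only builds the string without sending a request.

```python
import json
from typing import Sequence


def near_vector_query(
    class_name: str,
    vector: Sequence[float],
    fields: Sequence[str],
    limit: int = 5,
) -> str:
    """Build a GraphQL query that asks for the objects nearest to `vector`."""
    # A JSON array of floats is also a valid GraphQL list literal.
    vector_literal = json.dumps([float(x) for x in vector])
    field_list = " ".join(fields)
    return (
        "{ Get { %s(nearVector: {vector: %s}, limit: %d) "
        "{ %s _additional { distance } } } }"
        % (class_name, vector_literal, limit, field_list)
    )


# Usage: the resulting string would be sent as the "query" field of a JSON POST body.
query = near_vector_query("Article", [0.1, 0.2], ["title"], limit=3)
```

The equivalent on the Rockset side would be a SQL statement, optionally saved as a query lambda so that it is exposed as a parameterized REST endpoint.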

Rockset supports both pre-filtering and applying a filter during an approximate kNN search. Rockset supports SQL and REST APIs.

Weaviate pre-filters data before an approximate kNN search. Weaviate supports a GraphQL API for search.

Rockset vs Weaviate Ecosystem

Ecosystem

Rockset

Weaviate

Integrations (HuggingFace, LangChain, etc.)

Rockset can store and process embeddings generated from OpenAI, Cohere, HuggingFace and more. Rockset has an integration to LangChain and LlamaIndex. Rockset also offers built-in connectors to event streaming platforms (Kafka, Kinesis, etc.), OLTP databases (MongoDB, DynamoDB, etc.) and data lakes (S3, GCS, etc.).

Weaviate supports API calls to OpenAI, Cohere and HuggingFace to insert and index embeddings. Weaviate has an integration to LangChain and LlamaIndex.

Rockset vs Weaviate Architecture

Architecture

Rockset

Weaviate

Cloud architecture

Rockset is built for the cloud. Indexing and queries can be run on isolated compute clusters (i.e., Virtual Instances) for predictable performance at scale.

Weaviate was built for on-prem and has recently introduced a managed offering. Weaviate has a tightly coupled architecture where CPU, RAM and SSD scale together for ingestion and queries. Weaviate stores its object store and inverted index within the same shard; it places its vector index next to the object store. Users are responsible for clusters, shards and indexes. Resharding is an expensive operation.

Scalability

Rockset is the only search and analytics database with compute-storage and compute-compute separation. The bulk and streaming ingestion and indexing of vector embeddings is fully isolated from the compute and RAM used for query serving. This removes resource contention between the two workloads. Furthermore, Rockset separates hot storage from compute, so the size of your vector embeddings does not force you to increase the size of your cluster. Rockset can scale up and down on demand for better price-performance.

Weaviate scales horizontally for ingestion and queries. Replicas to support high QPS use cases are still in development. Dynamically scaling a cluster is not fully supported: nodes cannot be removed if data is present. In this architecture, ingestion and queries use the same CPU and memory resources. There is no resource isolation, which allows for potential resource contention.

Enterprise readiness

Rockset is used by enterprises at scale including Allianz, JetBlue and Whatnot.

Weaviate does not have case studies of enterprises using their product in production.

Rockset is built for the cloud and provides compute-storage and compute-compute separation. The compute used for ingestion and indexing of vector embeddings is isolated from the compute used for query serving. Rockset is used by enterprises including Allianz, JetBlue and Whatnot.

Weaviate is built for on-prem with a tightly coupled architecture. Scaling Weaviate requires data and infrastructure expertise and management.