Weaviate vs Rockset

Compare and contrast Weaviate and Rockset by architecture, ingestion, queries, performance, and scalability.

Weaviate vs Rockset Ingestion

Ingestion

Weaviate

Rockset

Streaming and bulk ingestion

Weaviate recommends ingesting vector embeddings in batches of 100-300. For large files, it recommends breaking them up and ingesting them using libraries like ijson for JSON files and pandas for CSV files. Some manual flushing of batches may be required.

Rockset supports both streaming and bulk data ingestion. Bulk ingestion is used for the initial load of data into the system and uses temporary compute to process incoming data. With streaming ingestion, Rockset can ingest and index high-velocity event and CDC streams within 1-2 seconds.

Index updates

Weaviate can update the values of an existing property or an entire object in a schema. Weaviate uses the HNSW index type, which makes adding or updating vectors more costly. Weaviate does not support adding properties to, or deleting properties from, the schema.

Rockset supports in-place updates for vectors and metadata with Rockset's Converged Indexing technology built on mutable RocksDB.

Embedding generation

Weaviate supports API calls to OpenAI, Cohere and HuggingFace to index embeddings.

Rockset can store and process embeddings generated from OpenAI, Cohere, HuggingFace and more.

Size of vectors and metadata

Weaviate supports a maximum of 65,535 vector dimensions per embedding.

Rockset has a document size limit of 40 MB and supports a vector dimensionality of up to 200,000.

Versioning

There does not appear to be a way to version in Weaviate.

Rockset uses aliases for versioning with no downtime.
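The alias approach mentioned above can be illustrated with a small sketch. This is not Rockset's API: `AliasRegistry`, `point` and `resolve` are hypothetical names, and the in-memory dictionary only stands in for whatever metadata store a real system would use. The idea it shows is that queries address a stable alias while the alias is repointed to a newly built collection version.

```python
class AliasRegistry:
    """Hypothetical in-memory stand-in for alias-based versioning."""

    def __init__(self):
        self._targets = {}

    def point(self, alias, collection):
        """Repoint an alias to a collection and return the previous target."""
        previous = self._targets.get(alias)
        self._targets[alias] = collection
        return previous

    def resolve(self, alias):
        """Return the collection that queries against this alias will hit."""
        return self._targets[alias]


registry = AliasRegistry()
registry.point("products", "products_v1")

# Build and validate products_v2 separately, then switch in one step.
previous = registry.point("products", "products_v2")
current = registry.resolve("products")
```

Because readers only ever see the alias name, rolling back under these assumptions is just another call to `point` with the previous target.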

Weaviate supports batch insertion of vectors as well as in-place updates for vectors and metadata. Weaviate supports searches across high-dimensional vector embeddings.

Rockset is built for streaming data and is a mutable database, supporting in-place updates for vectors and metadata. As a real-time search and analytics database, Rockset supports searches across large-scale data, including vector embeddings and metadata.
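The batching guidance for Weaviate described earlier in this section (batches of 100-300, with large files read in pieces) can be sketched without any database client. The example below is a minimal illustration using only the Python standard library in place of pandas or ijson; `batched`, `ingest_csv` and `send_batch` are hypothetical names, and `send_batch` stands in for whatever client call actually writes a batch.

```python
import csv
import io
from itertools import islice


def batched(rows, batch_size=100):
    """Yield lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def ingest_csv(stream, send_batch, batch_size=100):
    """Read a CSV stream row by row and hand it to send_batch in batches."""
    reader = csv.DictReader(stream)  # streams rows; the file is never fully loaded
    total = 0
    for batch in batched(reader, batch_size):
        send_batch(batch)  # the final, smaller batch is flushed here too
        total += len(batch)
    return total


# Demo with an in-memory file of 250 rows and a list acting as the sink.
sample = io.StringIO("id,text\n" + "".join(f"{i},doc {i}\n" for i in range(250)))
sent = []
total = ingest_csv(sample, sent.append, batch_size=100)
```

In this sketch the last partial batch is sent automatically when the input is exhausted, which is the step that may need a manual flush with a real client.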

Weaviate vs Rockset Indexing

Indexing

Weaviate

Rockset

KNN and ANN

Weaviate supports KNN and ANN search using HNSW.

Rockset supports KNN and ANN search. Rockset is built to be algorithm agnostic and currently builds a distributed FAISS index for scalability. Rockset uses its cost-based optimizer to trade off between KNN and ANN search for greater efficiency. At query time, metadata on indexes is accessed to determine where the ANN index is stored for more efficient retrieval. This architecture avoids the extensive memory overhead found in other solutions and their limitations on metadata filtering.

Additional indexes

Weaviate has an inverted index that can be used for filters, hybrid search and BM25 search.

Rockset builds a Converged Index, which combines search, ANN, columnar and row indexes on the data, for efficient retrieval.

Vectorization

Weaviate supports vectorization to speed up query execution.

All of Rockset's ANN and KNN indexes are vectorized.

Index management

Weaviate users are responsible for configuring and managing indexes and product quantization.

Rockset handles all index creation and management.

Weaviate supports KNN and ANN search using HNSW indexing algorithms. Weaviate provides inverted indexes and vector search indexes and uses vectorization to speed up query execution. Users are responsible for index maintenance.

Rockset supports KNN and ANN search using FAISS indexing algorithms. Rockset consolidates search, vector search, columnar and row indexes into a Converged Index to support a wide range of query patterns out of the box. Vectorization is used to speed up query execution.
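To make the KNN versus ANN distinction in this section concrete, here is a minimal sketch of exact (brute-force) KNN in plain Python. It is not how either product implements search: `cosine_similarity` and `exact_knn` are illustrative names, and cosine similarity is an assumed metric. Exact KNN scores every stored vector, whereas ANN indexes such as HNSW or FAISS avoid the full scan at the cost of some recall.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_knn(query, vectors, k):
    """Score every vector against the query and return the k most similar keys."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored[:k]]


vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
nearest = exact_knn([1.0, 0.0], vectors, k=2)
```

The cost of this approach grows linearly with the number of vectors, which is why it is usually reserved for small or heavily filtered candidate sets.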

Weaviate vs Rockset Querying

Querying

Weaviate

Rockset

Metadata filtering

Weaviate supports metadata filtering and hybrid search. Weaviate pre-filters the data and runs an ANN search only if the filter returns more than a set number of records (default: 40,000). Otherwise, it uses a brute-force exact search.
Weaviate uses a strict schema system in which all fields and their types are specified before the data is indexed.

Rockset supports metadata filtering and hybrid search. Rockset's cost-based optimizer determines the most efficient path for query execution, either pre-filtering using metadata or applying the filter during the approximate kNN search.

Multi-modal models

Weaviate supports multi-modal modules with CLIP.

Rockset enables searches across multiple ANN fields to support multi-modal models.

API (SQL, REST, etc)

Weaviate has RESTful APIs for database management and CRUD operations and a GraphQL API for accessing data objects and search.

Rockset supports SQL and REST APIs. Rockset uses query lambdas to generate unique, parameterized API endpoints based on your SQL query.

Weaviate pre-filters data before an approximate kNN search. Weaviate supports a GraphQL API for search.

Rockset supports both pre-filtering and applying a filter during an approximate kNN search. Rockset supports SQL and REST APIs.
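The pre-filtering behavior described for Weaviate in this section, where the size of the filtered set decides between exact and approximate search, can be sketched as a simple decision function. This is an illustration of the rule as stated here, not Weaviate's implementation: `choose_search_strategy` and its parameters are hypothetical, and only the 40,000 default comes from the text.

```python
def choose_search_strategy(records, predicate, cutoff=40_000):
    """Apply the metadata filter first, then pick a search strategy.

    Returns ("ann", candidates) when more than `cutoff` records survive
    the filter, otherwise ("exact", candidates) for a brute-force scan.
    """
    candidates = [record for record in records if predicate(record)]
    strategy = "ann" if len(candidates) > cutoff else "exact"
    return strategy, candidates


# Demo with a small cutoff so both branches are exercised.
records = [
    {"id": i, "category": "shoes" if i % 4 == 0 else "hats"}
    for i in range(100)
]
strategy_small, shoes = choose_search_strategy(
    records, lambda r: r["category"] == "shoes", cutoff=50
)
strategy_large, everything = choose_search_strategy(
    records, lambda r: True, cutoff=50
)
```

A selective filter leaves few candidates, so scanning them exactly is cheap; a broad filter leaves many, and the approximate index becomes the faster choice.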

Weaviate vs Rockset Ecosystem

Ecosystem

Weaviate

Rockset

Integrations (HuggingFace, LangChain, etc.)

Weaviate supports API calls to OpenAI, Cohere and HuggingFace to insert and index embeddings. Weaviate has an integration to LangChain and LlamaIndex.

Rockset can store and process embeddings generated from OpenAI, Cohere, HuggingFace and more. Rockset has an integration to LangChain and LlamaIndex. Rockset also offers built-in connectors to event streaming platforms (Kafka, Kinesis, etc.), OLTP databases (MongoDB, DynamoDB, etc.) and data lakes (S3, GCS, etc.).

Weaviate vs Rockset Architecture

Architecture

Weaviate

Rockset

Cloud architecture

Weaviate was built for on-prem and has recently introduced a managed offering. Weaviate has a tightly coupled architecture where CPU, RAM and SSD scale together for ingestion and queries. Weaviate stores its object store and inverted index within the same shard; it places its vector index next to the object store. Users are responsible for clusters, shards and indexes. Resharding is an expensive operation.

Rockset is built for the cloud. Indexing and queries can be run on isolated compute clusters (i.e., Virtual Instances) for predictable performance at scale.

Scalability

Weaviate scales horizontally for ingestion and queries. Replicas to support high QPS use cases are still in development. Dynamically scaling a cluster is not fully supported: nodes cannot be removed if data is present. In this architecture, ingestion and queries use the same CPU and memory resources. There is no resource isolation, which allows for potential resource contention.

Rockset is the only search and analytics database with compute-storage and compute-compute separation. The bulk and streaming ingestion and indexing of vector embeddings is fully isolated from the compute and RAM used for query serving. This removes resource contention between the two workloads. Furthermore, Rockset separates hot storage from compute, so the size of your vector embeddings does not force you to increase the size of your cluster. Rockset can scale up and down on demand for better price-performance.

Enterprise readiness

Weaviate does not have case studies of enterprises using its product in production.

Rockset is used by enterprises at scale including Allianz, JetBlue and Whatnot.

Weaviate is built for on-prem with a tightly coupled architecture. Scaling Weaviate requires data and infrastructure expertise and management.

Rockset is built for the cloud and separates compute-storage and compute-compute. The compute used for ingestion and indexing of vector embeddings is isolated from the compute used for query serving. Rockset is used by enterprises including Allianz, JetBlue and Whatnot.