See Rockset in action

Get a product tour with a Rockset engineer

Weaviate vs Rockset

Compare and contrast Weaviate and Rockset by architecture, ingestion, queries, performance, and scalability.

Weaviate vs Rockset Ingestion

Ingestion
Weaviate
Rockset
Streaming and bulk ingestion
Weaviate recommends ingesting vector embeddings in batches of 100-300. For large files, it recommends breaking them up and ingesting them incrementally using libraries like ijson for JSON files and pandas for CSV files. Some manual flushing of batches may be required.
Rockset supports streaming and bulk data ingestion. Bulk ingestion is used for the initial load of data into the system and uses temporary compute to process incoming data. Streaming ingestion can ingest and index high-velocity event and CDC streams within 1-2 seconds.
Index updates
Weaviate can update the values of an existing property or an entire object in a schema. Weaviate uses the HNSW index type, which makes adding or updating vectors more costly. Weaviate does not support adding properties to or deleting properties from the schema.
Rockset supports in-place updates for vectors and metadata with Rockset's Converged Indexing technology built on mutable RocksDB.
Embedding generation
Weaviate supports API calls to OpenAI, Cohere and HuggingFace to index embeddings.
Rockset can store and process embeddings generated from OpenAI, Cohere, HuggingFace and more.
Size of vectors and metadata
In Weaviate, the maximum number of vector dimensions for an embedding is 65,535.
Rockset has a maximum document size of 40 MB and supports a vector dimensionality of up to 200,000.
Versioning
There does not appear to be a way to version in Weaviate.
Rockset uses aliases for versioning with no downtime.

Weaviate supports batch insertion and batch updates of vectors, as well as in-place updates for vectors and metadata. Weaviate supports searches across high-dimensional vector embeddings.

Rockset is built for streaming data and is a mutable database, supporting in-place updates for vectors and metadata. As a real-time search and analytics database, Rockset supports searches across large-scale data, including vector embeddings and metadata.
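
The batching guidance in the table above (batches of 100-300, reading large files incrementally, flushing the last partial batch) can be illustrated with a minimal sketch. It uses only the Python standard library. The `send_batch` callable is a hypothetical stand-in for whatever client call uploads one batch; it is not a Weaviate or Rockset API, and the default batch size of 200 is simply a value picked from the quoted range.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items.

    The input is consumed lazily, so a large file can be streamed
    without loading every record into memory first.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def ingest(
    items: Iterable[T],
    send_batch: Callable[[List[T]], None],  # hypothetical upload callback
    batch_size: int = 200,                  # assumed value within 100-300
) -> int:
    """Send every item in batches and return the number of batches sent.

    The final, possibly shorter, batch is sent as well, which plays the
    role of the manual flush mentioned in the text.
    """
    sent = 0
    for batch in batched(items, batch_size):
        send_batch(batch)
        sent += 1
    return sent
```

In practice `items` could be any lazy record source, for example rows read from a CSV file one at a time, so that only one batch is held in memory at once.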


Weaviate vs Rockset Indexing

Indexing
Weaviate
Rockset
KNN and ANN
Weaviate supports KNN and ANN search using HNSW.
Rockset supports KNN and ANN search. Rockset is built to be algorithm agnostic and currently builds a distributed FAISS index for scalability. Rockset uses its cost-based optimizer to trade off between KNN and ANN search for greater efficiency. At query time, metadata on indexes is accessed to determine where the ANN index is stored for more efficient retrieval. This architecture avoids the extensive memory overhead and the limitations on metadata filtering found in other solutions.
Additional indexes
Weaviate has an inverted index that can be used for filters, hybrid search and BM25 search.
Rockset builds a Converged Index, which combines search, ANN, columnar and row indexes, on the data for efficient retrieval.
Vectorization
Weaviate supports vectorization to speed up query execution.
All of Rockset's ANN and KNN indexes are vectorized.
Index management
Weaviate users are responsible for configuring and managing indexes and product quantization.
Rockset handles all index creation and management.

Weaviate supports KNN and ANN search using the HNSW indexing algorithm. Weaviate provides inverted indexes and vector search indexes and uses vectorization to speed up query execution. Users are responsible for index maintenance.

Rockset supports KNN and ANN search using FAISS indexes. Rockset consolidates search, vector search, columnar and row indexes into a Converged Index to support a wide range of query patterns out of the box. Vectorization is used to speed up query execution.
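
To make the KNN versus ANN distinction concrete, here is a minimal sketch of exact (brute-force) KNN in plain Python. It scores the query against every stored vector, which is the baseline that ANN indexes such as HNSW or FAISS approximate in exchange for speed. It is an illustration only, not how either product implements search, and the function names are our own.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_knn(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k most similar (key, score) pairs, best first.

    Every vector is scored, so the cost grows linearly with the
    number of stored vectors. ANN indexes avoid this full scan.
    """
    scored = ((cosine_similarity(query, vec), key) for key, vec in vectors.items())
    return [(key, score) for score, key in heapq.nlargest(k, scored)]
```

For example, with vectors `{"a": [1, 0], "b": [0, 1], "c": [1, 1]}` and the query `[1, 0.1]`, the two nearest keys are `"a"` followed by `"c"`.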


Weaviate vs Rockset Querying

Querying
Weaviate
Rockset
Metadata filtering
Weaviate supports metadata filtering and hybrid search. Weaviate pre-filters the data and runs an ANN search only if the filter returns more than a threshold number of records (default: 40,000). Otherwise, it uses a brute-force exact search.
Weaviate uses a strict schema system with all of the fields and their type specified before the data is indexed.
Rockset supports metadata filtering and hybrid search. Rockset's cost-based optimizer determines the most efficient query execution path, either pre-filtering using metadata or applying the filter during the approximate kNN search.
Multi-modal models
Weaviate supports multi-modal modules with CLIP.
Rockset enables searches across multiple ANN fields to support multi-modal models.
API (SQL, REST, etc)
Weaviate has RESTful APIs for database management and CRUD operations and a GraphQL API for accessing data objects and search.
Rockset supports SQL and REST APIs. Rockset uses query lambdas to generate unique, parameterized API endpoints based on your SQL query.

Weaviate pre-filters data before an approximate kNN search. Weaviate supports a GraphQL API for search.

Rockset supports both pre-filtering and applying a filter during an approximate kNN search. Rockset supports SQL and REST APIs.
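
The pre-filtering behavior described for Weaviate in the table above (filter on metadata first, then pick ANN or brute-force search based on how many records survive the filter) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the `cutoff` parameter are hypothetical, and only the default value of 40,000 comes from the text.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def prefilter(records: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Keep only the records whose metadata satisfies the filter."""
    return [record for record in records if predicate(record)]


def choose_strategy(num_candidates: int, cutoff: int = 40_000) -> str:
    """Pick a search strategy from the size of the filtered candidate set.

    More than `cutoff` candidates: use the ANN index.
    Otherwise: a brute-force exact scan over the candidates is used.
    """
    return "ann" if num_candidates > cutoff else "exact"
```

A cost-based optimizer, as described for Rockset, would make a similar choice from estimated costs rather than a single fixed threshold, and could also apply the filter during the ANN search instead of before it.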


Weaviate vs Rockset Ecosystem

Ecosystem
Weaviate
Rockset
Integrations (HuggingFace, LangChain, etc.)
Weaviate supports API calls to OpenAI, Cohere and HuggingFace to insert and index embeddings. Weaviate has an integration to LangChain and LlamaIndex.
Rockset can store and process embeddings generated from OpenAI, Cohere, HuggingFace and more. Rockset has an integration to LangChain and LlamaIndex. Rockset also offers built-in connectors to event streaming platforms (Kafka, Kinesis, etc.), OLTP databases (MongoDB, DynamoDB, etc.) and data lakes (S3, GCS, etc.).

Weaviate vs Rockset Architecture

Architecture
Weaviate
Rockset
Cloud architecture
Weaviate was built for on-prem and has recently introduced a managed offering. Weaviate has a tightly coupled architecture where CPU, RAM and SSD scale together for ingestion and queries. Weaviate stores its object store and inverted index within the same shard; it places its vector index next to the object store. Users are responsible for managing clusters, shards and indexes. Resharding is an expensive operation.
Rockset is built for the cloud. Indexing and queries can be run on isolated compute clusters (i.e., Virtual Instances) for predictable performance at scale.
Scalability
Weaviate scales horizontally for ingestion and queries. Replicas to support high QPS use cases are still in development. Dynamically scaling a cluster is not fully supported: nodes cannot be removed if data is present. In this architecture, ingestion and queries use the same CPU and memory resources. Because there is no resource isolation, the two workloads can contend for resources.
Rockset is the only search and analytics database with compute-storage and compute-compute separation. The bulk and streaming ingestion and indexing of vector embeddings is fully isolated from the compute and RAM used for query serving. This removes resource contention between the two workloads. Furthermore, Rockset separates hot storage from compute, so growth in the size of your vector embeddings does not force you to increase the size of your cluster. Rockset can scale up and down on demand for better price-performance.
Enterprise readiness
Weaviate does not have case studies of enterprises using its product in production.
Rockset is used by enterprises at scale including Allianz, JetBlue and Whatnot.

Weaviate is built for on-prem with a tightly coupled architecture. Scaling Weaviate requires data and infrastructure expertise and management.

Rockset is built for the cloud and provides compute-storage and compute-compute separation. The compute used for ingestion and indexing of vector embeddings is isolated from the compute used for query serving. Rockset is used by enterprises including Allianz, JetBlue and Whatnot.
