How We Built Vector Search in the Cloud

Rockset fully integrates similarity indexing into its search and analytics database, enabling engineers to scale AI applications to thousands of users.

In this talk, Chief Architect Tudor Bosman and engineer Daniel Latta-Lin share how they built a distributed similarity index using FAISS-IVF that is memory-efficient and supports immediate insertion and recall. They delve into the implementation details, including how Rockset supports:

  • Real-time updates: Rockset supports inserts, updates and deletes of vectors and metadata. It’s built on RocksDB, an open-source embedded storage engine designed for mutability. When a vector is inserted or modified, Rockset uses FAISS to compute the vector's Voronoi cell, then writes the closest centroid and the residual value to the search index. New data is reflected in search results within milliseconds.
  • Hybrid search with SQL: Rockset stores and indexes vectors alongside text, JSON and time series data, and it queries the search index and the similarity index in parallel. FAISS identifies the K nearest centroids to the target vector; the search index then filters results by those K centroids and by metadata terms at the same time, an approach known as single-stage filtering.
  • Separation of indexing and search: With compute-compute separation, similarity indexing of vectors does not affect search performance. Ingestion and indexing run on different virtual instances (clusters) from search, so performance stays predictable as you scale.
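The first two bullets can be illustrated with a minimal pure-Python sketch. It assumes the IVF centroids have already been trained (for example with k-means) and stores residuals unquantized; real FAISS IVF indexes usually compress residuals, and this is neither Rockset's implementation nor the FAISS API. The class `ToyIVFIndex`, its methods, the `k_probe` parameter (FAISS calls its probe count `nprobe`) and the in-memory dicts standing in for RocksDB are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVFIndex:
    """Hypothetical IVF-style index: each vector is kept as (nearest centroid, residual)."""

    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]  # assumed pre-trained
        # centroid id -> {doc_id: residual}; stands in for the on-disk index
        self.postings = {i: {} for i in range(len(self.centroids))}
        self.doc_cell = {}  # doc_id -> centroid id, so updates/deletes find the old entry
        self.metadata = {}  # doc_id -> metadata terms used for filtering

    def nearest_centroids(self, vec, k):
        """Ids of the k centroids closest to vec (k=1 gives the Voronoi cell)."""
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(vec, self.centroids[i]))
        return order[:k]

    def upsert(self, doc_id, vec, meta):
        """Insert or update: remove any old entry, then store centroid id + residual."""
        self.delete(doc_id)  # the vector may have moved to a different Voronoi cell
        cell = self.nearest_centroids(vec, 1)[0]
        residual = [x - c for x, c in zip(vec, self.centroids[cell])]
        self.postings[cell][doc_id] = residual
        self.doc_cell[doc_id] = cell
        self.metadata[doc_id] = dict(meta)

    def delete(self, doc_id):
        """Remove a document if present; a no-op otherwise."""
        cell = self.doc_cell.pop(doc_id, None)
        if cell is not None:
            del self.postings[cell][doc_id]
            self.metadata.pop(doc_id, None)

    def search(self, query, top_n, k_probe, predicate):
        """Scan only the k_probe nearest cells, applying the metadata predicate in the same pass."""
        hits = []
        for cell in self.nearest_centroids(query, k_probe):
            centroid = self.centroids[cell]
            for doc_id, residual in self.postings[cell].items():
                if not predicate(self.metadata[doc_id]):
                    continue  # filtered during the scan, not after ranking
                vec = [c + r for c, r in zip(centroid, residual)]  # reconstruct the vector
                hits.append((sq_dist(query, vec), doc_id))
        hits.sort()
        return [doc_id for _, doc_id in hits[:top_n]]


if __name__ == "__main__":
    index = ToyIVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
    index.upsert("a", [1.0, 0.0], {"lang": "en"})
    index.upsert("b", [0.0, 2.0], {"lang": "fr"})
    index.upsert("c", [9.0, 9.0], {"lang": "en"})
    english = lambda m: m["lang"] == "en"
    print(index.search([0.0, 0.0], top_n=2, k_probe=1, predicate=english))  # ['a']
    print(index.search([0.0, 0.0], top_n=2, k_probe=2, predicate=english))  # ['a', 'c']
```

Keeping the `doc_cell` map is what lets an update or delete locate the vector's previous posting without scanning every cell. Evaluating the predicate inside the scan is the single-stage part: a post-filtering design that ranks first and filters afterwards can return fewer than `top_n` matching results.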

Speakers

Tudor Bosman
Tudor Bosman leads architecture for Rockset's search and analytics database. Prior to Rockset, Tudor was an engineer at Facebook, where he spearheaded Unicorn, Facebook's search engine, and built infrastructure for the Facebook AI Research Lab and Facebook's applied machine learning initiative. Prior to Facebook, Tudor worked at Google on Gmail's storage and indexing backend, and at Oracle on database server internals. Tudor holds an MS in Computer Science from Stanford and a BS in Computer Science from Caltech.
Daniel Latta-Lin
Daniel Latta-Lin is a software engineer at Rockset who built its similarity indexes. Daniel has also contributed to Rockset’s query execution engine, text search functionality and data integration with the Feast feature store. Prior to Rockset, Daniel was a member of the technical staff at Nutanix. Daniel holds a BA in Computer Science from the University of Michigan and is pursuing an MS in Computer Science at the University of Washington.
