Redefining Search and Analytics for the AI Era
August 29, 2023
We founded Rockset to empower everyone, from Fortune 500 companies to five-person startups, to build powerful search and AI applications and scale them efficiently in the cloud. Our team is on a mission to bring the power of search and AI to every digital disruptor in the world. Today, we are thrilled to announce a major milestone in our journey towards redefining search and analytics for the AI era. We’ve raised $44M in a new round led by Icon Ventures, with participation from new investors Glynn Capital, Four Rivers, and K5 Global, and from our existing investors Sequoia and Greylock. This brings our total capital raised to $105M, and we are excited to enter our next phase of growth.
Lessons learned from @scale deployments
I managed and scaled Facebook's online data infrastructure from 2007, when it had 30-40 million MAUs, to 2015, when it had 1.5 billion MAUs. In the early days, Facebook’s original Newsfeed ran in batch mode with basic statistical models for ranking, and it was refreshed once every 24 hours. During my time, Facebook's engagement skyrocketed as Newsfeed became the world’s most popular recommendation engine, powered by advanced AI & ML algorithms and a powerful distributed search and analytics backend. My team helped drive similar transitions elsewhere, from powering the Like button to serving personalized ads to fighting spam and more. All of this was enabled by the infrastructure we built. Our CTO Dhruba Borthakur created RocksDB; our chief architect Tudor Bosman founded the Unicorn project that powers all search at Facebook and also built infrastructure for the Facebook AI Research lab; and I built and scaled TAO, which powers Facebook’s social graph. I saw first-hand the transformative power of having the right data stack.
Thousands of enterprises started tinkering with AI when ChatGPT showed the world the art of the possible. As enterprises take their successful ideas to production, it is imperative that they think through three important factors:
- How to handle real-time updates. Streaming-first architectures are a necessary foundation for the AI era. Think, for example, of a dating app that is much more efficient because it can incorporate signals about who is currently online or within a certain geographic radius of you. Or an airline chatbot that gives relevant answers because it has the latest weather and flight updates.
- How to onboard more developers fast and increase development speed. Developments in AI are happening at light speed. If your team is stuck managing pipelines and infrastructure instead of iterating on your applications quickly, it will be impossible to keep up with emerging trends.
- How to make these AI apps efficient at scale in order to get a positive ROI. AI applications can get very expensive very quickly. The ability to scale apps efficiently in the cloud is what is going to allow enterprises to continue to leverage AI.
What we believe
We believe modern search and AI apps in the cloud should be both efficient and limitless.
We believe any engineer in the world should be able to quickly build powerful data apps. Building these apps should not be locked behind proprietary APIs and domain-specific query languages that take weeks to learn and years to master. Building these apps should be as simple as constructing a SQL query.
We believe modern data apps should operate on data in real-time. The best apps are the ones that serve as a better windshield for your business and your customers, not as a glorious rear-view mirror.
We believe modern data apps should be efficient by default. Resources should auto-scale so that applications can take scaling out for granted, and also scale down automatically to save costs. The true benefits of the cloud are only realized when you pay for “energy spent” instead of “power provisioned”.
What we stand for
We obsess about performance, and when it comes to performance, we leave no stone unturned.
- We built RocksDB, which is the most popular high-performance storage engine in the world.
- We invented the converged index storage format for compute-efficient data indexing and data retrieval.
- We built a high-performance SQL engine from the ground up in C++ that returns results in low single-digit milliseconds.
We live in real-time.
- We built a real-time indexing engine that is 4x more efficient than Elasticsearch. See benchmark.
- Our indexing engine is built on top of RocksDB, which allows for efficient data mutability, including upserts and deletes, without the usual performance penalties.
We exist to empower builders.
- One database to index them all. Index your JSON data, vector embeddings, geospatial data and time-series data in the same database in real-time. Query across your ANN indexes on vector embeddings and your JSON and geospatial “metadata” fields efficiently.
- If you know SQL, you already know how to use Rockset.
We obsess about efficiency in the cloud.
- We built the world’s first and only database that provides compute-compute separation. Spin up a Virtual Instance for streaming data ingestion. Spin up another, completely isolated Virtual Instance for your app. Scale them independently and completely eliminate resource contention. Never again worry about performance lags due to ingest spikes or query bursts.
- We built a high-performance, auto-scaling hot storage tier based on NVMe SSDs. Performance meets scalability and efficiency, providing high-speed I/O for your most demanding workloads.
- With auto-scaling compute and auto-scaling storage, pay just for what you use. No more overprovisioned clusters burning a hole in your pocket.
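To make the “energy spent” versus “power provisioned” idea concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is made up for illustration (the workload shape, the unit of compute and the price are all assumptions, not Rockset pricing), and the function names are hypothetical.

```python
def provisioned_cost(peak_units, hours, price_per_unit_hour):
    """Cost when capacity is sized for the peak and left running all the time."""
    return peak_units * hours * price_per_unit_hour


def usage_cost(hourly_units, price_per_unit_hour):
    """Cost when billing follows the units actually used in each hour."""
    return sum(hourly_units) * price_per_unit_hour


# Hypothetical day: 20 quiet hours at 2 compute units, 4 busy hours at 10 units.
hourly_units = [2] * 20 + [10] * 4
price = 0.5  # made-up price per compute unit per hour

fixed = provisioned_cost(max(hourly_units), len(hourly_units), price)
metered = usage_cost(hourly_units, price)
print(f"provisioned for peak: {fixed:.2f}, pay for use: {metered:.2f}")
```

In this toy workload the cluster sized for peak costs three times as much as the metered one, because it sits mostly idle for 20 of the 24 hours. The real gap depends entirely on how spiky your traffic is.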
AI-native search and analytics database
First-generation indexing systems like Elasticsearch were built for an on-prem era, in a world before AI applications that need real-time updates existed.
As AI models become more advanced, LLMs and generative AI apps are liberating information that is typically locked up in unstructured data. These advanced AI models transform text, images, audio and video into vector embeddings, and you’ll need powerful ways to store, index and query these vector embeddings to build a modern AI application.
When AI apps need similarity search and nearest neighbor search capabilities, exact kNN-based solutions are quite inefficient. Rockset uses FAISS underneath and supports advanced ANN indexes that can be updated in real-time and efficiently queried alongside other “metadata” fields, making it very easy to build powerful search and AI apps.
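To see why exact kNN gets expensive, consider this minimal pure-Python sketch of a brute-force nearest neighbor search combined with a metadata filter. It is not Rockset’s implementation and does not use the FAISS API. The `exact_knn` helper, the document layout and the toy data are all hypothetical. The point is that every query has to score every candidate document, so the work grows linearly with the size of the collection; ANN indexes exist to avoid that full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_knn(query, docs, k, predicate=lambda meta: True):
    """Exact k-nearest-neighbor search with a metadata filter.

    Scores every document that passes the filter, so the cost grows
    linearly with the number of documents and the embedding dimension.
    """
    scored = [
        (cosine_similarity(query, doc["embedding"]), doc["id"])
        for doc in docs
        if predicate(doc["meta"])
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy data: 3-dimensional embeddings plus one metadata field.
DOCS = [
    {"id": "a", "embedding": [1.0, 0.0, 0.0], "meta": {"city": "SF"}},
    {"id": "b", "embedding": [0.9, 0.1, 0.0], "meta": {"city": "NYC"}},
    {"id": "c", "embedding": [0.0, 1.0, 0.0], "meta": {"city": "SF"}},
    {"id": "d", "embedding": [0.7, 0.7, 0.0], "meta": {"city": "SF"}},
]

# Two nearest neighbors of the query, restricted to documents in SF.
nearest_in_sf = exact_knn(
    [1.0, 0.0, 0.0], DOCS, k=2, predicate=lambda meta: meta["city"] == "SF"
)
```

With four documents the full scan is trivial. With hundreds of millions of high-dimensional embeddings, scoring each one per query is what makes exact search impractical for interactive apps.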
In the words of one customer,
“The bigger pain point was the high operational overhead of Elasticsearch for our small team. This was draining productivity and severely limiting our ability to improve the intelligence of our recommendation engine to keep up with our growth. Say we wanted to add a new user signal to our analytics pipeline. Using our previous serving infrastructure, the data would have to be sent through Confluent-hosted instances of Apache Kafka and ksqlDB and then denormalized and/or rolled up. Then, a specific Elasticsearch index would have to be manually adjusted or built for that data. Only then could we query the data. The entire process took weeks.
Just maintaining our existing queries was also a huge effort. Our data changes frequently, so we were constantly upserting new data into existing tables. That required a time-consuming update to the relevant Elasticsearch index every time. And after every Elasticsearch index was created or updated, we had to manually test and update every other component in our data pipeline to make sure we had not created bottlenecks, introduced data errors, etc.”
This testimony fits with what other customers are saying about embracing ML and AI technologies: they want to focus on building AI-powered apps, not on optimizing the underlying infrastructure to manage cost at scale. Rockset is the AI-native search and analytics database built with these exact goals in mind.
We plan to invest the additional funding raised in expanding to more geographies, accelerating our go-to-market efforts and furthering our innovation in this space. Join us in our journey as we redefine the future of search and AI applications by starting a free trial and exploring Rockset for yourself. I look forward to seeing what you’ll build!