Case Study: Is Your NoSQL Data Hindering Real-Time Analytics? Savvy Solved It with Rockset.
July 21, 2022
Rockset was incredibly easy to get started. We were literally up and running within a few hours. - Jeremy Evans, Co-founder and CTO, Savvy
At Savvy, we have a lot of responsibility when it comes to data.
Our customers are online consumer brands such as Brilliant.org, Flex and Simple Habit. They rely on our cloud-native service to easily build no-code interactive experiences such as video quizzes, calculators and listicles for their websites without the need for developers. Companies can then track the effectiveness of these education flows with their users through our analytics dashboard.
When you’re powering conversion flows that tens of thousands of visitors interact with every day, analytics are crucial. Our customers need to be able to analyze every step of the conversion funnel and their A/B tests to figure out where they can improve. The whole point of using Savvy is that companies don’t have to ask their own developers to build features like analytics, because they come included with our platform.
However, delivering rich and timely insights was a challenge for us from the start, as our original platform was great at ingesting data, but not so great at analyzing and reporting.
To keep growing, especially without service interruption, we needed a more powerful, plug-and-play solution.
Squaring the (No)SQL circle
We built Savvy using Google’s Firebase app development and hosting platform. Firebase’s highly scalable, no-schema approach helped us move fast in development. Performance is also excellent: our embedded flows load in customers’ websites in 300 milliseconds on average. They love that real-time performance.
We also had no problems monitoring and recording the activity of individual visitors to our customers’ websites. All interactions are streamed as semi-structured events, which include a large number of nested objects and arrays, into Firebase’s NoSQL cloud database. Showing our customers a list of recent visitors along with all of their interactions wasn’t just easy, it was also possible to do in real time.
The issue came as soon as our customers wanted the ability to start filtering that list in some way, or viewing aggregate statistics such as number of visitors over time or a breakdown by referrer website.
Our original band-aid solution was just to apply the basic filters that Firebase supports, and perform any remaining filtering or grouping on the front end. Obviously, this soon started to come with performance issues: as we scaled up to tens of thousands of users, the growing possibility of query timeouts meant this strategy started to threaten our ability to display analytics at all.
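To make the cost of that band-aid concrete, here is a minimal sketch of grouping done outside the database. The event shape (`visitor_id`, `referrer`, `ts`) and the function names are assumptions for illustration, not Savvy's actual code. It is written in Python for brevity, although the same loop would run in the browser.

```python
from collections import Counter
from datetime import datetime

# Hypothetical events, as returned after applying a basic server-side filter.
events = [
    {"visitor_id": "v1", "referrer": "google.com", "ts": "2022-07-01T10:00:00+00:00"},
    {"visitor_id": "v2", "referrer": "twitter.com", "ts": "2022-07-01T11:30:00+00:00"},
    {"visitor_id": "v1", "referrer": "google.com", "ts": "2022-07-02T09:15:00+00:00"},
    {"visitor_id": "v3", "referrer": "google.com", "ts": "2022-07-02T12:00:00+00:00"},
]


def visitors_by_referrer(events):
    """Count distinct visitors per referrer by scanning every fetched event."""
    seen = set()
    counts = Counter()
    for event in events:
        key = (event["referrer"], event["visitor_id"])
        if key not in seen:
            seen.add(key)
            counts[event["referrer"]] += 1
    return dict(counts)


def visitors_per_day(events):
    """Count distinct visitors per calendar day, again with a full scan."""
    seen = set()
    counts = Counter()
    for event in events:
        day = datetime.fromisoformat(event["ts"]).date().isoformat()
        key = (day, event["visitor_id"])
        if key not in seen:
            seen.add(key)
            counts[day] += 1
    return dict(counts)
```

Every chart requires downloading and scanning all matching events, so the work grows linearly with traffic. That is consistent with the timeouts described above.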
In an attempt to make our queries fast again, our next plan was to do pre-computations on the ingested event streams and metrics, indexing them as they were being stored. However, we had to manually create an index for each new chart type that we added, and because the schemas for events kept changing, our pre-computations kept changing, too. This also meant that we were suddenly managing a whole load of data processing pipelines, which came with all the headaches you would expect. If a scheduled data processing job was missed, for example, the user would see out-of-date data or even a chart with a chunk of data missing in the middle.
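The failure mode is easy to reproduce in miniature. The sketch below simulates one rollup job per day and shows how a single missed run leaves a hole in the chart. The job structure, field names and `skipped` parameter are hypothetical and only illustrate the general pattern.

```python
# Hypothetical raw events; only the fields needed for the rollup are shown.
events = [
    {"visitor_id": "v1", "ts": "2022-07-01T10:00:00Z"},
    {"visitor_id": "v2", "ts": "2022-07-01T11:30:00Z"},
    {"visitor_id": "v1", "ts": "2022-07-02T09:15:00Z"},
    {"visitor_id": "v3", "ts": "2022-07-03T12:00:00Z"},
]


def rollup_day(events, day):
    """One scheduled job run: precompute the distinct-visitor count for a day."""
    return len({e["visitor_id"] for e in events if e["ts"].startswith(day)})


def run_scheduled_rollups(events, days, skipped=frozenset()):
    """Simulate the scheduler; days listed in `skipped` are missed job runs."""
    table = {}
    for day in days:
        if day in skipped:
            continue  # the job never ran, so no row is written for this day
        table[day] = rollup_day(events, day)
    return table


def chart_series(table, days):
    """Read the precomputed table for a chart; missing days come back as None."""
    return [(day, table.get(day)) for day in days]


days = ["2022-07-01", "2022-07-02", "2022-07-03"]
table = run_scheduled_rollups(events, days, skipped={"2022-07-02"})
series = chart_series(table, days)
```

Here `series` contains `None` for 2 July even though the raw events for that day exist, which is the gap in the middle of the chart that a user would see.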
Separating the Wheat from the Chaff
We looked closely at several alternatives, including:
- Postgres. While the venerable open-source database supports the complex SQL-based analytics we needed, we would have had to make significant rewrites, including flattening all of the JSON objects that we were throwing into Firebase. We had made substantial use of Firebase’s flexibility here, so losing that in a switch to Postgres would have been costly.
- QuestDB, another open-source SQL database oriented toward time-series data. While the query examples that QuestDB showed us were both fast and highly concurrent, and they had an impressive team building an impressive product, they were very early-stage at the time and the open-source nature of their solution would have meant more maintenance and oversight from us than we had the bandwidth for.
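To show what flattening would have involved for the Postgres option, here is a minimal sketch that turns one nested event into a single-level record with dotted keys. The event shape and the dotted-key scheme are assumptions chosen for illustration. A real migration could equally map nested arrays to child tables.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated path.
        flat[prefix[:-1]] = obj
    return flat


# Hypothetical semi-structured event with a nested object and an array.
event = {
    "visitor_id": "v1",
    "flow": {
        "id": "quiz-7",
        "steps": [
            {"name": "intro", "ms": 1200},
            {"name": "q1", "ms": 3400},
        ],
    },
}

row = flatten(event)
```

The set of keys in `row` depends on how many steps each event has, so a fixed relational schema would need to change whenever the event shape does.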
We ended up deploying a real-time analytics platform, Rockset, on top of MongoDB. We heard about Rockset through an internal forum post by a fellow Y Combinator startup, and realized that it was built to solve exactly the kind of problems we were having. In particular, we were attracted by these four aspects:
- The schemaless ingest of data combined with Rockset’s Converged Index that smoothly stores any kind of data and makes it ready instantly for any kind of query
- The ability to run any kind of complex SQL query and get real-time results
- The fully-managed service that saves us significant maintenance and engineering time and effort
- Rockset’s cloud developer portal that makes it easy to build and manage Query Lambdas and APIs
Rockset was incredibly easy to get started. We were literally up and running within a few hours. By contrast, it would have taken days or weeks for us to learn and deploy Postgres or QuestDB.
Since we no longer have to set up schemas in advance, we can ingest real-time event streams without interruption into Rockset. We also no longer need to spend a literal day rewriting one-time functions whenever a schema change wreaks havoc on our queries and charts. Rockset automatically ingests and prepares the data for any query we already have running or may need to throw at it. It feels like magic!
Real-Time Analytics, Deployed Instantly
We use Rockset to search and analyze more than 30 million documents. This data is regularly synchronized with MongoDB and Firebase to provide live views in two key areas of our customer dashboard:
- The Live View. From here, our users can apply different filters to drill into any one of hundreds of thousands of customers and view their interactions on the site and where they are on the buyer’s journey.
- The Reporting View, which displays charts with aggregate data on visitors such as number of visitors per day, or visitors by source.
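The aggregates behind a reporting view like this are straightforward to express in SQL. The sketch below uses Python's built-in `sqlite3` module purely as a local stand-in so the queries can be run anywhere. It does not use Rockset's API or SQL dialect, and the table and column names (`events`, `visitor_id`, `source`, `ts`) are assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for the analytics store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (visitor_id TEXT, source TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("v1", "google", "2022-07-01T10:00:00Z"),
        ("v2", "twitter", "2022-07-01T11:30:00Z"),
        ("v1", "google", "2022-07-02T09:15:00Z"),
        ("v3", "google", "2022-07-02T12:00:00Z"),
    ],
)


def visitors_per_day(conn):
    """Distinct visitors per calendar day, taken from the ISO timestamp prefix."""
    cursor = conn.execute(
        "SELECT substr(ts, 1, 10) AS day, COUNT(DISTINCT visitor_id) AS visitors "
        "FROM events GROUP BY substr(ts, 1, 10) ORDER BY day"
    )
    return cursor.fetchall()


def visitors_by_source(conn):
    """Distinct visitors per traffic source, largest first."""
    cursor = conn.execute(
        "SELECT source, COUNT(DISTINCT visitor_id) AS visitors "
        "FROM events GROUP BY source ORDER BY visitors DESC, source"
    )
    return cursor.fetchall()
```

Each chart becomes a single declarative query instead of a dedicated pre-computation pipeline, so adding a new chart type means writing a new query rather than building a new index by hand.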
The real-time performance was a huge boon, of course. So were the ease and speed with which we were able to drop in Rockset as a replacement, as well as the minuscule ongoing operational overhead. For our small team, all of the time we’re saving on manually building indexes, managing our data models, and rewriting slow and malfunctioning queries is extremely valuable.
The result is that we've been able to move at speed while improving Savvy’s front end features, without compromising the quality of data and analytics for our customers.