Real-Time Analytics Explained
An explainer guide for application developers incorporating real-time analytics
Application users increasingly expect to analyze data in real time. Users don’t want to leave their application to analyze data, wait seconds for analytics to load, or make decisions on data that is minutes or hours old. Applications are constantly collecting real-time data from machines, user interactions, and operational infrastructure, and users want access to it immediately.
Until recently, real-time analytics was challenging to deliver at the speed and scale that applications require. Only a limited number of time-critical use cases adopted real-time analytics, as it was a time-consuming, expensive development endeavour. The growth in real-time data and the cloud has put real-time analytics within reach of even lean engineering teams.
Applications are rapidly expanding their real-time analytics capabilities. In the security software space, an increasing number of solutions use machine learning to detect vulnerabilities and interactive search and analytics to assess and mitigate risk. In the logistics space, real-time analytics is providing end-to-end visibility to suppliers and customers to enhance shipment tracking, route optimization and downstream initiatives. Across applications, real-time analytics is enhancing productivity, decision-making and the user experience.
In this guide to real-time analytics, we’ll explain what we mean by real-time analytics, how it differs from batch analytics, the characteristics of real-time analytics systems and more.
What is real-time analytics?
Real-time analytics is all about using data as soon as it is produced to answer questions, make predictions, understand relationships, and automate processes.
Gartner defines real-time analytics as:
The core requirements of real-time analytics are access to fresh data and fast queries. These are essentially two measures of latency: data latency and query latency.
Data latency is a measure of the time from when data is generated to when it is queryable. There is usually a time lag between when the data is produced and when it is available to query. Real-time systems are designed to minimize that time lag, allowing for changes in data to be reflected quickly.
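To make the definition concrete, here is a minimal Python sketch that computes data latency from pairs of timestamps. It assumes you can record, for each event, when it was generated and when it first showed up in a query result (for example by writing a marker record and polling for it); the sample numbers are made up. Reporting the worst case alongside the median matters because a few slow events can hide behind a typical value.

```python
from statistics import median

# Hypothetical measurements in milliseconds since an arbitrary start:
# (time the event was generated, time it first appeared in a query result).
OBSERVATIONS = [
    (0, 400),
    (1000, 1900),
    (2000, 2500),
    (3000, 3700),
    (4000, 6000),
]


def data_latencies(observations):
    """Data latency per event: time it became queryable minus time it was generated."""
    return [queryable - generated for generated, queryable in observations]


def summarize(latencies):
    """Report the typical (median) and worst-case data latency."""
    return {"median_ms": median(latencies), "max_ms": max(latencies)}


if __name__ == "__main__":
    print(summarize(data_latencies(OBSERVATIONS)))
```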
Low data latency can be challenging to deliver because the system must write incoming data while simultaneously serving application queries on the most recent data. That requires a system that can handle high write rates and is optimized for real-time data processing, not for the batch analytics jobs that have traditionally been the data processing method for analytics.
Rockset designed Rockbench, a benchmark that measures the data latency of real-time databases. You can learn more about how to evaluate real-time systems using this benchmark.
Query latency is the time required to execute a query and return a result. Applications want to minimize query latency for snappy, responsive user experiences.
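Query latency is usually reported as percentiles rather than a single number, since a fast median can coexist with slow outliers that users still notice. The sketch below times a query with Python's `time.perf_counter` against an in-memory SQLite table that stands in for a real analytics store; the table, data and query are illustrative assumptions, and absolute numbers from SQLite say nothing about any particular database.

```python
import sqlite3
import time
from statistics import quantiles  # quantiles() requires Python 3.8+


def build_demo_db(rows=10_000):
    """In-memory SQLite table standing in for an application's analytics store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(i % 100, float(i)) for i in range(rows)],
    )
    return conn


def time_query(conn, sql, runs=50):
    """Run the query repeatedly and return each run's latency in milliseconds."""
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the full result is produced
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms


def p50_p99(latencies_ms):
    """Median and 99th-percentile latency from the observed runs."""
    cuts = quantiles(latencies_ms, n=100, method="inclusive")  # 99 cut points
    return cuts[49], cuts[98]


if __name__ == "__main__":
    conn = build_demo_db()
    sql = "SELECT user_id, SUM(amount) FROM events GROUP BY user_id"
    p50, p99 = p50_p99(time_query(conn, sql))
    print(f"p50={p50:.2f} ms  p99={p99:.2f} ms")
```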
One company was able to increase application adoption by 350% by making its queries twice as fast. The standard set by the B2B customer moving forward:
Teams are increasingly setting sub-second query latency standards for their data applications. That said, massaging data and optimizing indexes to deliver consistently low query latency can be time-consuming, making it challenging for teams to iterate and expand on their analytical features.
Batch vs. real-time analytics
Batch analytics is high-latency analytics, where queries return results on data that is at least minutes old. In contrast, real-time analytics is optimized for low latency, ensuring that data is available for querying within seconds.
One use case for batch analytics is business intelligence reporting. Business intelligence uses historical data to report on business trends and answer strategic questions. In these scenarios, the goal is to use data to craft strategy, not to take immediate action. Real-time data would not generally change the result of the trend analysis, making this better suited for batch analytics.
Batch analytics use cases like business intelligence, reporting and data science have less stringent latency requirements and can therefore tolerate ETL pipelines that homogenize and enrich data for analytics. In contrast, real-time use cases have low latency requirements and attempt to reduce or remove the need for ETL processes.
Many analytics systems, like Hadoop and data warehouses, were designed for batch analytics. Batch analytics systems process data in batches: data is collected and loaded into the system over a period of time. Rather than running an “always on” system for data processing, they can restrict data processing to specific time intervals to reduce costs. Batching also helps with data compression, reducing the overall storage footprint and making periodic analytics on large-scale data economical.
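The data latency cost of batching can be worked out directly. Assuming loads run every `interval` seconds and complete instantly (a simplification, since real loads add processing time), an event waits until the next load after it is generated. This sketch computes that wait per event and averages it over events spread evenly across one interval.

```python
import math


def batch_data_latency(generated_at, interval):
    """Seconds an event waits for the next batch load.

    Assumes loads happen at interval, 2*interval, 3*interval, ... and take no time,
    so an event generated exactly at a load boundary waits for the following load.
    """
    next_load = (math.floor(generated_at / interval) + 1) * interval
    return next_load - generated_at


def mean_batch_latency(interval, samples):
    """Average data latency for `samples` events spread evenly over one interval."""
    step = interval / samples
    latencies = [batch_data_latency(i * step, interval) for i in range(samples)]
    return sum(latencies) / len(latencies)


if __name__ == "__main__":
    # A 15-minute (900 s) batch schedule with one event per second.
    print(mean_batch_latency(900, 900))  # prints 450.5
```

On average an event waits about half the interval, and in the worst case a full interval, before it can be queried, however fast the queries themselves are. Shorter intervals shrink this wait; continuous ingestion removes it, leaving only the ingestion delay.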
In contrast, systems designed for real-time analytics have native support for semi-structured data and other modern data formats to avoid ETL processes and achieve low data latency. They are also optimized for compute efficiency to reduce the resources required to constantly process incoming data and execute high volume queries.
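Native support for semi-structured data roughly means querying fields as they appear in the raw records instead of first flattening them into a fixed schema. The Python sketch below shows that schema-on-read idea on a few made-up JSON events whose fields differ from record to record: a nested path is resolved at query time, and records missing the field are counted under `None` instead of failing a load step. It illustrates the concept only, not how any particular database evaluates such queries.

```python
import json
from collections import Counter

# Hypothetical raw events: fields vary between records and some are nested.
RAW_EVENTS = """
{"type": "click", "user": {"id": 1, "geo": {"country": "US"}}}
{"type": "purchase", "user": {"id": 2, "geo": {"country": "DE"}}, "amount": 42.5}
{"type": "click", "user": {"id": 3}}
{"type": "click", "user": {"id": 4, "geo": {"country": "US"}}, "referrer": "ads"}
"""


def parse_events(raw):
    """One JSON document per non-empty line; no schema is declared up front."""
    return [json.loads(line) for line in raw.strip().splitlines()]


def get_path(doc, path):
    """Follow a dotted path like 'user.geo.country'; return None if any level is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def count_by(docs, path):
    """Group-by-and-count on a (possibly nested, possibly absent) field."""
    return Counter(get_path(doc, path) for doc in docs)


if __name__ == "__main__":
    print(count_by(parse_events(RAW_EVENTS), "user.geo.country"))
```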
Companies like Facebook moved many features from batch to real-time analytics, including the display of content on the newsfeed. You can learn more about the journey in the tech talk: How We Scaled It: Facebook’s Online Data Infrastructure to 1B+ Users.
Benefits of real-time analytics
Real-time analytics is in increasing demand for the benefits it gives to application users.
Snappy, responsive applications
Snappy, responsive experiences increase user adoption. Embedded real-time analytics gives users a better experience: they don’t have to wait seconds to minutes for data or queries to load, so they can interact with the data quickly and seamlessly.
Users can slice and dice data for quick decision-making. With sub-second query latencies, users can ask several questions of the data and reach a decision in a matter of minutes. This makes users more productive, increasing the number of decisions they can make in a day.
Applications can reduce the cognitive load of decision-making with automated or semi-automated intelligence. Teams can become more efficient, relying on applications for a subset of decision-making and focusing their attention on larger, strategic initiatives.
Some use cases are inherently time-sensitive: catching security vulnerabilities, optimizing delivery routes or bidding on advertisements. If you waited minutes for the data to be processed and queryable, you would miss the window in which you could make an impact. Real-time analytics enables timely decision-making for these use cases.
Use cases for real-time analytics
Real-time logistics management
A leading supplier of concrete in the United States built an insights suite to give customers and suppliers end-to-end visibility into the supply chain. The insights suite enabled customers to route shipments effectively and ensure that sites were prepared for deliveries. Concrete has a short lifespan, so sites needed to be ready to use it immediately or risk jeopardizing the entire construction project. The analytics suite gave users the ability to analyze high-level metrics as well as search for specific shipment information.
The team explored several options for user-facing analytics. They initially built the analytics suite on an OLTP database but realized that it could not deliver queries at the scale required. They explored offloading analytics to Rockset, a fully-managed, cloud-native solution.
Real-time security analytics
A SaaS security analytics company expanded its analytics capabilities to provide users with ad-hoc search, alerts and notifications and drag-and-drop dashboards. The mix of automated machine learning algorithms and ad-hoc query capabilities gave users robust analytics tools to identify and mitigate risk. Security teams also had more freedom to assess risk based on their knowledge of the organization, application usage and system users.
Real-time personalization
An e-commerce company expanding into new product lines recognized that real-time personalization was key to effective monetization. The company devised a personalization algorithm based on the contents of users’ carts, their previous site interactions and customer data. Real-time personalization increased the total order value per customer and helped ensure the success of new product lines in the market.
Real-time personalization can be challenging to implement as many analytics systems were built for batch processing, resulting in stale personalized offers. You can learn more about how to build a real-time recommendation engine on the Rockset blog.
Real-time customer 360s
A CRM company provided its users with a unified, 360-degree view of interactions across its suite of marketing, support and sales products. Rather than going into each individual system to get a piece of the customer profile, users could access all contact-level and company-level data in a single view. The goal was to better understand the customer journey and introduce new predictive capabilities to engage customers.
The CRM company had data siloed across products and struggled to find an analytics system that could support joins with sub-second latency at scale. After many attempts at performance engineering its analytics system, the company realized it needed a solution designed from the get-go to support full SQL. Learn more about building a real-time customer 360 on the Rockset blog.
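A customer 360 view is, at its core, a join across data that lives in separate systems. The sketch below uses an in-memory SQLite database with three hypothetical tables standing in for the marketing, support and sales silos, and builds one row per contact. Each one-to-many table is aggregated before the join so that a contact with several tickets and several deals is not double-counted; all table and column names are invented for illustration.

```python
import sqlite3


def build_demo_db():
    """Three hypothetical silos that share a contact_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE marketing_contacts (contact_id INTEGER PRIMARY KEY, email TEXT, company TEXT);
        CREATE TABLE support_tickets (ticket_id INTEGER PRIMARY KEY, contact_id INTEGER, status TEXT);
        CREATE TABLE sales_deals (deal_id INTEGER PRIMARY KEY, contact_id INTEGER, amount REAL);
        INSERT INTO marketing_contacts VALUES (1, 'ana@example.com', 'Acme'), (2, 'raj@example.com', 'Globex');
        INSERT INTO support_tickets VALUES (10, 1, 'open'), (11, 1, 'closed'), (12, 2, 'closed');
        INSERT INTO sales_deals VALUES (100, 1, 5000.0), (101, 2, 1200.0), (102, 2, 800.0);
    """)
    return conn


CUSTOMER_360_SQL = """
SELECT c.contact_id, c.email, c.company,
       COALESCE(t.open_tickets, 0) AS open_tickets,
       COALESCE(d.total_amount, 0) AS total_deal_value
FROM marketing_contacts AS c
-- Pre-aggregate each one-to-many silo so the joins cannot multiply rows.
LEFT JOIN (SELECT contact_id, COUNT(*) AS open_tickets
           FROM support_tickets WHERE status = 'open'
           GROUP BY contact_id) AS t ON t.contact_id = c.contact_id
LEFT JOIN (SELECT contact_id, SUM(amount) AS total_amount
           FROM sales_deals GROUP BY contact_id) AS d ON d.contact_id = c.contact_id
ORDER BY c.contact_id
"""


def customer_360(conn):
    """One row per contact combining marketing, support and sales data."""
    return conn.execute(CUSTOMER_360_SQL).fetchall()


if __name__ == "__main__":
    for row in customer_360(build_demo_db()):
        print(row)
```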
Real-time gaming analytics
eGoGames, a European esports platform for mobile devices, uses real-time analytics to improve the matchmaking process, detect fraud and build real-time reports on user metrics. The company initially built its analytics solution on a data warehouse, but the data was delayed by 15 minutes, which was too long to effectively optimize the game.
eGoGames realized it needed a system built for real-time analytics. You can learn more about eGoGames’ requirements in the blog post “eGoGames Esports Platform Uses Rockset for Real-Time Analytics on Gaming Data.”
Adding real-time data to your app with Rockset
You can use Rockset to bring real-time analytics to your application. Rockset is a real-time indexing database optimized for low data and query latency. There are 3 steps to go from real-time data to an API that developers can use for in-app analytics.
Step #1: Connect your data source
Connect OLTP databases, event streams and data lakes to Rockset with built-in data connectors. New data is ingested without a predefined schema and made queryable within one second of being generated.
Rockset made a series of design decisions (mutability, support for massive write rates, built-in data connectors, native support for semi-structured data and more) to achieve low data latency.
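Mutability here means that updates and deletes from a source such as an OLTP database are applied to existing records rather than only appended. The following Python sketch assumes a hypothetical change-data-capture feed of insert, update and delete events keyed by primary key, and applies them in order to an in-memory collection so a query always sees each record's latest state. It is a toy model of the idea, not Rockset's ingestion path.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ChangeEvent:
    """Hypothetical CDC record: operation, primary key and changed fields."""
    op: str  # "insert", "update" or "delete"
    key: int
    fields: Dict[str, Any] = field(default_factory=dict)


class MutableCollection:
    """Toy keyed store that applies each change in place."""

    def __init__(self):
        self.docs: Dict[int, Dict[str, Any]] = {}

    def apply(self, event: ChangeEvent) -> None:
        if event.op == "insert":
            self.docs[event.key] = dict(event.fields)
        elif event.op == "update":
            # Merge only the changed fields; create the record if missing (upsert).
            self.docs.setdefault(event.key, {}).update(event.fields)
        elif event.op == "delete":
            self.docs.pop(event.key, None)
        else:
            raise ValueError(f"unknown op: {event.op!r}")


if __name__ == "__main__":
    shipments = MutableCollection()
    for ev in [
        ChangeEvent("insert", 1, {"status": "loaded", "eta_min": 30}),
        ChangeEvent("insert", 2, {"status": "loaded", "eta_min": 45}),
        ChangeEvent("update", 1, {"status": "delivered"}),
        ChangeEvent("delete", 2),
    ]:
        shipments.apply(ev)
    print(shipments.docs)
```

Because each event touches only the record it names, the collection stays current without reloading the whole dataset, which is what a periodic batch reload would otherwise do.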
Step #2: Run SQL Queries
Rockset gives you full SQL queries (search, aggregations and joins) on semi-structured data. Rockset builds a Converged Index, consisting of a search index, a columnar index and a row index, on any data for fast search and analytics. These indexes enable millisecond-latency queries for a wide range of analytics out of the box.
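To make the three index types less abstract, here is a toy Python model that stores every document in a row index, a column index and a search (inverted) index, and answers a filtered aggregation by combining them. It is an illustration of the general idea of indexing the same data several ways; it does not reflect how Rockset actually lays out or queries its Converged Index. It assumes flat documents with hashable scalar values, each added once.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple


class ToyConvergedIndex:
    """Illustrative only: every document is stored three ways.

    rows:    doc id -> full document            (fetch whole records)
    columns: field -> {doc id: value}           (scan/aggregate one field)
    search:  (field, value) -> set of doc ids   (selective filters)
    """

    def __init__(self):
        self.rows: Dict[int, Dict[str, Any]] = {}
        self.columns: Dict[str, Dict[int, Any]] = defaultdict(dict)
        self.search: Dict[Tuple[str, Any], Set[int]] = defaultdict(set)

    def add(self, doc_id: int, doc: Dict[str, Any]) -> None:
        # Assumes each doc_id is added once; updates would need old entries removed.
        self.rows[doc_id] = doc
        for fld, value in doc.items():
            self.columns[fld][doc_id] = value
            self.search[(fld, value)].add(doc_id)

    def lookup(self, fld: str, value: Any) -> List[Dict[str, Any]]:
        """Filter with the search index, then fetch full documents from the row index."""
        return [self.rows[i] for i in sorted(self.search.get((fld, value), set()))]

    def sum_where(self, agg_field: str, filter_field: str, filter_value: Any):
        """Find matching ids via the search index, then sum one column for those ids."""
        ids = self.search.get((filter_field, filter_value), set())
        column = self.columns.get(agg_field, {})
        return sum(column[i] for i in ids if i in column)


if __name__ == "__main__":
    idx = ToyConvergedIndex()
    idx.add(1, {"region": "west", "amount": 10})
    idx.add(2, {"region": "east", "amount": 5})
    idx.add(3, {"region": "west", "amount": 7})
    print(idx.lookup("region", "east"), idx.sum_where("amount", "region", "west"))
```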
Step #3: Build Applications
Rockset’s query and data latency meets the snappy, responsive UI requirements of applications. We made it easy for developers to save their SQL queries and turn them into REST endpoints using Query Lambdas.
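Query Lambdas are Rockset's own feature, and the sketch below does not use Rockset's API. It only illustrates the general pattern of a saved, parameterized query behind an endpoint: a hypothetical registry maps a query name to SQL with named parameters, and a handler function (standing in for an HTTP route) binds parameters from a JSON request body, runs the query against an in-memory SQLite table and returns a status code with a JSON body. The query name, table and request format are all invented for illustration.

```python
import json
import sqlite3

# Hypothetical registry of saved queries: name -> SQL with named parameters.
SAVED_QUERIES = {
    "revenue_by_region": (
        "SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue "
        "FROM orders WHERE region = :region GROUP BY region"
    ),
}


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("west", 10.0), ("east", 5.0), ("west", 7.0)],
    )
    return conn


def handle_request(conn, query_name, body):
    """Stand-in for an HTTP handler: returns (status_code, json_body)."""
    sql = SAVED_QUERIES.get(query_name)
    if sql is None:
        return 404, json.dumps({"error": f"unknown query: {query_name}"})
    params = json.loads(body).get("parameters", {})
    try:
        # Parameters are bound by the driver, never pasted into the SQL string.
        cursor = conn.execute(sql, params)
    except sqlite3.ProgrammingError as exc:  # e.g. a required parameter is missing
        return 400, json.dumps({"error": str(exc)})
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return 200, json.dumps({"results": rows})


if __name__ == "__main__":
    conn = build_demo_db()
    print(handle_request(conn, "revenue_by_region", '{"parameters": {"region": "west"}}'))
```

Keeping the SQL on the server and accepting only parameters lets the query be tuned without changing clients, and driver-side binding keeps request values from being interpreted as SQL.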
Real-time analytics is making its way into applications as users look to act on data immediately. Real-time analytics requires a database designed to minimize both query latency and data latency, delivering fast queries on fresh data.
If you’re working on a user-facing analytics project, you can give Rockset a try. We offer a 14-day trial with $300 in free credits.