Real-Time Data Predictions for 2023

January 3, 2023

This blog compiles real-time data predictions from industry leaders so you know what’s coming in 2023. Here’s what made it into the short list:

  • Streaming data will continue to see widespread adoption with cloud becoming the great enabler
  • Real-time streaming data stacks will start to replace batch-oriented stacks
  • Real-time streaming data stacks must impact the bottom line of the business
  • New applications for streaming real-time data emerge: data applications + real-time ML

Growth in the adoption of real-time streaming data

Streaming data went mainstream in 2022. Confluent’s State of Data in Motion Report found that 97% of companies around the world are using streaming data, making it central to the data landscape. The majority of adopters of streaming data have also witnessed an increase in annual revenue growth of 10%+, indicating that streaming data can impact the bottom line of businesses.

Lenley Hensarling, the Chief Product Officer at Aerospike, predicts that real-time streaming data will continue to pick up in 2023 and be used for high-value initiatives. “Despite an uncertain global economy, real-time data will continue to grow at 30%+ in 2023 as the need for an accurate, holistic, real-time view of a business increases. Enterprises will examine how to leverage real-time data to mitigate risk and find more value in margins and operational costs.”

Expanding the reach of streaming data in organizations requires an investment in education and training. Until now, working with streaming data has been a job relegated to “big data engineers” with years of experience managing complex, distributed data systems. We predict that education and training programs, along with cloud-native systems that break down barriers to access, will make streaming data more accessible and usable.

Danica Fine, a Senior Developer Advocate at Confluent, echoes this sentiment: “This year, the concept of data as a product will become more mainstream. Across many industries, data streaming is becoming more central to how businesses operate and disseminate information within their companies. However, there is still a need for broader education about key data principles and best practices, like those outlined through data mesh, for people to understand these complex topics. For people creating this data, understanding these new concepts and principles requires data to be treated like a product so that other people can consume it easily with fewer barriers of access. In the future, we expect to see a shift from companies using data pipelines to manage their data streaming needs to allowing this data to serve as a central nervous system so more people can derive smarter insights from it.”

Move from batch-based stacks to real-time streaming data stacks

Pairing an event streaming platform like Confluent Kafka or Kinesis with a batch-based data warehouse limits the value of the data to the organization. Moving to real-time streaming data stacks opens up new possibilities for using low-latency data across the organization for anomaly detection, personalization, logistics tracking and more.
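To make the anomaly detection use case concrete, here is a minimal Python sketch of scoring each event the moment it arrives instead of waiting for a batch job. Everything in it is an assumption for illustration: the `RunningStats` and `detect_anomalies` names, the 3-sigma threshold, the warm-up length and the sample latency values are hypothetical and not taken from any vendor's API.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance updated one event at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until there are at least two events.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect_anomalies(events, threshold=3.0, warmup=5):
    """Yield (index, value) for events far from the running mean.

    Assumed rule: after `warmup` events, flag anything more than
    `threshold` standard deviations from the mean seen so far.
    """
    stats = RunningStats()
    for i, x in enumerate(events):
        std = stats.std()
        if stats.n >= warmup and std > 0 and abs(x - stats.mean) > threshold * std:
            yield i, x  # flagged as soon as the event arrives
        else:
            stats.update(x)  # only normal events update the baseline


# Hypothetical request latencies in milliseconds, arriving one at a time.
latencies_ms = [100, 102, 98, 101, 99, 103, 97, 500, 100, 102]
anomalies = list(detect_anomalies(latencies_ms))
print(anomalies)
```

The point of the sketch is that the detector keeps only a few numbers of state per stream, so the spike is flagged when it happens instead of when the next warehouse load finishes.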

Eric Sammer, the CEO at Decodable, outlines the value of real-time streaming data and how batch-based systems dilute the customer experience in his 2023 prediction: “As technology companies, our customers' expectations have been set by their experiences with those apps. Legacy databases aren't equipped to handle the technical realities of this world, and as much as IT operations teams want to emulate the data analytics stacks of sophisticated companies delivering lightning-fast, up-to-the-second data experiences, cobbling together the pieces that result in real-time data delivery isn't realistic from a time, talent, or cost perspective. Companies using batch ETL concepts for their data architecture are at risk of losing customers to competitors who are offering a better user experience through a modern data stack that delivers streaming, real-time data.

With that backdrop, we look ahead into 2023 and see a year in which companies will transition away from legacy, batch-based data stacks of the past and will pivot to specialized, real-time analytical data stacks that can manipulate data records in motion through simple stream processing. They'll see the benefit of easy implementation of things like change data capture, multi-way joins, and change stream processing while still having their batch and real-time needs met.”

The data warehouse is the epicenter of the batch-based stack, but companies embracing streaming will move more workloads to real-time systems built to handle constantly streaming data in modern data formats.

Here’s what Jay Upchurch, EVP and CIO at SAS Software, says about organizations moving from data warehouses to real-time databases: “In 2023, we will continue to see movement away from traditional data warehousing to storage options that support analyzing and reacting to data in real time. Organizations will lean into processing data as it becomes available and storing it in a user-friendly format for reporting purposes (whether that’s as a denormalized file in a data lake or in a key-value NoSQL database like DynamoDB). Whether a manufacturer monitoring streaming IoT data from machinery, or a retailer monitoring ecommerce traffic, being able to identify trends in real time will help avoid costly mistakes and capitalize on opportunities when they present themselves.”

Real-time streaming data stacks must impact the bottom line of the business

Many organizations have invested heavily in data infrastructure without being able to reap the rewards in revenue or operational efficiency. With the changing economic climate, every database and data system will be under heavy scrutiny to deliver actionable insights that move the bottom line.

As Alexander Lovell, Head of Product at Fivetran, put it, “2023 will be put up or shut up for data teams.” Lovell goes on to say, “Companies have maintained investment in IT despite wide variance in the quality of returns. With widespread confusion in the economy, it is time for data teams to shine by providing actionable insight because executive intuition is less reliable when markets are in flux. The best data teams will grow and become more central in importance. Data teams that do not generate actionable insight will see increased budget pressure.”

Data and analytics will be a powerful tool enabling digital transformation. Organizations that have laid the groundwork for real-time streaming data will be in a better position to act confidently, swiftly and intelligently as the economic landscape evolves. But it’s not enough to just be data-driven; organizations must also have a flexible infrastructure that enables iteration. Developer velocity is top of mind for every engineering team.

Up until this point, we’ve seen many multi-year modernization initiatives that, while having a long-term impact on an organization, fail to bear fruit in the short term. 2023 will be a year in which every project must align to either cost savings or revenue, so many of these longer-term initiatives will get chunked into projects that have an actionable impact.

The year of the data app

The highest value that you can derive from your data is to feed it back into your application to offer compelling user experiences, fight spam or make operational decisions. In the past ten years we’ve seen the rise of the web app and the phone app, but 2023 is the year of the data app.

Dhruba Borthakur, co-Founder and CTO at Rockset, says, “Reliable, high performing data applications will prove to be a critical tool for success as businesses seek new solutions to improve customer facing applications and internal business operations. With on-demand data apps like Uber, Lyft and Doordash available at our fingertips, there’s nothing worse for a customer than to be stuck with the spinning wheel of doom and a request not going through. Powered by a foundation of real-time analytics, we will see increased pressure on data applications to not only be real-time, but to be fail safe.”

The backbone of every data app will be a streaming architecture for seamless, instant experiences. While data apps were once relegated only to big internet companies, in 2023 they will become central to B2C and B2B organizations of all sizes.

The cloud is the great efficiency enabler of real-time streaming data stacks

With streaming data, the data never stops coming. With data applications, the application is always on.

Real-time streaming data architectures have not been within reach of many organizations due to the cost of resources and the inefficiencies of batch-based stacks when retrofitted for streaming data. Furthermore, real-time databases are complex distributed data systems requiring teams of big data engineers to ensure consistent performance at scale.

That’s all changing with the modern real-time data stack. At the core of the stack are cloud-native systems that are designed to separate storage and compute resources for efficient scaling. These systems were built for the demanding requirements of streaming data so they know how to use resources efficiently.

Ravi Mayuram, CTO at Couchbase, sees cloud databases being a great enabler: “Cloud databases will reach new levels of sophistication to support modern applications in an era where fast, personalized and immersive experiences are the goal: From a digital transformation perspective, it’s about modernizing the tech stack to ensure that apps are running without delay – which in turn gives users a premium experience when interacting with an app or platform. Deploying a powerful cloud database is one way to do this. There’s been a massive trend in going serverless and using cloud databases will become the de facto way to manage the data layer.”

Furthermore, databases will be judged increasingly on their efficiency and performance. We’ll see more cloud efficiency benchmark wars emerge, according to Dhruba Borthakur: “With the current bearish market economy, every business is feeling the need to reassess the cost of these real-time data analytics systems to better understand price-performance. We are seeing more benchmarks competition from data vendors like Snowflake and Databricks to prove its value to customers, and the data systems that can do more with less are the clear winners. In 2023, we will see benchmark wars between cloud data vendors showing one system being more efficient compared to the other.”

ML and real-time streaming data put a ring on it

Many of the real-time analytics initiatives with the greatest impact on revenue generation and operational efficiency have intelligence at their core: anomaly detection, personalization, ETA predictions, smart inventory management, and more.

Varun Ganapathi, co-Founder and CTO at AKASA, sees AI as a deflationary force similar to the likes of software: “Microsoft CEO Satya Nadella recently said, “software is ultimately the biggest deflationary force.” And I would add that out of all software, AI is the most deflationary force. Deflation basically means getting the same amount of output with less money — and the way to accomplish that is to a large degree through automation and AI. AI allows you to take something that costs a lot of human time and resources and turn it into computer time, which is dramatically cheaper — directly impacting productivity. While many companies are facing budget crunches amid a tough market, it will be important to continue at least some AI and automation efforts in order to get back on track and realize cost savings and productivity enhancements in the future.”

While rule-based systems have “ruled” until now, we’re going to see many more organizations use ML to make better predictions and adapt to changing conditions faster. Anjan Kundavaram, Chief Product Officer at Precisely, says: “We can expect successful data-driven enterprises to focus on several key AI and data science initiatives in 2023, in order to realize the full value of their data and unlock ROI. These include: (i) Productizing data for actionable insights, (ii) Embedding automation in core business processes to reduce costs, and (iii) Enhancing customer experiences through engagement platforms.”

Underpinning ML systems is real-time streaming data. Dhruba Borthakur predicts the rise of real-time machine learning: “With all the real-time data being collected, stored, and constantly changing, the demand for real-time ML will be on the rise in 2023. The shortcomings of batch predictions are apparent in the user experience and engagement metrics for recommendation engines, but become more pronounced in the case of online systems that do fraud detection, since catching fraud 3 hours later introduces very high risk for the business. In addition real-time ML is proving to be more efficient both in terms of cost and complexity of ML operations. While some companies are still debating whether there’s value in online inference, those who have already embraced it are seeing the return on their investment and surging ahead of their competitors.”
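The cost of a three-hour detection lag can be put into rough numbers. The Python sketch below counts how many fraudulent transactions are approved before a block takes effect, comparing a batch scoring job with inline scoring. All figures and names are hypothetical: the one-transaction-every-10-minutes pattern, the `approved_before_block` helper, and the simplification that online inference scores a transaction before approving it (a delay of zero).

```python
def approved_before_block(txn_times, first_fraud_time, detection_delay):
    """Count fraudulent transactions approved before detection takes effect.

    txn_times: timestamps (seconds) of the fraudster's transactions.
    first_fraud_time: timestamp of the first fraudulent transaction.
    detection_delay: seconds between that transaction and the account block.
    """
    block_time = first_fraud_time + detection_delay
    return sum(1 for t in txn_times if first_fraud_time <= t < block_time)


# Hypothetical pattern: one fraudulent transaction every 10 minutes for 4 hours.
txn_times = list(range(0, 4 * 3600, 600))

# Batch predictions: fraud is caught 3 hours after the first bad transaction.
batch_exposure = approved_before_block(txn_times, 0, 3 * 3600)

# Online inference (assumed to score each transaction before approval).
online_exposure = approved_before_block(txn_times, 0, 0)

print(batch_exposure, online_exposure)
```

Under these assumptions the batch system approves 18 fraudulent transactions before the block lands, while inline scoring approves none. Real systems sit somewhere in between, but the exposure scales directly with the detection delay.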

The predictions keep coming

That’s all we’ve got for real-time data predictions for 2023. Here are more data and analytics predictions compiled by some of our favorite sites and leaders in the data space, which we also used to source predictions for this blog: