What Is a Data Application?
An Explainer Guide for Developers Building Data-Driven Applications
A data application analyzes large-scale data to quickly surface rich insight or take autonomous action.
Data applications go by several names. Some call them “analytical applications.” Others use the term “data-intensive applications” or “data-driven applications.” We prefer data applications -- short and to the point.
Data applications come in many forms. Some appear as customer recommendation engines on websites. Others are rich data visualizations embedded into Salesforce and other business applications. Still others operate in the background, making smart autonomic decisions and surfacing timely insights.
What modern data applications share is disruptive potential: they enable companies to create new business models, automate their operations to an unprecedented degree, and multiply productivity. Online and social commerce, security, fleet management, shipping and logistics, and many other industries are being transformed by data applications today.
The Difficulty of Building Data Applications
Every application uses data, but not every application is a data application. To meet the requirements of today’s digital enterprises, data applications need to be fast and deal with data at scale. There are several challenges to building data applications, including:
Deliver low-latency, complex analytics: Applications must respond instantly. Traditional application queries were simple lookups that could be handled quickly by a key-value database. Achieving low latency for complex analytics is much harder, especially when the queries involve large aggregations and joins. Data applications have the challenging requirement of delivering complex analytics in under a second.
Handle high-velocity streaming data: More data is available than ever before and it’s streaming at ever higher speeds. Data applications need to process this streaming data coming from event streaming platforms like Apache Kafka or Amazon Kinesis as well as database change data capture (CDC) streams and make decisions quickly. This is especially true for data applications that need to take immediate action on fresh data. You’ll generally see these types of applications in industries like logistics, e-commerce, airlines and more.
Support analytics on semi-structured data: Device and application data is semi-structured in nature and increasingly stored in modern data formats like JSON and Avro. The challenge for data applications is to be able to query large volumes of semi-structured data. While many NoSQL systems have been designed to ingest semi-structured data, most cannot efficiently query that data.
Handle high concurrency, reliably: Data applications are becoming core to many platforms. For example, you wouldn’t use Facebook if the newsfeed no longer loaded your friends’ status updates quickly. Data applications have the same performance requirements as traditional applications: they must serve many concurrent users while remaining highly available and scalable.
Deliver seamless user experiences: For users, a data application must feel immersive and native to the rest of the product. It should be responsive, provide timely recommendations, and let users interact with and search data and get instant results. Building such an experience requires tight collaboration between engineers and data teams rather than the traditional silos.
These are difficult requirements to achieve. What makes them especially challenging is that traditional data systems, particularly OLTP databases and data warehouses, were not designed for this set of requirements.
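To make the semi-structured data challenge above more concrete, here is a minimal Python sketch of a grouped aggregation over JSON events that do not share a fixed schema. The event shape, the field names, and the helpers `get_path` and `average_by` are hypothetical and chosen only for illustration. A real data application would push this work into a database rather than loop in application code.

```python
import json
from collections import defaultdict

# Hypothetical device events; note that the records do not share a fixed schema.
RAW_EVENTS = [
    '{"device": {"id": "truck-1", "region": "west"}, "metrics": {"speed_kph": 62}}',
    '{"device": {"id": "truck-2", "region": "west"}, "metrics": {"speed_kph": 48, "fuel_pct": 71}}',
    '{"device": {"id": "truck-3", "region": "east"}, "metrics": {}}',
    '{"device": {"id": "truck-4", "region": "east"}, "metrics": {"speed_kph": 80}}',
]


def get_path(record, path, default=None):
    """Follow a dotted path through nested dicts; return default if a key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def average_by(events, group_path, value_path):
    """Group events by one nested field and average another, skipping incomplete records."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for raw in events:
        record = json.loads(raw)
        group = get_path(record, group_path)
        value = get_path(record, value_path)
        if group is None or value is None:
            continue
        totals[group][0] += value
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


avg_speed = average_by(RAW_EVENTS, "device.region", "metrics.speed_kph")
```

Every query in this sketch scans and parses every record. That cost is what makes sub-second analytics on large volumes of semi-structured data hard without indexing.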
How a Data Application Differs from...
A Transactional Application
The classic database application is transactional. Rather than analyzing data, it retrieves and stores data, usually to a relational database such as Oracle, MySQL or SQL Server. Airline reservation systems, health records, and accounting systems have long relied on classic database applications to store and provide simple data, not insight -- lookups, not recommendations.
By contrast, modern data applications are analytical, focused on gleaning insights from the data without writing or updating it. They enable users to browse data at a high level, combine it with other datasets for exploration, and run deep-dive searches. Data applications can also automate operations by monitoring incoming data and sending alerts or triggering actions when conditions are met.
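The monitor-and-act pattern described above can be sketched as a small rule loop. This is an illustrative outline only: the `Rule` type, the `process` function, the event fields, and the 15-minute threshold are all assumptions. A production system would evaluate rules against a live stream rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A named condition paired with the action to take when it is met."""
    name: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


def process(events, rules):
    """Check every incoming event against every rule and collect the triggered actions."""
    triggered = []
    for event in events:
        for rule in rules:
            if rule.condition(event):
                triggered.append(rule.action(event))
    return triggered


# Hypothetical rule: flag any shipment running more than 15 minutes late.
rules = [
    Rule(
        name="late_delivery",
        condition=lambda e: e.get("delay_minutes", 0) > 15,
        action=lambda e: f"ALERT: shipment {e['shipment_id']} is {e['delay_minutes']} min late",
    ),
]

events = [
    {"shipment_id": "A1", "delay_minutes": 5},
    {"shipment_id": "B2", "delay_minutes": 40},
]

alerts = process(events, rules)
```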
A Business Intelligence (BI) Application
Like data applications, business intelligence (BI) is analytical, drawing upon diverse datasets to produce useful insights. The similarities end there. The output of BI -- executive reports and analyst dashboards -- tends to be static or sluggish, as data warehouses were not built with speed or concurrency in mind. Furthermore, most BI use cases look at long-running historical aggregations of the data. In contrast, data applications analyze real-time data to drive immediate decisions and actions.
Many data applications serve targeted data that is relevant to individual users, devices or machinery. For instance, customer personalization involves joining historical insights about a specific user with that user’s latest website interactions. Route optimization involves designing a route for a particular vehicle and finding the optimal way to reach its destination. Serving such targeted results requires proficiency in both search and analytical queries to deliver fast insights in a compute-efficient way.
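As a rough illustration of the personalization example above, the sketch below blends a user's historical affinity scores with their most recent page views to rank product categories. The scoring formula, the `recency_weight` parameter, and all the data are assumptions made for this example and do not describe any particular product's method.

```python
from collections import Counter

# Hypothetical long-term affinity scores, e.g. computed from purchase history.
HISTORICAL_AFFINITY = {
    "user-42": {"vitamins": 0.6, "protein": 0.3, "sleep": 0.1},
}


def recommend(user_id, recent_views, history=HISTORICAL_AFFINITY,
              recency_weight=0.5, top_n=2):
    """Rank categories by a weighted blend of historical affinity and recent views."""
    base = history.get(user_id, {})
    views = Counter(recent_views)
    total_views = sum(views.values())

    scores = {}
    for category in set(base) | set(views):
        recent_share = views[category] / total_views if total_views else 0.0
        scores[category] = ((1 - recency_weight) * base.get(category, 0.0)
                            + recency_weight * recent_share)

    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:top_n]


# The user has historically favored vitamins but is browsing sleep products right now.
top = recommend("user-42", ["sleep", "sleep", "protein"])
```

In this sketch the fresh interactions are enough to move "sleep" ahead of the user's long-standing preference for vitamins. That reordering is the effect of joining real-time data with historical data.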
What Is a Data Application Good For?
Data applications provide the rich, speedy insights, automation, and integration that data-driven businesses are using to leapfrog competitors. Primary use cases include:
- Real-time personalization and recommendation: Data applications combine historical and newly ingested data to deliver instant insights and recommendations, either proactively or during interactions with customers. For instance, the vitamin company Ritual analyzes customer purchases and product views to create “affinity profiles” of its customers. This allows Ritual to send targeted banner ads as customers browse its site, or instant coupons and bundled offers during checkout, both of which drive increased sales.
- Predictive maintenance: Bosch Power Tools streams data from IoT sensors in its factories to a real-time database that monitors and automatically responds to manufacturing delays and missing parts. It can also monitor and respond to alerts from malfunctioning products out in customers’ hands. Using a data application allows Bosch to respond faster and more reliably than human workers could, prevents employee burnout, and frees employees for higher-value tasks.
- Fraud and anomaly detection: Data applications can alert users to examine data in reaction to a specific event or anomaly. For example, a security application can monitor for potential vulnerabilities and breaches. If one is spotted, security teams are alerted to start drilling into the data to determine whether a breach has occurred or whether it was a false alarm. A similar use case for data applications is searching credit card transactions for potential fraud.
- Real-time logistics and fleet management: Carsharing, crowdsourced food delivery, shipping companies, and many others benefit from the autonomic capabilities that data applications can provide. Command Alkon’s cloud-based software tracks more than 80 percent of concrete deliveries in North America. Concrete shipments are time sensitive, as batches can harden and be ruined if there are delays. Construction companies use Command Alkon to track incoming concrete deliveries, get alerts of potential delays, and drill into the shipment data to determine the root causes of the delays.
- Financial investment applications: Data applications can be powerful tools for data exploration, allowing businesses to combine historical and real-time data sources to generate new strategies and de-risk bold decisions. For example, the venture capital firm Sequoia Capital (full disclosure: an investor in Rockset) has built a suite of internal data applications that its investment teams use to explore data that guides their decisions about which startups to invest in. Similarly, sales and marketing teams may analyze CRM data to improve sales conversions.
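To illustrate the fraud and anomaly detection use case from the list above, here is a minimal sketch that flags transactions deviating sharply from a rolling window of recent amounts. The z-score approach, the window size, and the threshold are assumptions chosen for simplicity. Real fraud systems use far richer signals.

```python
from collections import deque
from statistics import mean, pstdev


def flag_anomalies(amounts, window=5, threshold=3.0):
    """Return the indexes of amounts more than `threshold` standard deviations
    away from the mean of the preceding `window` amounts."""
    history = deque(maxlen=window)  # the oldest amount drops off automatically
    flagged = []
    for index, amount in enumerate(amounts):
        if len(history) == window:  # only score once the window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(amount - mu) / sigma > threshold:
                flagged.append(index)
        # Note: flagged amounts still enter the window, which temporarily
        # widens the range of values considered normal.
        history.append(amount)
    return flagged


# Hypothetical card transactions in dollars; the 950 stands out from the rest.
amounts = [20, 22, 19, 21, 20, 23, 950, 21]
suspicious = flag_anomalies(amounts)
```

A flagged index would be the trigger for alerting a team to drill into the underlying data, as described in the use case.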
Technologies for Building Data Applications
While data applications have a challenging set of requirements, the good news is that an ecosystem is emerging to help developers build these types of applications. You're no longer wedded to just an OLTP database for your application. The even better news is that many of these components are also "modern," meaning that they're cloud-native and easy to use, putting data apps within reach of all engineering teams. Here are a few of the components of the data system that developers are leaning on for this new type of application:
Real-time Data Streams: Real-time, streaming data is more affordable and accessible than ever before given the rise of cloud event platforms including Confluent Kafka and Amazon Kinesis. Change data capture has become increasingly mainstream with tools like Debezium and AWS Database Migration Service, which use database logs to provide updates to downstream systems and applications.
Real-time Analytics Databases: A new category of databases has emerged, including Rockset, that makes technical tradeoffs to better support real-time data and low-latency analytics. Many of these systems originated at web-scale companies where there was a need for large-scale distributed analytics systems to support a growing number of data applications. Rockset, for instance, was founded by the team behind Facebook’s online infrastructure that supported data applications like the Facebook Newsfeed and spam-fighting algorithms. Real-time analytics databases typically support high-velocity streaming data with real-time ingest and cost-effective data rollups. They deliver low-latency analytics by bringing together best practices from search engine and data warehouse technologies.
Data APIs: APIs make it easy to hand off real-time data to engineering teams implementing data applications. We’re seeing more database technologies create data APIs, SQL APIs or GraphQL APIs to make that handoff smooth. These APIs help guard against injection attacks, allow for version control and introduce tagging capabilities.
Caching Tools: We advise using a caching layer in conjunction with a real-time analytics database when you have an internet-scale product that requires tens of thousands of concurrent queries. For the majority of use cases, we recommend starting with just a real-time analytics database, as it will be cheaper and less complex than adding an in-memory caching system.
Visualization Tools: Many data applications involve custom UIs that are built for a seamless user experience. Other data applications use tools like Grafana or Apache Superset, which have become increasingly mainstream for data observability and operational analytics. Additionally, there are visualization libraries like Vega that make it easy to build visualizations on complex data.
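To make the data API idea from the list above more concrete, the sketch below exposes a named, parameterized query so that callers supply values but never SQL text. It uses Python's built-in `sqlite3` module purely as a stand-in for a real-time analytics database. The `QUERIES` registry, the `run_query` helper, and the table are hypothetical.

```python
import sqlite3

# Named queries are defined once by the data team; callers only choose a name
# and pass parameters. Values are bound by the driver, not spliced into the SQL.
QUERIES = {
    "orders_by_customer":
        "SELECT COUNT(*), SUM(total) FROM orders WHERE customer = :customer",
}


def run_query(conn, name, params):
    """Look up a named query and execute it with bound parameters."""
    sql = QUERIES[name]  # raises KeyError for unknown query names
    return conn.execute(sql, params).fetchone()


# In-memory database standing in for the analytics store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ada", 10.0), ("ada", 5.5), ("bob", 7.0)],
)

summary = run_query(conn, "orders_by_customer", {"customer": "ada"})

# A hostile value is treated as a literal string, so it matches no rows.
hostile = run_query(conn, "orders_by_customer", {"customer": "ada' OR '1'='1"})
```

Because the query text lives in one registry, it can also be versioned and tagged independently of the application code that calls it.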
Look around you and you’ll start to see data applications everywhere, powering your favorite shopping site, accelerating your food deliveries, and predicting inventory shortfalls before they occur. Data applications like Facebook real-time status updates, Uber driver tracking and Amazon’s recommendation engine have been core to digital transformation.
These applications have traditionally been challenging to build because they have hard requirements. Transactional systems have been able to deliver fast simple queries but not serve complex analytics. Data warehouses can hold petabytes of data but you’re stuck waiting seconds to minutes for analytical queries to load.
Luckily, there’s a new ecosystem of tools emerging to support these types of data applications. If you are looking to build data applications and need a simpler way, talk to us or start a free trial.