Why You Shouldn’t Build Real-Time Data Apps on Data Lakes or Warehouses
More organizations are looking to build data apps on data warehouses, for several reasons:
- Increasing adoption of cloud data warehouses has led to a greater need to activate the data stored in them.
- Data apps operate on larger volumes of data, and data warehouses provide cost-effective storage for this purpose.
- Data warehouses are used for many analytical workloads, such as BI and data science, so data apps, which run many analytical queries, are presumed to also run well on data warehouses.
In this talk, we explore why data warehouses are not ideal for building data apps that require sub-second queries on real-time data. Drawing on his deep experience advising engineering leaders on real-time analytics, Venkat will share several reasons why data apps should not be built on data warehouses:
- Query performance - Data warehouses are not designed for low query latency; they rely on scans to locate data, whereas most databases optimized for query speed use indexing instead.
- Freshness of data - It takes time to transform data and batch-load it into a data warehouse before it can be queried. In many instances, this additional latency, which may be in the tens of minutes, will not meet the requirements of the data app.
- Compute cost - Data apps are considerably more compute-intensive than traditional BI workloads. While data warehouses are optimized for lower storage costs, building data apps on them often results in significant compute costs.
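To make the query performance point concrete, here is a minimal Python sketch contrasting a scan with an index lookup. It is an illustration only, not how any particular warehouse or database is implemented: real systems use columnar storage, partition pruning, and far more sophisticated index structures. The row layout and the function names (`build_rows`, `scan_lookup`, `build_index`, `index_lookup`) are hypothetical and chosen for this example.

```python
from collections import defaultdict


def build_rows(n):
    """Generate hypothetical event rows; every 100th row belongs to user 42."""
    return [
        {"id": i, "user_id": 42 if i % 100 == 0 else i % 41}
        for i in range(n)
    ]


def scan_lookup(rows, user_id):
    """Scan-style lookup: every row is examined to find the matches."""
    matches = [row for row in rows if row["user_id"] == user_id]
    return matches, len(rows)  # rows examined == total rows


def build_index(rows):
    """Build a simple hash index from user_id to row positions."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row["user_id"]].append(position)
    return index


def index_lookup(rows, index, user_id):
    """Index-style lookup: only the matching rows are touched."""
    positions = index.get(user_id, [])
    return [rows[p] for p in positions], len(positions)


if __name__ == "__main__":
    rows = build_rows(10_000)
    index = build_index(rows)

    scan_matches, scan_examined = scan_lookup(rows, 42)
    idx_matches, idx_examined = index_lookup(rows, index, 42)

    # Both approaches return the same rows; they differ in work done per query.
    print(f"scan examined {scan_examined} rows, index examined {idx_examined} rows")
    # prints "scan examined 10000 rows, index examined 100 rows"
```

The trade-off in this sketch is that the index must be built and kept up to date as new rows arrive, while the scan needs no extra structure but repeats its full cost on every query. For an app issuing many selective queries, that per-query cost is what drives both latency and compute spend.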
Watch this tech talk to make sure you are using the right tool for the job when building data apps.
Venkat Venkataramani is CEO and co-founder of Rockset. He was previously an Engineering Director on the Facebook infrastructure team, responsible for all online data services that stored and served Facebook user data. Collectively, these systems worked across 5 geographies and served more than 5 billion queries per second. Prior to Facebook, Venkat worked on the Oracle Database.