Case Study: Rockset Enables Real-Time Operational Analytics in Hardware Manufacturing for PCH
March 25, 2022
- PCH International is a leading hardware manufacturer with global operations that requires ultra-fast analysis of huge volumes of streaming data.
- The existing data infrastructure built on MongoDB and DynamoDB couldn’t support real-time querying of data.
- PCH initially considered data warehouses such as Snowflake and Redshift, but found them too costly for real-time analytics.
- PCH chose Rockset because it could quickly ingest data from multiple sources, including streaming sources, with minimal setup, and because it delivered fast query performance.
- Rockset enabled PCH to perform ad hoc complex queries within seconds, a huge improvement over the one-hour latency they were seeing before.
PCH International is a leading hardware manufacturer with a unique end-to-end model. It doesn’t just build Apple gadgets, Beats headphones and other products on behalf of brands; PCH also sources products it doesn’t make and ships finished goods to retailers as well as straight to consumers.
Pioneering this Direct-to-Consumer (D2C) model has enabled PCH – with headquarters in Ireland, manufacturing in Shenzhen, China, and product design in San Francisco – to reap more than $1 billion in annual revenue.
Managing a global operation with tens of thousands of manufacturing partners, retailers, and brand customers requires ultra-fast analysis of huge volumes of streaming data.
However, PCH’s aging data analytics systems were increasingly unable to ingest data quickly enough or to provide the speedy, precise queries that its business operations teams needed.
PCH needed to upgrade its data technology for the age of real-time data.
Collecting End-to-End Data
From its founding in 1996, PCH had been ahead of the curve in its use of operational intelligence to power its business.
Founder and CEO Liam Casey has publicly enthused about the company’s massive database of suppliers and products, which he called "Alibaba with brains," and about another system that monitored and analyzed all of its web orders.
PCH is "collecting data through all stages of product development, sourcing, manufacturing and distribution," according to a profile in Forbes in 2021. This helps PCH "identify and eliminate inefficiencies and bottlenecks, and to achieve coordinated improvements across all aspects of operations." It also helps PCH gain "visibility on the sustainability and environmental impact" of its operations.
Slow Ingestion and Queries
Collecting the data was one thing. Ingesting and querying it quickly was another.
All of PCH’s data, including real-time event streams, was being ingested into on-premises databases before being uploaded into one of PCH’s two cloud databases: an Azure-hosted Cosmos DB service that is compatible with MongoDB, or secondarily, Amazon DynamoDB.
The data query layer was far too slow, according to PCH CTO Minh Chau.
PCH needed faster, more complex queries to make its supply chain fully visible to its supply chain analysts and customers. It took at least an hour for fresh data to be ingested and queried. PCH also sought more aggregation-type queries in order to better track shipments in real time and solve urgent supply chain problems.
Besides low data latency and speedy, precise queries on large datasets, PCH also required any new solution to be easy to deploy and manage for its small data engineering team.
PCH looked at its existing databases as potential solutions but found many challenges. DynamoDB does not natively support aggregations, so creating one requires extra engineering work with DynamoDB’s indexes, said Chau. With MongoDB, aggregations require a lot of processing power, which translates to higher cloud fees, he said. And to accomplish sub-second queries with MongoDB, all of the indexes would need to be pre-defined, he added.
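To illustrate the kind of extra engineering work Chau describes, here is a minimal sketch of a group-by aggregation done in application code. The records, field names, and the `units_by_status` helper are all hypothetical; they are not taken from PCH's systems or from any DynamoDB API. The list simply stands in for items that have already been fetched from a store that cannot aggregate on its own.

```python
from collections import defaultdict

# Hypothetical shipment items, shaped like records already fetched
# from a key-value store that has no built-in aggregation.
SHIPMENTS = [
    {"shipment_id": "s1", "status": "delayed", "units": 120},
    {"shipment_id": "s2", "status": "in_transit", "units": 80},
    {"shipment_id": "s3", "status": "delayed", "units": 30},
    {"shipment_id": "s4", "status": "delivered", "units": 200},
]


def units_by_status(items):
    """Sum units per status in application code (a manual GROUP BY)."""
    totals = defaultdict(int)
    for item in items:
        totals[item["status"]] += item["units"]
    return dict(totals)


if __name__ == "__main__":
    print(units_by_status(SHIPMENTS))
```

Every new question (by carrier, by region, by day) needs another helper like this or a purpose-built index, which is the maintenance burden a small data engineering team would want to avoid.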
PCH also looked at cloud data warehouses such as Snowflake and Amazon Redshift. Both are optimized for ingesting occasional batches of data rather than small-but-continuous real-time event streams like shipment data, Chau said, resulting in significant ingestion latency. These solutions were not only too slow, but also too costly for real-time analytics.
Fast Queries with Rockset
PCH then found Rockset’s real-time analytics database. Rockset’s ability to ingest data fast with minimal setup from many data sources, especially Amazon S3, impressed PCH. Rockset also provided a dashboard where PCH could monitor ingested data for data errors and incorrect fields.
Besides the ease of setup, Rockset also proved proficient at ingesting constant streams of updates from PCH’s website or outside suppliers.
On the query side, Rockset was able to perform aggregation queries on large datasets within seconds and at a lower price than PCH’s prior solution, Chau said. Rockset’s multiple indexes give PCH the flexibility to run many types of queries without having to do the work of predefining and building indexes on its own. Results for ad hoc complex queries also return to PCH’s analysts within seconds, a huge improvement over the one-hour latency they were seeing before.
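As a rough picture of what an ad hoc aggregation query over shipment data can look like, the sketch below runs one with Python's built-in `sqlite3` module. SQLite is only a local stand-in here; this is not Rockset's API, and the table, columns, sample rows, and the `delayed_units_by_carrier` helper are assumptions made for illustration.

```python
import sqlite3

# Hypothetical shipment rows: (shipment_id, carrier, status, units).
ROWS = [
    ("s1", "A", "delayed", 120),
    ("s2", "B", "in_transit", 80),
    ("s3", "A", "delayed", 30),
    ("s4", "B", "delayed", 200),
]

# An ad hoc aggregation: delayed shipments and units per carrier.
QUERY = """
    SELECT carrier, COUNT(*) AS shipment_count, SUM(units) AS total_units
    FROM shipments
    WHERE status = 'delayed'
    GROUP BY carrier
    ORDER BY total_units DESC
"""


def delayed_units_by_carrier(rows):
    """Load rows into an in-memory table and run the aggregation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE shipments "
            "(shipment_id TEXT, carrier TEXT, status TEXT, units INTEGER)"
        )
        conn.executemany("INSERT INTO shipments VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for carrier, count, units in delayed_units_by_carrier(ROWS):
        print(carrier, count, units)
```

The point of the sketch is that changing the question means editing only the query text, not writing new application code or defining a new index by hand.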
Finally, Chau said that deploying and managing Rockset has been a smooth, low-ops experience. He’s glad to have built a solution that fits PCH’s specific needs rather than choosing a pre-packaged one that would have required even more customization work to fit PCH.
“If you want to build something fast and fully-managed, and still have the flexibility to slice and dice the data in the way you want, Rockset is for you,” Chau said.
Embedded content: https://www.youtube.com/watch?v=MXiyXRpfXzA
Rockset is the real-time analytics database in the cloud for modern data teams. Get faster analytics on fresher data, at lower costs, by exploiting indexing over brute-force scanning.