How Klarna Scales Buy Now Pay Later with Real-Time Anomaly Detection
February 16, 2024
Klarna is a leading buy-now-pay-later company, giving shoppers more time to pay while paying merchants in full upfront. With a range of payment options, including direct payments, pay after delivery and installment plans, Klarna gives shoppers flexibility in how they pay, with zero interest. These payment options help the more than 500k merchants using Klarna attract, convert and retain shoppers around the world.
Klarna integrates seamlessly into the payment experience, offering one-click purchases regardless of the payment plan. The flexible options enable shoppers to make larger purchases responsibly, and merchants see a 41% increase in average order value along with higher conversions. Klarna supports an omnichannel consumer journey, whether shoppers buy through the Klarna app, in a store or online.
Monitoring integrations is critical for Klarna. Because Klarna earns revenue by taking a percentage of each transaction as a fee from the merchant, the reliability of its payment integrations with merchants and other partners is paramount. Any issue in these integrations can cost both Klarna and its partners revenue. It also directly affects end customers, since integration problems can disrupt their ability to make seamless, reliable, safe and consistent purchases. To identify and address these issues swiftly, Klarna uses statistical analysis to detect anomalies across its partner base in under two seconds. This proactive approach lets Klarna resolve integration issues promptly, preserving revenue, building trust with partners and giving end customers a better shopping experience.
In this blog, we’ll describe how Klarna implemented real-time anomaly detection at scale, halved the resolution time and saved millions of dollars using Rockset.
Billions of monitors at Klarna
As part of its commitment to exceptional service, Klarna has implemented specialized monitoring for its highest-volume partners, covering integrations with merchants, distribution partners and payment service providers. With billions of monitors tracking these partner-facing integrations, Klarna can quickly detect issues or degradations along dimensions such as partner, purchase country, payment method, browser, device and acquisition channel, as well as across operations including authorization, session and order creation.
For example, Klarna compares counts and conversion rates for the current minute, the previous minute and the same minute on the previous day. The statistical methods Klarna employs generate alerts reliably while limiting alert noise and the model engineering effort required from the team.
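The post does not disclose the exact statistical tests, but the three-way comparison itself can be sketched. The Python below is a minimal, hypothetical illustration: `Window`, `is_anomalous`, `MIN_SESSIONS` and `MAX_RELATIVE_DROP` are invented names, and a simple relative-drop rule stands in for Klarna's actual methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """Counts for one monitor (e.g. partner x country x payment method) in one minute."""
    sessions: int
    orders: int

    @property
    def conversion_rate(self) -> float:
        return self.orders / self.sessions if self.sessions else 0.0


# Hypothetical tuning values; the post does not disclose Klarna's thresholds.
MIN_SESSIONS = 50        # ignore monitors whose baselines are too sparse to judge
MAX_RELATIVE_DROP = 0.5  # alert when a metric is more than 50% below a baseline


def relative_drop(current: float, baseline: float) -> float:
    """Fraction by which `current` falls below `baseline` (negative if it rose)."""
    if baseline <= 0:
        return 0.0
    return (baseline - current) / baseline


def is_anomalous(current: Window, previous_minute: Window, same_minute_yesterday: Window) -> bool:
    """Flag the current minute only if a metric dropped sharply against BOTH baselines."""
    baselines = (previous_minute, same_minute_yesterday)
    if min(b.sessions for b in baselines) < MIN_SESSIONS:
        return False
    for metric in ("sessions", "conversion_rate"):
        value = getattr(current, metric)
        if all(relative_drop(value, getattr(b, metric)) > MAX_RELATIVE_DROP for b in baselines):
            return True
    return False


print(is_anomalous(Window(1000, 20), Window(980, 290), Window(1020, 310)))  # True
```

Requiring a drop against both baselines is one way to limit noise: the previous minute catches sudden breaks, while the same minute yesterday accounts for daily traffic patterns such as quiet nights.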
Sub-second monitoring requirement
Before centralizing real-time monitoring of partner activity into a single platform, Klarna used a variety of traditional infrastructure monitoring tools and data warehouses.
In Klarna’s data warehouse solution, where most of this analysis occurred, it took six hours to get limited insights into partner integrations. Given the number of tools and the latency involved, Klarna decided to consolidate into a single solution and evaluated 10+ databases and monitoring tools using the following criteria:
- Real-time monitoring: Klarna required real-time monitoring to spot and resolve inconsistencies in partner integrations faster, with the goal of identifying anomalies in under a minute.
- Cost effectiveness at scale: With billions of monitors, Klarna realized early on that paying on a per-metric or per-event basis, a common pricing model for traditional infrastructure monitoring tools, would be too expensive.
- Flexibility: Klarna was adding new partners daily and wanted a quick, seamless onboarding experience. The team also wanted to be able to add new metrics and data points and run ad-hoc analyses as it continued to build out real-time monitoring.
- Cloud offering: Klarna is built on AWS and decided early on to use cloud services rather than get into the business of managing infrastructure. The team looked for easy-to-use solutions that would require very little infrastructure maintenance.
Evaluating 10+ solutions for anomaly detection
Klarna evaluated several categories of solutions, including infrastructure monitoring tools, real-time analytics databases and anomaly detection services:
- Infrastructure monitoring: Klarna evaluated a leading application performance management and observability solution. Because Klarna already used it in-house for infrastructure monitoring, the team knew it could meet the latency requirements and support the number of metrics required. However, like many infrastructure monitoring tools, it was not built for business incident reporting, and its pricing model made it expensive for the billions of metrics Klarna was tracking.
- Anomaly detection solution: Klarna also evaluated a leading anomaly detection solution built for business intelligence. The team liked the concept of out-of-the-box anomaly detection as a service but realized it would be difficult to tweak the detection algorithms for its specific use case. The team wanted the flexibility to iterate on anomaly detection over time.
- Rockset: Rockset is a search and analytics database built for the cloud. The team liked that Rockset could run fast needle-in-a-haystack queries to detect anomalies. Rockset’s ability to pre-aggregate data at ingestion time also reduced storage costs and sped up queries, making the solution cost-effective at scale. With Rockset’s flexible data model, the team could easily define new metrics, add new data and onboard customers without significant engineering resources. Rockset met Klarna’s need for flexibility while providing a fully managed cloud solution that simplifies operations.
Rockset nails price-performance and ease of use
Klarna evaluated Rockset on query performance and ingest latency. Working closely with Rockset’s solution architecture team, Klarna defined windowed aggregations at ingestion time over combinations of fields such as country, merchant and payment method. Using SQL GROUP BY queries, the team could analyze partner activity to find any partner with an anomaly or error.
Rockset’s document data model allows the structure of each document to vary. Unlike typical document-oriented databases, Rockset indexes and stores the data in a way that supports relational queries in SQL. With this data model, the Klarna team could run a single SQL query on one collection (the equivalent of a table in the relational world) to catch anomalies across billions of monitors. The team was impressed by Rockset’s speed and ease of use, which made it easy to prototype the real-time monitoring solution.
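To make ingestion-time pre-aggregation concrete, here is a minimal Python sketch of a per-minute rollup. It only imitates the effect of Rockset’s ingest-time aggregation and is not Rockset configuration; the event fields (`ts`, `operation`) and the `DIMENSIONS` tuple are hypothetical, since the post does not publish Klarna’s schema.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Hypothetical dimensions; the post names country, merchant and payment method "and more".
DIMENSIONS = ("country", "merchant", "payment_method")


def minute_bucket(ts: float) -> str:
    """Truncate a Unix timestamp to its UTC minute, e.g. '2024-02-16T10:15'."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M")


def rollup(events):
    """Pre-aggregate raw events into one row per (minute, dimension values),
    counting each operation type (e.g. 'session', 'order')."""
    rows = defaultdict(Counter)
    for event in events:
        key = (minute_bucket(event["ts"]),) + tuple(event[d] for d in DIMENSIONS)
        rows[key][event["operation"]] += 1
    return dict(rows)  # plain dict so lookups of missing keys don't create rows


events = [
    {"ts": 1708078500, "country": "SE", "merchant": "m1", "payment_method": "pay_later", "operation": "session"},
    {"ts": 1708078510, "country": "SE", "merchant": "m1", "payment_method": "pay_later", "operation": "order"},
    {"ts": 1708078570, "country": "SE", "merchant": "m1", "payment_method": "pay_later", "operation": "session"},
]
for key, counts in sorted(rollup(events).items()):
    print(key, dict(counts))
```

Storing one row per minute and dimension combination, instead of one per raw event, is what keeps storage small and queries fast as event volume grows.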
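The single-query approach can be sketched against a table of pre-aggregated per-minute rows. The example below uses SQLite from Python’s standard library as a stand-in for Rockset; the `partner_metrics` table, its columns and the 50% threshold are assumptions for illustration, and Rockset’s SQL dialect and Klarna’s real query may differ.

```python
import sqlite3

# Compare each monitor's conversion rate in minute :now with minute :prev.
QUERY = """
WITH per_monitor AS (
    SELECT merchant, country, payment_method,
           SUM(CASE WHEN minute = :now  THEN sessions ELSE 0 END) AS cur_sessions,
           SUM(CASE WHEN minute = :now  THEN orders   ELSE 0 END) AS cur_orders,
           SUM(CASE WHEN minute = :prev THEN sessions ELSE 0 END) AS prev_sessions,
           SUM(CASE WHEN minute = :prev THEN orders   ELSE 0 END) AS prev_orders
    FROM partner_metrics
    WHERE minute IN (:now, :prev)
    GROUP BY merchant, country, payment_method
)
SELECT merchant, country, payment_method, cur_sessions, cur_orders
FROM per_monitor
WHERE prev_sessions >= :min_sessions
  AND cur_orders * 1.0 / MAX(cur_sessions, 1) < :ratio * prev_orders * 1.0 / prev_sessions
ORDER BY merchant, country, payment_method
"""


def find_conversion_drops(conn, now, prev, min_sessions=50, ratio=0.5):
    """Return monitors whose conversion rate fell below `ratio` x the previous minute's."""
    params = {"now": now, "prev": prev, "min_sessions": min_sessions, "ratio": ratio}
    return conn.execute(QUERY, params).fetchall()


def demo_connection():
    """In-memory table of pre-aggregated rows, one per minute and monitor (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE partner_metrics (minute TEXT, merchant TEXT, country TEXT,"
        " payment_method TEXT, sessions INTEGER, orders INTEGER)"
    )
    conn.executemany(
        "INSERT INTO partner_metrics VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("2024-02-16T10:14", "m1", "SE", "pay_later", 1000, 300),
            ("2024-02-16T10:15", "m1", "SE", "pay_later", 1000, 20),     # conversion collapses
            ("2024-02-16T10:14", "m2", "DE", "installments", 800, 200),
            ("2024-02-16T10:15", "m2", "DE", "installments", 790, 195),  # normal
        ],
    )
    return conn


print(find_conversion_drops(demo_connection(), "2024-02-16T10:15", "2024-02-16T10:14"))
# [('m1', 'SE', 'pay_later', 1000, 20)]
```

Because the GROUP BY runs over every dimension combination at once, adding a new merchant or payment method requires no new query, only new rows.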
“The team quickly prototyped the monitoring application using SQL and was blown away by the speed and the ease of use, immediately realizing the capability of Rockset for real-time monitoring at Klarna,” says Christian Granados, Accountable Lead for Real-Time Acquiring Monitoring (RAM) at Klarna.
As a result of the prototyping and evaluation, Klarna confirmed that Rockset met its requirements of one-second ingestion latency and millisecond query latency. During the evaluation period, the Klarna team was able not only to assess Rockset’s capabilities but also to build the end-to-end solution.
“We were looking for a partnership and close collaboration to find the best end-to-end solution for real-time monitoring, leveraging the unique capabilities of Rockset. During the evaluation phase, the level of support from the solution architecture team and executive alignment instilled trust,” continues Granados.
While hitting the latency targets was crucial for Rockset to be considered for real-time monitoring, what convinced the team was understanding the underlying architecture. Under the hood, Rockset stores data in a Converged Index, which combines elements of a search index, a vector search index, a columnar store and a row store. Depending on the query, Rockset’s cost-based optimizer chooses the most efficient execution path, using multiple indexes in parallel. Rockset is built on RocksDB, an open source key-value store created by the team behind Rockset at Meta, which is well known for handling high write rates while sustaining low-latency ingestion.
According to Granados, “It all clicked for me when we did an architecture review and I better understood Converged Indexing and the cloud architecture. That’s when I realized how Rockset guarantees performance at scale.”
Rockset’s performance and architecture hit the sweet spot between streaming data and low-latency queries, making it well suited for real-time monitoring at Klarna. Based on Rockset’s performance, partnership and architecture, the Klarna team felt confident moving forward with Rockset for real-time anomaly detection across its 500k+ merchants and partners.
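The idea behind indexing the same data several ways can be shown with a toy in-memory structure. This is a hypothetical illustration, not Rockset’s implementation: `ToyConvergedIndex` keeps a row store, a column store and an inverted (search) index, omits the vector index, and leaves the choice of access path to the caller rather than to a cost-based optimizer. Field values must be hashable to be indexed.

```python
from collections import defaultdict


class ToyConvergedIndex:
    """Stores every document three ways: row store, column store and inverted index."""

    def __init__(self):
        self.rows = {}                    # doc_id -> full document (row store)
        self.columns = defaultdict(dict)  # field -> {doc_id: value} (column store)
        self.inverted = defaultdict(set)  # (field, value) -> {doc_id} (search index)

    def insert(self, doc_id, doc):
        """Write the document into all three structures at ingest time."""
        self.rows[doc_id] = doc
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            self.inverted[(field, value)].add(doc_id)

    def lookup(self, field, value):
        """Selective filter: probe the inverted index, then fetch full rows."""
        return [self.rows[i] for i in sorted(self.inverted.get((field, value), set()))]

    def sum_column(self, field, where=None):
        """Aggregation: read one column, optionally restricted by an equality filter
        given as a (field, value) pair resolved through the inverted index."""
        column = self.columns.get(field, {})
        ids = column.keys() if where is None else self.inverted.get(where, set())
        return sum(column[i] for i in ids if i in column)


index = ToyConvergedIndex()
index.insert(1, {"merchant": "m1", "country": "SE", "orders": 3})
index.insert(2, {"merchant": "m2", "country": "SE", "orders": 5})
index.insert(3, {"merchant": "m1", "country": "DE", "orders": 7})
print(index.lookup("merchant", "m1"))
print(index.sum_column("orders", where=("country", "SE")))
```

A selective filter such as a single merchant is answered from the inverted index without touching other documents, while an aggregation over everything reads a single column. A cost-based optimizer’s job is to pick between such paths, or combine them, for each query.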
Rockset and the end-to-end solution for real-time alerts
Klarna streams 96M events per day through an Apache Kafka topic and enriches the data using a Go application. The enriched data is streamed to Rockset where it is pre-aggregated and indexed for serving alerts and monitoring dashboards.
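At 96M events per day, the pipeline averages roughly 1,100 events per second (96,000,000 / 86,400 ≈ 1,111). The enrichment step can be sketched as follows. Klarna’s service is written in Go; this Python version is a hypothetical stand-in that takes raw Kafka message values as bytes, and `MERCHANT_METADATA`, `enrich` and `batches` are invented names. The sink that forwards batches to Rockset is left out because the post does not describe it.

```python
import json
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Hypothetical reference data; the post does not say which attributes Klarna adds.
MERCHANT_METADATA = {
    "m1": {"merchant_name": "Example Shoes", "partner_team": "team-nordics"},
}


def enrich(raw_values: Iterable[bytes]) -> Iterator[dict]:
    """Decode each Kafka message value (JSON) and attach merchant attributes."""
    for raw in raw_values:
        event = json.loads(raw)
        meta = MERCHANT_METADATA.get(event.get("merchant"), {})
        # Keep unknown merchants but mark them so gaps in reference data are visible.
        yield {**event, **meta, "enriched": bool(meta)}


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group a stream into lists of at most `size` items for bulk writes downstream."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


messages = [
    b'{"merchant": "m1", "operation": "session", "country": "SE"}',
    b'{"merchant": "m9", "operation": "order", "country": "DE"}',
]
for batch in batches(enrich(messages), size=500):
    print(batch)  # a real pipeline would write each batch to the sink here
```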
At Klarna, teams are structured as startups, and some of them own and manage partner relationships. These partner teams include a mix of business leaders, engineers and analysts who make sure each partner is onboarded and the product integration works smoothly. The Real-Time Acquiring Monitoring (RAM) team centralizes real-time monitoring and alerting services across all partner teams, but each partner team is responsible for taking immediate action to resolve integration issues.
Klarna relies heavily on Slack to communicate and manage partner accounts. When an anomaly is detected, an alert is posted to the internal partner Slack channel along with a time series graph so that action can be taken immediately. This lets Klarna support partners proactively and helps instill trust that the payment process is running smoothly.
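An alert like this could be built as a Slack Block Kit message and posted through an incoming webhook. This is an assumption about the mechanism, since the post only says that an alert and a time series graph reach the partner channel: the webhook URL and chart image URL below are hypothetical, and the chart must be rendered and hosted somewhere Slack can fetch it.

```python
import json
import urllib.request


def build_alert(monitor: str, current_rate: float, baseline_rate: float, chart_url: str) -> dict:
    """Slack Block Kit payload: a headline plus an image block showing the time series."""
    summary = f"Conversion drop on {monitor}: {current_rate:.1%} now vs {baseline_rate:.1%} baseline"
    return {
        "text": summary,  # plain-text fallback used in notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": f":rotating_light: *{summary}*"}},
            {"type": "image", "image_url": chart_url, "alt_text": f"Time series for {monitor}"},
        ],
    }


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


alert = build_alert("m1 / SE / pay_later", 0.02, 0.30, "https://charts.example.com/m1-se.png")
print(json.dumps(alert, indent=2))
# post_to_slack("https://hooks.slack.com/services/...", alert)  # hypothetical webhook URL
```

Incoming webhooks post to the channel they were created for, so a per-partner setup would keep one webhook URL per partner channel.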
“Klarna builds trust with partners by providing support throughout the partner lifecycle. If big merchants see a dip in shopping through Klarna, we make them aware of the issue, helping merchants investigate and remedy faster,” says Granados.
In addition to alerting, Klarna built a custom monitoring UI to make it easy for its partner account teams to drill down into activity data to quickly determine if an alert warrants taking further action.
Klarna saves millions with real-time anomaly detection
With real-time monitoring, Klarna can alert internal account teams to a problem before the partner notices it, fostering a trusted relationship. Being proactive has shown partners that Klarna is as invested in the success of their business as they are. Moving alerting from 6 hours to 2 seconds has also cut resolution time in half, so partners realize more sales.
Rockset enables Klarna to provide partner account teams with detailed monitoring, with billions of monitors running 24x7, so that teams can identify the root cause of an issue faster. New partners get onboarded every day and engineers can quickly create new dimensions and data points for monitoring with Rockset’s flexible data model.
“Rockset is the simplest part of real-time monitoring at Klarna. I’d recommend Rockset to any company analyzing streaming data,” says Granados.
The speed, simplicity and efficiency of Rockset at scale have saved Klarna and its partners millions of dollars. Granados continues, “At Klarna, we recognized the importance of real-time monitoring of partner activity as a crucial factor in achieving our goals within this field. Rockset has been a game changer and makes fine-grained alerting at scale financially feasible.”