Rockset Podcast Episode 5: Why Wait? The Rise of Real-Time Analytics
Kshitij Kumar is the first Chief Data Officer at Farfetch, where he drives a truly data-driven culture. His background includes a variety of data leadership and advisory roles at companies like Ericsson and Nortel. Kshitij talks about data culture, the shift from cloud to serverless, and the different types of real-time analytics.
About This Podcast
Gio Tropeano: We're back with another episode of "Why Wait? The Rise of Real-Time Analytics" podcast, powered by Rockset. During this podcast, we invite business leaders, app development thought leaders, and analytics specialists to share their stories with the world, providing insights into what your peers are doing to improve data and application analytics.
Gio Tropeano: I'm Gio Tropeano. I'm here with my co-host Dhruba Borthakur today, the esteemed co-founder and CTO of Rockset, the mastermind behind Hadoop, with a history that spans the early days at Yahoo, and growth phase at Facebook engineering. Dhruba, I know it's really early for you. I hope you've had your caffeine this morning. Welcome.
Dhruba Borthakur: Thanks.
Gio Tropeano: Before I kick it off, if you're listening to this podcast and you have a question or would like to comment, please do so on our community Slack channel at "Rockset-community.Slack.com," or tweet at us at "@RocksetCloud."
Gio Tropeano: Today, we have a truly global perspective for you. As I mentioned, Dhruba is in the Silicon Valley. I'm in Boston, Massachusetts, and it's been a busy morning so far, and our guest is actually in London, and it's past lunchtime, and he can almost taste the weekend.
Gio Tropeano: Have you heard of Farfetch? Farfetch began as an e-commerce marketplace for luxury boutiques around the world. Today, the Farfetch marketplace connects customers all over the globe. Their mission is to be the global platform for luxury fashion, connecting creators, curators, and consumers. A little bit about the scale here, because I think this is super interesting. Farfetch's gross merchandise value, or GMV, exceeded $3 billion, up 49% year over year. They're publicly traded as FTCH. They have 4,500+ employees, 3 million active customers, 1,300+ supply partners, and they deliver to 190 countries. Think about the data that they must manage, that they have; the size, complexity, and security concerns there.
Gio Tropeano: I wanted to preface the introduction there because I'm really excited to introduce Kshitij Kumar. He is the first Chief Data Officer of Farfetch. He drives a truly data-driven culture there. His background includes a variety of data leadership and advisory roles at companies like Ericsson, Nortel, and others. Kshitij, a huge welcome to "Why Wait?" podcast, and we're excited to have you here with us today.
Kshitij Kumar: It's good to be here, Gio. Thank you.
Gio Tropeano: The pleasure is all ours. Dhruba, now you and Kshitij know each other as well, right? You guys go way back?
Dhruba Borthakur: Yeah. Thanks, Kshitij, for being here. Kshitij and I go back a long way. In the first job after I graduated from undergrad school, we started to work together, and that was in a completely different area. We were building systems for building large-scale digital switching systems. That was a very different kind of technology compared to the data systems that both of us work in.
Dhruba Borthakur: So I'm really excited to ask Kshitij, the first thing that actually comes to my mind, tell me a little bit about how you started to work on digital switching systems and then you moved over to working more on the data system. And you have become a top leader in the data ecosystem for e-commerce and many other companies in general. Tell me what excites you about this journey, and what are you finding exciting about the data journey for yourself?
Kshitij Kumar: Absolutely Dhruba, those were the days, huh? When we were working right after college. So as Dhruba said, we started in digital switching, and I think one interesting fact, Dhruba, wasn't it that we were part of that group that actually got a telephone into every village in India, right? This is long before you had cellular phones and stuff. And being able to actually just pick up a phone and talk from every village in India. I think that's some of the things that we can be proud of being part of. What happened after that, Dhruba, was I went to Canada. I'm a Canadian as well as an American. I moved to the U.S., to Silicon Valley. And just like yourself, I was part of an entrepreneurial mindset, the culture in the Valley. Built a few companies, raised over $90 million over a period of time.
Kshitij Kumar: And then, during 2009/2010, one of my companies got acquired. It was called Teletopia. And I was working for this company called Concurrent when I started getting exposed to data, and seeing the scale of the data that was coming down the path. So I had been a telecom and a video person, and then I started going into data. So in the early 2010s, I left Concurrent and started learning on the ground, starting up Hadoop systems. I actually used to attend those meetups. I've seen you at some of those meetups in the Valley, when you would speak about these things, and started learning on the ground: what does, how does, data actually work? And over a period of time, I ended up building enough knowledge. I took on a role at Ericsson as a principal engineer, principal consultant in this space. And then built telecom systems in the data space for several years there.
Kshitij Kumar: Well, a couple of years ago, I decided I'd had enough of telecom and data in that space, and wanted to learn something new. So I went to Berlin and joined a company called Zalando, which is a leader in fashion in Europe. And I was leading the data efforts at Zalando. After a couple of years with them, this opportunity came up with Farfetch. So I came across from Berlin to London, and here I am. What really excites me, this whole journey, as you can see, today, you can have multiple innings, so to speak, if I use the cricket analogy. You've got two innings, or baseball I guess. I had the first innings as a software person, and the second inning as a data person. And being able to actually start on the ground and hands-on got me a totally different perspective into: How do you use this data for solving business problems?
Kshitij Kumar: In the fashion space, for instance, a very different space; you can make so much of a difference in how people use the data. I'll give you one example. When we talk about sustainability, for instance, we talk about the environment getting worse every day. Do you know that it takes 10,000 liters of water to actually make one pair of denims? Now, imagine when somebody goes to a website and looks at these denims and purchases one, and they don't like it, and they return it. What did that do? Or they threw it away. What did that do? That loses 10,000 liters of water that was used to make that pair of denims.
Kshitij Kumar: It's really important for us to make sure, from a data perspective, that we're able to give that personal experience to that person. Let them understand: Do they really like what they're getting? Will they enjoy it? How will they feel with it? Is it going to fit them? Is the size going to be right? And then ship it to them from the place that's closest to them, rather than from a faraway place. Or if they have to return it even, then don't have them return it to someplace far away so that every truck roll that you have can be reduced. What's exciting to me really is that you can use data not just to drive business, but also to help the environment at the same time. So many different uses of data.
Dhruba Borthakur: You also talked about how you can use data to optimize some of the business processes that you are mentioning, right? Delivering from the closest location, or delivering with the most efficient algorithm. Tell me a little bit about how real-time analytical systems might be used to make these kinds of efficiency gains for your business.
Kshitij Kumar: I think from a retail perspective, if you think about it, when somebody orders something, if you can get that item to that customer fairly quickly, then the whole experience that person has is much, much better rather than having to wait for a while for that thing to get to them. Or if they do a return, for instance, being able to get it back quickly and then issue them that refund, very important. So these kinds of business processes behind the scenes are really part of that entire customer experience.
Kshitij Kumar: And so you have to actually think this through beforehand, you have to actually make sure that your data is prepared in that way. And when that person, for instance, is purchasing this finally, and let's take a really, really simple example that happens every day. When they're purchasing that thing, and they now need to be told, what time are they actually going to see this delivery, right? The estimated delivery date and time. That has to be done in real time, because now you have to understand: Where is this person? Where is the product? Are they going to cross boundaries? Is there going to be customs required? How can we actually get it to them, and what's that window that we can get it to them? Because if you can do that right, then the person has a much better experience.
Kshitij Kumar: So these are the kinds of things that you'd have to do from a real-time perspective, to be able to respond. Sometimes, actually, "real-time" is a subjective word, or phrase, I would say. What is real-time for fraud analysis, for instance, is not real-time for even that estimated delivery example that I was giving. When fraud is happening, you really want to see it right then. And you want to be able to say, "I'm going to stop this right now. What's the probability?" You need to be able to do the machine learning and the data science, and be able to predict that this is most likely a fraud example. Because if you don't, then you're going to lose real money at that time for the company.
Kshitij Kumar: So even in real-time, there are different types of real-time, the actual time that you need to respond in. And then there are things that you don't actually need to do in real time. And a lot of the work that we do in analytics is things that we work on over a multi-hour window. And just being able to give that response in a few hours is sufficiently timely. So it may not be real-time, but it's timely enough.
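The distinction between "real-time" and "timely" can be pictured as a latency budget attached to each use case. The following is a minimal sketch of that idea, not a description of anything Farfetch runs: the use-case names, the budget values, and the tier thresholds are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class UseCase:
    name: str
    latency_budget: timedelta  # how quickly a response is needed (assumed values)


def processing_tier(use_case: UseCase) -> str:
    """Map a latency budget to a coarse processing tier (thresholds are illustrative)."""
    if use_case.latency_budget <= timedelta(seconds=1):
        return "real-time"        # e.g. stop a suspicious payment right away
    if use_case.latency_budget <= timedelta(minutes=1):
        return "near-real-time"   # e.g. show a delivery estimate at checkout
    return "batch"                # e.g. analytics refreshed over a multi-hour window


# Hypothetical use cases mirroring the examples in the conversation.
use_cases = [
    UseCase("fraud_check", timedelta(milliseconds=200)),
    UseCase("delivery_estimate", timedelta(seconds=5)),
    UseCase("sales_report", timedelta(hours=4)),
]

tiers = {uc.name: processing_tier(uc) for uc in use_cases}
for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

Writing the budget down per use case makes it easier to see which workloads genuinely need a low-latency system and which are fine on a multi-hour pipeline.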
Kshitij Kumar: I'll give you a different example of marketing. Let's say if somebody is on the internet, and is searching for a certain brand, and a certain designer bag. When they do that search at that point, you've got multiple companies working on this demand generation, and it's important for every one of these companies to look at the value of that transaction that's going to come up, and say, "Do we want this customer?" And take that decision, on whether we want that customer to come to our site and participate in our experience, and you have to decide now: Are they going to be happy with our experience? Therefore this is a real-time decision you have to take. But a lot of the work that has gone into it is actually not real time. You've actually spent time understanding what kind of customer is most likely to have a good experience with us. And all of that work happens beforehand. The decision itself needs to be taken in real time.
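The split described here, where the heavy analysis happens beforehand and only the decision is taken in real time, can be sketched as an offline aggregation followed by a cheap lookup. This is an illustrative sketch only: the segments, the toy order history, the "keep rate" metric, and the 0.5 threshold are hypothetical assumptions, not Farfetch's actual model.

```python
from collections import defaultdict

# Hypothetical historical orders: (customer_segment, was_returned)
HISTORY = [
    ("repeat_buyer", False),
    ("repeat_buyer", False),
    ("repeat_buyer", True),
    ("first_visit", True),
    ("first_visit", True),
    ("first_visit", False),
]


def build_keep_rates(history):
    """Offline step: fraction of orders kept (not returned) per segment."""
    kept = defaultdict(int)
    total = defaultdict(int)
    for segment, was_returned in history:
        total[segment] += 1
        if not was_returned:
            kept[segment] += 1
    return {segment: kept[segment] / total[segment] for segment in total}


# Computed ahead of time (for example, in a nightly job).
KEEP_RATES = build_keep_rates(HISTORY)


def should_bid(segment: str, threshold: float = 0.5) -> bool:
    """Real-time step: a dictionary lookup, cheap enough to run per search."""
    return KEEP_RATES.get(segment, 0.0) >= threshold


print(should_bid("repeat_buyer"), should_bid("first_visit"))
```

The expensive work sits in `build_keep_rates`, which can take as long as it needs; the request path only touches the precomputed table.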
Kshitij Kumar: I'll pause there and let you ask a question if you want.
Gio Tropeano: I love that you're speaking about the mindfulness of data and corporate culture. It's interesting that you mentioned real-time versus timely because it's all dependent on what the customer actually needs. Especially if you're comparing retail versus cybersecurity, especially in fraud detection, what real-time might mean is completely different.
Gio Tropeano: This podcast is about real-time analytics, but it's meant to educate just about data and analytics in general as well. Depending on the thresholds needed, how would you advise a data leader, or an analytics leader, to look at real-time analytics? Through what lens? What steps would you suggest that they take in order to build analytics into their platform, or into their mindset?
Kshitij Kumar: I think it's a multi-step effort, so you have to be really methodical when you think about your data journey. And what that means is you need to really understand: What are the data sources? Where is the data coming from? What kind of work do you need to do with that data before it becomes in the form that is going to be usable?
Kshitij Kumar: Oftentimes, companies have data warehouses, and data lakes, and operational data. There are different places where that data is actually going to be acted upon, and stored, and parked for some time. And real-time analytics is one very critical part of it. So if, for instance, you're going to be looking at this data in a few hours, or even a few minutes after it's actually happened, there are different decisions you have to take on the kind of technology that you have to use to get it to that point.
Kshitij Kumar: But the most important part is, you have to be methodical about it. You need to think about where's the data going to come from? What's the quality of this data? How do I need to clean it up, make it usable? And then, when does my customer, when does my stakeholder require a response? If the business requires a response urgently, quickly, in order for that customer to have that experience if you're in retail, or if you're in cybersecurity or fraud, even fraud's a retail case, you want to respond right away. Then you've got a totally different set of things that you can work with. You don't have the luxury of working on that data too much. You have to work with whatever data you're getting right then, in real-time. And therefore, the technologies or the tools that you use, what you do is you go to Dhruba, and you ask him, "Dhruba, how do you implement a real-time use case in this scenario?" Right?
Kshitij Kumar: But that's one part of it. I think that's not the entire story. What you have to do is, you have to think about the entire story and say, even after these decisions have been taken: What can you learn from this, right? That doesn't need to actually be a real-time use case. Once you've got all of this data, where are you going to park it? How are you going to make a data quality effort? People talk a lot about data governance, and that's a really important part as well. How are you going to store it somewhere, so you have the lineage? So when something goes wrong down the road, and in real-time, when you're able to actually tell that something is going wrong, how can you go back to where the data is parked, and be able to look at it and say: "What does my machine learning tell me? Is there something that happened? Was an anomaly detected in this data somewhere? Did we change something behind the scenes that's causing something different to happen?"
Kshitij Kumar: So I would say the most important point really is: Think through the whole process. Use the right tools, and the right processes, and the right governance mechanisms for each part of it. For the real-time analytics ones there are different technologies than if you're doing analytics the next day, when you've had a lot of time to actually process the data and park it, and be able to look at it, to come up with the results the business needs.
Dhruba Borthakur: You clearly explained the difference between some of the technologies that we used to use earlier. You mentioned Hadoop in your discussion, and that was very much a system where there was very little real-time analytics. But based on how the industry has progressed these days, what kind of software stack, or what kind of tools, would you advise people to use? Something like, "Okay, this might be preferable for you if you are focused very much on the real-time analytics side of things."
Kshitij Kumar: So what we have seen out there, and this is not necessarily what any of my employers are using right now, is that everybody is using things like... you have to have an event bus, for instance. You have to have something that's going to give you that, whether it's a Kafka or something else. You've got to have a place where events are coming in and can be distributed from. Once they are distributed, of course, there is operational data. You may not even actually have the time to send that event out and wait for it to be distributed. You may have to have an immediate response. You might use something like a Rockset, for instance, for that operational story in order to be able to get to it very quickly.
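The idea of an event bus, a place where events arrive and are distributed to whoever needs them, can be shown with a toy in-process publish/subscribe class. This is only a conceptual sketch: it is not the Kafka API or any real broker, and the `EventBus` class, the "orders" topic, the event fields, and the 5000 threshold are all hypothetical names and values made up for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-process publish/subscribe bus (a stand-in for a real broker)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
warehouse_rows: List[dict] = []   # slower, analytical consumer
fraud_alerts: List[str] = []      # latency-sensitive consumer


def flag_large_orders(event: dict) -> None:
    # Hypothetical rule: flag any order above an assumed amount.
    if event["amount"] > 5000:
        fraud_alerts.append(event["order_id"])


bus.subscribe("orders", warehouse_rows.append)
bus.subscribe("orders", flag_large_orders)

delivered = bus.publish("orders", {"order_id": "A1", "amount": 7200})
print(delivered, fraud_alerts, len(warehouse_rows))
```

The same published event feeds both a latency-sensitive consumer and a slower analytical one, which is the property that lets different parts of a data stack respond on different timescales.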
Kshitij Kumar: But if you think about it, you absolutely have to be in the cloud, number one I think, in today's environment. When you're in the cloud, you have to have ways of understanding: How quickly do I need to respond to this event, and what do I need to do in order to do that response? And then you find the technologies that are most suited to it. In our case, in my past lives, we've had a combination of yes, Hadoop, even today, combination of SQL-based data warehouses. We've used everything from BigQuery on GCP to Redshift on AWS, et cetera, for things where you didn't need to respond very, very quickly. And then for self-service locations, putting things into a data lake, for instance, an Azure Data Lake service, or on AWS with S3 and other things.
Kshitij Kumar: But despite all of these things, that doesn't solve the real-time part of this. So you do need to have something that's really specific and focused for that part of it. I'm not going to make a product recommendation because I'm not an expert on it, but I believe you do need that. And then separately you need an analytics system. We worked with things like Looker at my current employer; we have Looker on Google that we use. We also use multiple other analytics tools. We are also an Azure customer right now, so we have a big data lake on Azure ADLS, and you can connect to that with many of these tools as well.
Kshitij Kumar: You have to step back and say, "Which parts of this need to be implemented very quickly with a very short latency?" And then, "Which technology is useful for them?" I don't think there is any one single technology that I would say covers all of these use cases. And there are pros and cons on different parts of it. What I would say is these are no longer problems that are brand new. These are problems that many people have solved. I would say, reach out to the industry, listen to what the others are doing, and then use the best tool for what you need to.
Dhruba Borthakur: There are a lot of tools and sometimes people have to spend a lot of time and effort trying to figure out what is the best thing for them to use. Like you mentioned, we also see this trend where people are moving a lot of the analytics backend, or application development backend, from on-prem systems to the cloud. How do you think that this impacts retail, or any digital transformations that some big enterprises might be doing? How does this move to the cloud impact any of these application building or data building products that people are working on these days?
Kshitij Kumar: It's a huge impact. If I think back to last year in 2020, you saw all the retailers who were very tech savvy, and in some cases already in the cloud, we got very lucky that we were already in the cloud from the beginning. The exec board at Farfetch was pretty forward thinking, now that I look back at it, in bringing an entire team together as a data team, with me as the chief data officer, on the exec board of the company, a year before the pandemic happened. Not that we knew that anything was going to happen like this, but they were very forward thinking. What this allowed us to do was we got ready for that next phase, not knowing what was coming next, but we moved everything, not just into the cloud, everything was in the cloud, but we moved it into a much more streamlined way of working.
Kshitij Kumar: So that when the pandemic happened in March, April, last year, in 2020, we all went remote. For us, we didn't skip a beat. And Gio was talking earlier about how we ship to 190 countries. What we were able to do was, we also source from over 50 countries. We have over a thousand brands where we are getting these things from. Now imagine, this is a problem of shipping from 50 countries, and shipping into 190 countries, during the middle of a pandemic when all the pipelines are broken behind the scenes. You don't know how that shipping is actually going to happen; how this product is going to move. And so being in the cloud for us at that time was a huge advantage. Everything was flexible. You could grow up and grow down in different parts of the world as you need it.
Kshitij Kumar: And also on the data side, you could see the models beginning to change. The way that we would see certain things happen before that. The new normal was we saw models change in behavior, but we were able to rapidly retrain them with the new data that was coming in, and therefore be able to respond very quickly. So my firm belief in today's environment is that you need to be in the cloud if you're not already in the cloud, in the retail space. And I think we've seen that jump happen.
Kshitij Kumar: And people, when they go back to going into the stores, we're going to be both in the stores as well as in the online world. So it's going to be a combination. In London, for instance, we recently launched what we call "the luxury new retail" in which there's a store in Brook Street here called Browns, which is part of Farfetch. And if you go into that store, we just recently opened it, it's a combination of the online and the offline experience right now. And you've got magic mirrors. You've got people who are actually sitting with you and helping you find what works well for you.
Kshitij Kumar: But what this does is, all of that data that is being collected as all of these things are happening, it has to go into the cloud. It would almost not be possible to do this on bare metal and then transfer it around between places. That's one of the reasons why I feel, as we go back to a new normal, with online, offline, and the data impact across retail, I feel cloud is the place to start.
Gio Tropeano: Absolutely. In previous presentations, Kshitij, you've spoken a lot about data culture. For an engineering leader, a data leader, or team, building a data culture, what are the two to three main pieces of advice that you would provide them on being data-driven and building a data culture within their team, or even company?
Kshitij Kumar: Data is actually a cultural problem; it's not a technical problem in my mind. And why do I say that? You can have the best tools, you can have the best processes, but if people don't adopt them, if people don't actually use them, if you don't have the right people in the right way of thinking for using that data, then you're not going to get those benefits.
Kshitij Kumar: Why do we exist as a data team? We exist in order to help the business do what the business needs to do. In order to do that, we have to build the right tools, build the right processes. But on the culture side, we need to also educate people. So there's a lot of data literacy that needs to go into it. How do we make sure that people actually understand what is there? And what is the data there? If you need to, for instance, build a real-time use case. What's the data that's going to go into that? How are they going to know that that data exists? If something breaks, how do we know who to go to for that data? Is the lineage there already, the tools that are working with it, the analytics tools, the machine learning tools, the feedback.
Kshitij Kumar: All of this, it's not just about the tools, it's about how you share this with people, and get them involved, and let them play with it. I'll tell you that culture is one of our biggest focus areas in the data team today. And not culture from a point of view of just improving the culture, but how can you improve that culture in order to impact the business? And the way that we are doing it is including things like we are planning a road show, we're planning on exhibitions, virtual exhibitions.
Kshitij Kumar: We did a hackathon last year; we're going to do another hackathon this year. And doing that hackathon in early summer last year was such a tremendous experience. People had been in these remote locations for a few months, and people were not really meeting. The only people that you were meeting were people who were in your team, or you had to work with them, that's why you were meeting them on Zoom or whatever tool you use. And so for us at that time to force people to think out of the box and say, "Well, you've had a three month gap. Let's understand what has changed. What is the new data that you've got? What are the new tools that you've got?" We actually invited the vendors to come in and speak and give us presentations.
Kshitij Kumar: We forced people from different locations, who we knew would not be meeting each other face to face, into a team. So we would make a team where you had one analyst from Portugal, and one machine learning person from London, and then somebody else from India on the same team from a product perspective. And then put all of them together, and give them a chance to learn, to play together. And what came out of that was we got some 30 teams that came out; 200+ people participated. A few of the things that they came up with were real-life solutions to real-life problems that we really needed to adopt.
Kshitij Kumar: I was really happy to see in the months after that, we would go into steercos, and somebody would start talking about something, and I'd say, "Wait a minute, wasn't this one of the data hack ideas?" And they'd be like, "Yes! We took it from there, and now it's a real product." So I think culture needs to change, data culture is critical. As I said before, "Data is a culture problem." And it's really important for the leaders to think about how to do data literacy, how to do data governance, and how do we do better with this?
Gio Tropeano: It's super interesting. You see how the need to transform digitally is simply a change in human behavior. Rather than just throwing technology at it, it's the humans, and their use of that data, that actually is the key to making it work.
Gio Tropeano: Last question, Kshitij, and I appreciate your time being with us today. Just your response to this quote we recently heard from another guest on our podcast: "Real-time analytics is not easy." How would you respond to that, or what are your thoughts on that?
Kshitij Kumar: I would say good analytics is not easy, and real-time analytics is especially not easy, as part of that. I think the data journey is not an easy, simple journey to do. That's why we talked about being methodical, and picking the right tools and the right technologies, and working with the right people, who've done this before. But real-time analytics is especially not easy; it's especially hard I would say.
Kshitij Kumar: You have to first think about it. Where do I need it? And how am I going to do it? And then talk to people who've done this before, early on, so that you can actually be ready with the right solution for it. It's not easy, but it is possible. And with the right guidance, and the right tools, and the right experience, you can do it. And I would say, think about it as one critical part of a larger data strategy. And think about the whole strategy, not just this part of it. Yes, this is a critical part; think about the whole thing.
Gio Tropeano: Kshitij, thank you so much for joining us today on our podcast, and sharing your sage advice and amazing insights. We really appreciate it. We know your time is valuable and we thank you again, and obviously best of luck.
Gio Tropeano: For our listeners, thank you for your time as well. And if you're looking for an amazing location to find designer clothes and offerings, check out farfetch.com. I spent some time on the website. They have some awesome offerings right in my wheelhouse. I think I might check out some of your Supreme beanies. It's kind of hard to find that in Central Massachusetts.
Gio Tropeano: It's been a pleasure. If you found this podcast insightful, please share it. If possible, comment as well; we'd love to hear your thoughts. The "Why Wait?" podcast is brought to you by Rockset. We at Rockset are building a real-time analytics cloud-based platform that can add value to use cases similar to those discussed today. Check us out at Rockset.com. You can try us out for free for two weeks. We have $300 in trial credits as well, included in that trial. Subscribe to the podcast, comment if you can, thank you again, Kshitij and Dhruba, for joining us, and for all of you listening, and stay tuned for our next episode. Cheers, everyone.
Kshitij Kumar: Thank you, Gio. Thank you, Dhruba.
Dhruba Borthakur: Bye, thanks.