Why Wait: The Rise of Real-Time Analytics Podcast

Rockset Podcast Episode 7: Real Time Analytics for Data Applications

Listen to James Mayfield of Transform (previously at Facebook and Airbnb) talk about modernizing data stacks, becoming more data informed, and enabling data applications with real-time analytics.

Why Wait? is the podcast where we invite engineers defining the future of data and analytics to share their stories. New episodes are published every month, and you can subscribe to the podcast and the roundup of blogs to catch the latest episodes.

Show Notes

About This Podcast

Gio Tropeano:

And welcome to Why Wait? The Rise of Real-Time Analytics Podcast brought to you by the team here at Rockset. We invite business leaders, app development thought leaders and analytics specialists to share their stories with the world, providing insights into what your peers are doing to improve data and application analytics here. I'm Gio Tropeano of Rockset. I'm here with my cohost Dhruba Borthakur today, and he's the esteemed co-founder and CTO of Rockset, the mastermind behind Hadoop with a history that spans the early days at Yahoo and the growth phase at Facebook engineering. Welcome Dhruba, thanks for being here. Before I introduce our special guests, if you're listening to this podcast and you have a question or if you'd like to comment, please do so on our community slack channel at Rockset-community.slack.com or feel free to tweet at us @RocksetCloud. We'd love to hear from you.

Gio Tropeano:

So Transform has been building a metrics repository with the goal of being the single source of truth for your company's most important data. James Mayfield is the co-founder of Transformdata.io. Prior to founding Transform, he was the director of product management for infrastructure and data at Airbnb, where he modernized their data stack and made the company more data informed. Prior to Airbnb, James was at Facebook for seven years, with most of that time spent as a data analyst and product manager for data tools and infrastructure. James, thanks so much for joining us on our broadcast today and on our podcast, and welcome.

James Mayfield:

Thanks. Thanks Gio. Happy to be here. Dhruba, good to see you.

Dhruba Borthakur:

Yeah. Great thing. Thanks for being here.

Gio Tropeano:

Old friends getting back together, it's always a beautiful thing. So James, you have a ton of experience dealing with apps that are powered by large data sets at Facebook and then Airbnb. So tell us some of the applications that are enabled by real-time data, what kind of applications are they?

James Mayfield:

Yeah, great question. Certainly have seen the full spectrum of things across my time and experience. I think one quote that always sticks out to me when I start talking about real-time analytics is that a batch is just a special case of streaming, right? That it is almost like a happenstance that so many companies have gone with a daily aggregation model, but really a daily aggregation is just a point in time across the stream. You cut it, you bundle it up and send it across. And that day is a standard way of doing it but really any sort of batch could be a day, an hour, 15 minutes, one minute, five seconds, one second, less than one second. Any of these things are just some sort of aggregation on top of the stream that is constantly flowing.
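
To make the "batch is just a special case of streaming" idea concrete, here is a minimal sketch in Python. It assumes a made-up stream of (seconds, count) events and a hypothetical `aggregate` helper; neither comes from the conversation. The same function produces per-minute, per-hour, or per-day totals, and only the window width changes.

```python
from typing import Dict, Iterable, Tuple


def aggregate(events: Iterable[Tuple[float, int]], window_seconds: float) -> Dict[int, int]:
    """Sum event values into fixed (tumbling) windows of the given width."""
    totals: Dict[int, int] = {}
    for timestamp, value in events:
        # Floor division maps each timestamp to the index of its window.
        window = int(timestamp // window_seconds)
        totals[window] = totals.get(window, 0) + value
    return totals


# Hypothetical stream: (seconds since start, count) pairs.
stream = [(0.5, 1), (20.0, 2), (61.0, 3), (3700.0, 4), (90000.0, 5)]

per_minute = aggregate(stream, 60)     # one-minute windows
per_hour = aggregate(stream, 3600)     # one-hour windows
per_day = aggregate(stream, 86400)     # the familiar daily batch
```

The daily batch is simply the widest of the three windows over the same events; nothing else about the computation differs.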

James Mayfield:

When I think about applications, it's like, why do people even collect data in the first place? What do they need to do? At the heart of my experience is people attempting to make business decisions based off of data, right? They want to be informing their decisions. If every employee is paid to make decisions, you want to inform those decisions with data, otherwise people will just make gut-informed decisions, which can work out, but really having that data to back up their thought process and to justify what it is they're doing is really important. When I started thinking more about real-time analytics in the context of this podcast, I thought about human decisions that are made quickly, right? I kind of separated the ML world and machines talking to machines from what real-time analytics means: what is a question that gets answered by a human being, where a decision is then made by a human being, informed by data at very low latency?

James Mayfield:

A couple of things that came to mind right off the top of my head here, I'm just looking back at my notes. We had a big one in the advertising space or marketing space, right? There's millions and millions of dollars at play in big advertising campaigns. Human beings have created the collateral for those advertisements. Human beings have thought about the messaging, they've thought about the audience. And when they go to deploy those things, they want to have a sense of security that, hey, I just did this right. I just sent the right message to the right person at the right time to help influence them in some way. And the more analytics that they can have saying here's where my campaign dollars are going, here's where my advertising dollars are going, here's where my marketing dollars are going, the more surefooted they feel with spending those dollars.

James Mayfield:

And so this was something I saw at Facebook when I briefly worked on the ads team, and then I saw it at Airbnb when we helped to create the datasets for the marketing and advertising efforts, which was: empower human beings to deploy capital. Empower those human beings with real-time or near real-time analytics so that they can gut check that those dollars are being spent correctly. And if, heaven forbid, something goes wrong, they can immediately cut that off, get in touch with whoever they're spending money with and stop that outflow of capital. And really a huge way to do that is through real-time analytics. You can have a lot of automated safety checks and things like that, but at the end of the day, a person who has created the collateral and has targeted an audience is going to know best whether that is working correctly. So that was a big one that came to mind.

James Mayfield:

Another interesting one from Airbnb in the real-time analytics space was Airbnb had this unique place where we facilitated an online relationship that became an offline relationship. You put two people in touch in a virtual environment and then those people actually meet in real life. And that's a unique challenge and there's all sorts of things that can come about in that transition from online to offline, some of which are marvelous and wonderful and sort of magical experiences. And some of which can be frustrating or scary or in need of assistance and support. So when we thought about real-time analytics, in that regard, we had large numbers of support representatives at Airbnb. They would be tasked with helping to facilitate those online to offline and those real world interactions. Things would happen. We would need to staff up different teams or we call them queues. Different queues of support tickets would come in or phone calls would come in and having real-time analytics was a great way to identify could we actually roll people from one focus area to another, right?

James Mayfield:

If we had just relied off of batch processing for that, you can imagine how much slower decision-making would have been. Right? If we just saw, okay, yesterday we had 100 tickets come in and so we're going to staff this one team with 10 representatives who do 10 tickets each. Okay, perfect. Yep. But tomorrow, if the ticket queue spikes to 150, now you're going to be well behind. And so tightening that down from a day to an hour to 15 minutes to five minutes, these incremental sort of improvements in the real-time nature of that data set helped to allow those support teams in particular to staff appropriately across the globe. Right?
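
The staffing arithmetic in that example can be sketched as follows. The 10-tickets-per-representative figure and the 100 and 150 ticket counts come from the conversation; the linear run-rate projection and the function names are assumptions for illustration only, since real ticket queues are not this regular.

```python
import math

TICKETS_PER_REP = 10  # from the example: each representative handles 10 tickets


def reps_needed(tickets: float) -> int:
    """Representatives required to clear the given number of tickets."""
    return math.ceil(tickets / TICKETS_PER_REP)


def projected_daily_tickets(tickets_so_far: int, fraction_of_day_elapsed: float) -> float:
    """Naive linear run-rate projection (an assumption, not a forecasting method)."""
    return tickets_so_far / fraction_of_day_elapsed


planned = reps_needed(100)                     # staffing based on yesterday's batch
projection = projected_daily_tickets(75, 0.5)  # 75 tickets seen by midday
required = reps_needed(projection)             # staffing the fresher data suggests
shortfall = required - planned
```

With a daily batch the shortfall only shows up the next day; with a midday reading there is still time to move people between queues.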

James Mayfield:

We had people traveling across the globe. We had people supporting them across the globe. It gets even more interesting when trying to do that real-time analysis on what language does this person speak on the phone? Or what language does this person speak who wrote the ticket? Or maybe the person traveling is from Italy, but they're going to Germany. And so do you want to get the host on the phone with someone who speaks German? These things are actually quite nuanced and you can't look at just a top line number of tickets or support representatives on a daily cadence to make that work. You have to have much more fine-grained analysis. You have to have many more dimensions that you can slice that by in order to really effectively sort of like staff an organization in that way.

James Mayfield:

The third big one that popped out was experimentation. Experimentation is something that we did a ton of at Facebook and a ton of at Airbnb. I think of experimentation as almost being a little bit similar to that advertising and marketing category, where a piece of me just wants to make sure that nothing is off the rails, right? Whether it's advertising dollars going off the rails or whether it's an experiment going off the rails. I think in experimentation too, it matters when you have slow rollouts or you have small sample sizes. The example that I could give is imagine that you're rolling out a new search functionality. You only roll it to 1% of your population who are doing searches. Well, if for some reason that search functionality doesn't work for half the people, now half of 1% of your total audience is unable to complete a search.

James Mayfield:

Your top-level alert is not going to trigger for that. Even if you have the best Datadog setup in the world, it's never going to actually say, oh, this particular slice of the treatment group experienced some problem. That will not be statistically significant or cross any sort of threshold boundaries that you've set up in your OLTP systems or your monitoring systems. But if you have real-time analytics that can provide that fine-grained nature of analysis and you can do it on a quick basis, you're going to be able to identify that there was an issue with this particular experiment for this treatment group, or maybe even a select slice of that treatment group. It's only people who have German buttons appearing on the screen for the search, and the German characters extended out past the button threshold or something like this.
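
A small sketch of why the top-level alert stays quiet while the slice is clearly broken. The 1% rollout and the 50% failure rate in the treatment group come from the example above; the population size, the control group's baseline failure rate, and the 1% alert threshold are hypothetical numbers chosen for illustration.

```python
from collections import defaultdict

ALERT_THRESHOLD = 0.01  # hypothetical: alert when more than 1% of searches fail


def failure_rates(rows):
    """Return the failure rate overall and per experiment group.

    rows: iterable of (group, searches, failures) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [searches, failures]
    for group, searches, failures in rows:
        # Count every row once for its own group and once for the overall total.
        for key in ("overall", group):
            totals[key][0] += searches
            totals[key][1] += failures
    return {key: failed / searched for key, (searched, failed) in totals.items()}


# Hypothetical traffic: 100,000 searchers, 1% of them in the treatment group.
rows = [
    ("control", 99_000, 198),   # assumed 0.2% baseline failure rate
    ("treatment", 1_000, 500),  # half of the treatment group cannot search
]

rates = failure_rates(rows)
alerts = {group: rate > ALERT_THRESHOLD for group, rate in rates.items()}
```

Under these assumptions the overall failure rate is about 0.7%, below the threshold, while the treatment slice sits at 50%. Only a breakdown by experiment group surfaces the problem.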

James Mayfield:

With real time analytics, you'll have a chance at diagnosing that. With a top level sort of monitoring system, you will not. So that was another one that came to mind. And then the last one, and I won't go super deep on this. I think that there's always sensitivity around these subjects, but fraud, fraud prevention, information security and sort of defense in depth against attacks, the constant sort of threats that are out there. The bigger your company, the bigger your platform, the more opportunity there is for nefarious activity. And I think that real-time analytics were a game changer in some ways for the analysis that we can do for finding those small but important micro activities that are occurring on the platform. And so real-time analytics was a part of all of those things in my last two jobs.

Dhruba Borthakur:

Yeah. Thanks for explaining the different ways that real-time analytics was useful in the places where you have worked. Let's talk a little bit about the experimentation platform that you kind of explained, right? Because you and I worked together on some of those platforms back at Facebook. There, the focus from what you explained is that it's great if we run experiments, but it's also great to be able to find results quickly. Otherwise that half of 1% of people who are not getting search results might suffer a bad experience. Right? So fresh data is definitely important to be able to make those decisions and not wait for many days or many hours to figure out that an experiment is going bad. So tell me a little bit about some of the data stacks that you used to get these kinds of real-time analytics systems in place and running in production at large scale.

James Mayfield:

Yeah. So the data stack, I think I'll talk about Airbnb. That's the one that's most sort of fresh in my mind, even though I'm already kind of a dinosaur now that I've been gone for a little while. I came from Facebook, and obviously I was highly inspired by what we built at Facebook. I came to Airbnb and saw a very early data ecosystem that was immature in some ways. The approach that we took was to get onto the most standard sort of infrastructure setup that we could, right? One that was well understood, it was well-supported, there was good documentation. There were people we could call when something broke. And of course I had a personal slant towards loving some of the stuff I had used at Facebook, and so I think I helped to influence the move to that. Hadoop, HDFS, everything ran on AWS.

James Mayfield:

And then we used kind of a standard HDFS storage system. We ran Hive first, then Presto, and then started layering on additional things like Spark Streaming. I think finally HBase and Flink came into the mix, but really it was about laying that super solid foundation at the beginning. Talking through experimentation, it started off as only batch jobs. Right? And it was like the first implementation of all these things. I like to follow that crawl, then walk, then run pattern. And so crawl was like, hey, can we store data? Can we query it? Can we do that on a daily basis? And then can we use that for experimentation results on a daily basis? And we didn't have much sophistication in terms of like catching errors or real-time mess.

James Mayfield:

And then over time, we really solidified that platform and the base use cases that we needed to support. From there, we began to layer on additional complexity and try to tailor those systems to additional use cases. So, like I said, everything was on AWS. We went with an HDFS file system, right? It was a Cloudera-based deployment. We contracted them to help. We used the CDH sort of open-source thing first, and then we ended up partnering with them, signing a contract and layering on some of the great tools that they offered at the time. Gosh, I think Cloudera Manager was probably the one that really stood out to us as being very important for being able to monitor this large system.

James Mayfield:

This is years ago, gosh, it was like 2015, 2016 now I'm talking. But the thing that we found when we wanted to move into those more near real-time use cases was we had issues when testing out the raw S3 file system as the backing mechanism for some of our storage. They call them data lakes now. I think that's kind of the term that's been brought up. Originally, we were using these HDFS nodes, then we tried to test S3 nodes, and we found that we actually had better performance with streaming data into the HDFS nodes themselves and then being able to run queries like HBase queries on top of that: the Hive metastore was more performant and the response time was more performant. I can't remember the details of all of those tests, but it was like an order of magnitude difference when it came to trying to store data in S3 natively and then pull it back out using HBase.

James Mayfield:

And so we ended up keeping a bunch of data there in HDFS and having to manage those nodes, which was challenging, but was sort of a prerequisite for dealing with those more near real-time datasets. Does that answer your question? Do you want to go deep on that?

Dhruba Borthakur:

No, no, absolutely. I think I remember in the very early days at Facebook as well, we used HDFS a lot to be able to stream data and write it into HDFS. And I guess you kind of used some similar techniques at Airbnb as well and also complemented it using S3. So you explained to me how data comes into HDFS or S3 and gets streamed through Flink and other mechanisms. Could you also tell us a little bit about how data gets out of the system? Once you've deposited it in HDFS or deposited it in S3 and then you run some Flink jobs, where does this data actually get pushed out to? And you mentioned HBase, so maybe that was one way to push it out because people could query HBase. Are there any other things that you used there to get the insights and information back out to your applications?

James Mayfield:

Yeah, that's a good question. Yeah. So the connectors that like... I think of this as almost like an input-output, right? Where were the decisions being made, right? Who was making those decisions, and then can you get the data to a place that empowers that decision? So it was funny. Internally at Airbnb, we had built stuff ourselves and so we had kind of like control over this in a sense, which was great. I'm realizing now that, I know we'll talk about this in a moment, but there's kind of a wild, wild west. Like all these companies do this very, very differently but-

Dhruba Borthakur:

There's no standard way to do it, it looks like, right? There wasn't like one way to do it. Everybody was trying to do it how they pleased or how they'd like to do it.

James Mayfield:

Right. Right. Yeah. Yeah. And I think when you're within a company, you have a little bit of control and then at least you don't have this huge constellation of things that you have to support. You're like, okay, we know where this data is going to end up and let's build a pipe. And it just has to be one pipe to one application, which is good. So all those applications like our experimentation platform, we built that. And so we could figure out what were the slices of data on what latency did we need to track. On the customer support tooling, we had built some of that too. Some of it we had from vendors and we had actually taken and coupled some tools that would have vendor data streams and then our own data streams all within one interface. Right?

James Mayfield:

So when it came to querying these things, there was a human actually sitting behind a terminal and querying using SQL, using Presto, using whatever it was. There were machines querying this. You mentioned Flink. Flink is actually so fast that we found we could use it in some of our ML and like roll things up really, really quickly and then have machines like ML models take advantage of those summary, those micro batch aggregations that we could do. One example is like, gosh, if you're looking for fraud like how many messages did this person send? Well, that number is going to be incrementing really, really quickly. If you just did that as a batch at the end of the day, this profile might only say, oh, they only contacted five people. By 5:00 PM the next day, they contacted 5,000 people. Right? You want to know that in near real time as you sort of want to make a decision on that. So there were some like actual user interfaces on the customer support side where we would route a particular metric, depending on the latency that was required.
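
The fraud example, a message count that increments quickly and needs to be checked in near real time rather than at the end of the day, can be sketched as a sliding-window counter. This is a hedged illustration only: the class name, the one-hour window, and the threshold of 100 are assumptions, and it is plain Python rather than the Flink jobs described above.

```python
from collections import defaultdict, deque


class RecentMessageCounter:
    """Track how many messages each sender sent within a sliding time window."""

    def __init__(self, window_seconds: float, threshold: int):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = defaultdict(deque)  # sender -> timestamps still in the window

    def record(self, sender: str, timestamp: float) -> bool:
        """Record one message; return True if the sender now exceeds the threshold."""
        events = self._events[sender]
        events.append(timestamp)
        # Drop timestamps that have aged out of the window.
        while events and events[0] <= timestamp - self.window_seconds:
            events.popleft()
        return len(events) > self.threshold


# Hypothetical settings: flag anyone sending more than 100 messages per hour.
counter = RecentMessageCounter(window_seconds=3600, threshold=100)

flagged_at = None
for second in range(5000):  # one message per second from the same sender
    if counter.record("user_42", timestamp=second) and flagged_at is None:
        flagged_at = second
```

In this made-up run the sender is flagged on the 101st message, long before a daily batch would have reported the day's total.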

James Mayfield:

There was an API that we would create, basically on the storage and compute side, and that API would be available to the front-end interface. We owned all that tooling. We could write that API query, and we could resolve it down to whatever type of query or stream or whatever it was on the backend. That was one way we did it. And then we were big on our own internal tooling. We built Superset and open-sourced that. Now Preset is the name of the company that runs Superset. And so we could even have those direct connections to not just like a Hive query, not just a Presto query, but then if they were actual other queries, I think we used Apache Phoenix at some point to wrap like HBase in a SQL-like interface.

James Mayfield:

And so those are kind of the ways that we did it. Dhruba, I'll be totally honest. I don't recall if we had any incremental storage between the different layers. I think we tried to handle as much of it as we could through API calls that would resolve to queries on the backend. I'm sure that we had to write some of it out to different internal production systems like MySQL boxes or a Postgres installation, but it escapes me right now exactly what we did in that regard.

Gio Tropeano:

So switching gears real quick James. Tell us a little bit about what you're actually building now at Transform and how real-time analytics plays a part in your product and your day-to-day.

James Mayfield:

Yeah. Yeah. Thanks for that. So Transform, we're really focused on this concept of metrics, and our core hypothesis is that most companies, most people at companies, they care about the metric, right? And when we say metric, it's like the key performance indicator or the goal target or, in the OKR sense, it's the key result attached to an objective. And that is the language that most people use when they talk about data at a company. That's kind of our core hypothesis. We've seen this incredible growth in the size and complexity of the data warehouses and data lake environments. We've seen an incredible increase in the skillset of people dealing with data. It used to be that analysts like me would just spend time in Excel, maybe I would go click around in a BI tool. The modern analysts now, 10 years later, they write SQL natively. I mean every analyst writes SQL. And almost all technical analysts and data scientists, they write Python and they use Jupyter Notebooks and they write Airflow jobs and they create data assets.

James Mayfield:

And so you've got this big technology shift: low-cost storage, low-cost processing. You've got a people shift and more technical people. And what we found is that many companies now, even if they have a good infrastructure setup, they really struggle with understanding what are the metrics that we care about? What do they mean? Who owns them? What has happened historically? What are the annotations that people should know about? How many times have people sat in a meeting and there's a graph that goes flat and it has a big spike and no one can explain that spike? This is something that happens all day long, and our core hypothesis at Transform is that we could refocus some of the attention of a company onto metrics and really make data accessible to the less technical people, to be able to grab those metrics and not have to go bug an analyst every time they need an answer to a very basic question.

James Mayfield:

So that's kind of the wedge that we're thinking about and working on. At the end of the day, we have a metrics framework that accepts standard languages, YAML and SQL configuration files that say, here's the metric that I care about. Here's how I defined it. I'm the owner of it. Here's now all the dimensions that it can be sliced by. And we create data sources for these things and we have an engine on the backend that will efficiently sort of take a metric and slice it by dimension. It takes a lot of the guesswork and the tribal knowledge out of the data warehouse environment, which can be really complicated and confusing and helps to just simplify and streamline the access to metrics. And then we have a bunch of APIs off of that that can connect to any BI tool, any experimentation tool, any anomaly detection tool.
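
As a rough illustration of "here's the metric, here's how I defined it, I'm the owner, here are the dimensions it can be sliced by", the sketch below models a metric definition in Python and generates a sliced SQL query from it. This is not Transform's actual configuration format or engine; the `Metric` class, the `slice_query` function, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Metric:
    name: str                    # how people refer to the metric
    owner: str                   # who answers questions about it
    expression: str              # SQL aggregate that defines it
    source: str                  # table or view it is computed from
    dimensions: Tuple[str, ...]  # columns it may be sliced by


def slice_query(metric: Metric, dimension: str) -> str:
    """Build a SQL query that computes the metric grouped by one allowed dimension."""
    if dimension not in metric.dimensions:
        raise ValueError(f"{metric.name} cannot be sliced by {dimension}")
    return (
        f"SELECT {dimension}, {metric.expression} AS {metric.name} "
        f"FROM {metric.source} GROUP BY {dimension}"
    )


# Hypothetical metric definition.
tickets = Metric(
    name="support_tickets",
    owner="support-analytics",
    expression="COUNT(*)",
    source="support.tickets",
    dimensions=("language", "queue"),
)

query = slice_query(tickets, "language")
```

Keeping the definition, owner, and allowed dimensions in one place is what removes the tribal knowledge: a consumer asks for a metric by name and dimension instead of rediscovering the SQL.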

James Mayfield:

And so that's what we're working on. I think when it comes to real-time analytics, most of our customers, as I mentioned at the very beginning, feel comfortable in that daily batch aggregation, but because our interface is really SQL at the end of the day, we would be accepting of any SQL query. If it's a daily batch job, great. If it's a one and a half second latency Flink job, great. As long as we can write a SQL query through our framework, we'll be able to execute that query and update the data and make sure that people can then have metrics in real time as well. We haven't explored this deeply with any of our early sort of like beta customers, but it is something I get excited about and I think I've already pinged Dhruba on LinkedIn and started to ask him about that a bit. So I think it'll be fun to experience those use cases where people need metrics, consistent metric definitions and need them on a very low latency basis.

Gio Tropeano:

Yeah, it reminds me of one of the sayings that became pretty prominent in tech a couple of years back, that every company is a software company, right? That's kind of made its rounds, right? I feel like moving forward, we're on the cusp of that changing and transforming to every company is going to be a data company. Right? And providing the teams that have access to the data the ability to really slice it, dice it, and make use of the data is going to be even more paramount now than it ever was before, especially with the mountains of data that companies can manage and access. You've been both at Facebook and Airbnb, which are B2C companies, right? And now at Transform, you're building data products for B2B. So would you mind telling us a little bit about kind of what the differences are that you've seen in building B2C and B2B when it comes to real-time analytics?

James Mayfield:

Yeah. Good question. So I'll say first of all, when I was working internally at companies, right? When I was at Facebook as an internal product manager building data tools and data applications and building tooling that allowed for easier access to our data assets at the company, I had internal customers. They didn't pay me, the company paid me. And I have to say that those customers are some of my most harsh critics. And it was really powerful to be sitting next to someone and every time we did a small release of a new tool or a new feature within a tool, I had very honest, direct feedback from consumers internally who would just come to me and say, yep, this is what's going on. This is what works. This is what doesn't. So that was very powerful.

James Mayfield:

I think when you have this wall of your company versus another company, you're a vendor and they're a customer, it actually creates a little bit of distance, I think, that I wasn't familiar with internally. And I think it creates a little bit of a hoop that you have to jump through to get that really good unfiltered feedback. And that's like an observation I would have right off the bat. The other thing I've noticed when I'm dealing with actually selling data products, I have to support a huge array of data ecosystems, right? I call it like the constellation of setups. Different teams have invested in different products. They have vendors, they have things that they built that they love and are kind of... Even sometimes people internal at that company have staked part of their reputation on building something. And so that becomes something that's ingrained and that you have to interact with.

James Mayfield:

Different companies have different systems, oftentimes many systems for storing data and computing data and getting access to data. Larger organizations in particular, they have a lot of legacy things. They might have 10 or 15 different storage systems, each one of which has a slightly different access pattern or data governance rule on top of it. And so you have to integrate with all those things. That's interesting. Yeah. And I think when it comes to real-time analytics, we found that customers have different data types stored in different locations, processed in different ways, that they want to retrieve in different levels of freshness. Right? And so being in that environment, what are the things that we can build that will help to satisfy all those different use cases and be flexible? I think that we found recently that, and again, maybe this wasn't true 10 years ago, but recently it really feels like SQL is the language that almost all companies have rallied around and almost all analysts and data scientists have rallied around as sort of the interface to their data storage and compute systems.

James Mayfield:

So our approach and the thing that we've learned is let's make sure to support SQL really, really well. And let's make sure that we understand the nuances of the different SQL dialects. Everyone says they're like ANSI SQL, but no one actually is. Each one is a different database. Snowflake has a different thing with quoting and Databricks has a different thing with backslashes or whatever it happens to be. We have to really get in there and understand what are those nuances and speak that SQL dialect effectively. When it comes to real-time analytics, I think it's actually the same.
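
One concrete dialect nuance is identifier quoting. The sketch below is an illustration chosen by the editor, not something described in the conversation: Snowflake and PostgreSQL quote identifiers with double quotes, while Databricks SQL and MySQL use backticks. The `quote_identifier` helper and the dialect keys are hypothetical names.

```python
# Quote character used for identifiers in each dialect.
IDENTIFIER_QUOTES = {
    "snowflake": '"',
    "postgres": '"',
    "databricks": "`",
    "mysql": "`",
}


def quote_identifier(name: str, dialect: str) -> str:
    """Quote a column or table name for the given SQL dialect."""
    quote = IDENTIFIER_QUOTES[dialect]
    # An embedded quote character is escaped by doubling it.
    return quote + name.replace(quote, quote * 2) + quote


snowflake_column = quote_identifier("order date", "snowflake")
databricks_column = quote_identifier("order date", "databricks")
```

A query generator that hard-codes one style will emit SQL that another engine rejects or misreads, which is why a layer that "speaks SQL" has to track these differences per backend.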

James Mayfield:

I mentioned Apache Phoenix being a wrapper on top of HBase, but what is the structure of a Spark Streaming query versus what is the structure of an HBase query using Apache Phoenix? These are things that we're actively learning and making sure that we can support, because these customers come to us and they say, oh, my support team needs data on a five-second interval. Well, we have to be able to support that and route it to them effectively, which involves supporting that constellation, rallying around SQL, which is what our customers have rallied around, and then just being flexible and empathetic to those use cases and making sure that we can support them.

Dhruba Borthakur:

Yeah. I think you explained many different things about the differences between how real-time analytics is used in a large company versus maybe a smaller size company. Can you tell us a little bit, again, about the biggest differences in how a real-time analytics system, or data from a real-time analytics system, might be used in a very large company versus a startup or a growing company? Take for example, you've talked about experimentation earlier, right? How important it is for a smaller size company to do fast experimentation on real data versus large companies that might have somewhat different priorities. So what are those things that you've seen from your personal experience about the differences? Or what is preferred more and what is preferred less by different sizes of company as far as real-time analytics is concerned?

James Mayfield:

Yeah, that's a good question. The first thing that jumped out to me was bigger companies really care about data governance and data access patterns. Bigger companies are oftentimes like an actual umbrella of different concerns, or maybe they've even bought five different companies that are all like silos within this big umbrella. And they care so much about IT, RBAC, single sign-on, Okta, SAML. All these things are just so, so, so important. You've got to get that right for these bigger companies. Smaller companies, they don't care. They've got 10 people and like everyone has access to everything. So that's like something that jumps off the page right away.

James Mayfield:

I think at larger organizations, the problems that I see are problems of silos often. Somebody says I want real-time analytics and I want to be making this decision. And when you unpack that decision, there's actually three different pieces of data that you need from three different systems. They're all... One is like some crazy Oracle legacy thing. One thing is like some super modern cloud native like Flink application and one is like, I don't know, an Excel spreadsheet from somebody's laptop and like you have to merge these things together in an effective way.

James Mayfield:

And I think that that becomes the tricky part of these bigger organizations. They've accumulated a lot of legacy stuff. They accumulated in some cases, a lot of tech debt and unpacking what it is that someone really needs, helping to facilitate the fast access, the fast collection and then fast access of that data can be a real challenge. At smaller organizations, I've found that oftentimes they're just trying to figure out if things work. A bigger organization is trying to like optimize and optimize and optimize and optimize and they have this big hole in the ground that's like spewing money and they want to make sure the hole doesn't get smaller and then try to just incrementally make that hole a little bigger.

James Mayfield:

And then the smaller companies, oftentimes they're just searching for product market fit. They're just trying to make sure that things aren't falling apart. And so when they're thinking about real-time analytics, oftentimes they're not thinking about optimizations and optimizations. They're like, how can I just make sure this thing isn't broken? How can I make sure that the customer experience that I think I'm providing, I'm actually providing? And so that becomes more interesting to those smaller companies. Access is generally not a problem. They got a couple systems to give access to everything, to everyone. And so that's a big difference that I've witnessed just in my short time sort of interacting with those two different org types.

Gio Tropeano:

Big differences. That's for sure. Last question, really, James, if you can give one last piece of advice to data and engineering leaders and builders on real-time analytics, what would it be and why?

James Mayfield:

Yeah, that's a good question. I think I fall back on like my product manager roots here a little bit where I've been blessed in my career to be surrounded by incredibly smart engineers. People who got really, really, really excited about hard technology problems and solved them in novel ways and pushed the capability of computers beyond sort of what anyone had seen before, which is... It's really an honor to be around that. The value proposition that I always tried to add from my product manager seat was not trying to be the most technical and the most sort of hard-nosed deep thinker on how to optimize different systems or make things possible technically, it was really to bring that empathy to the table and try to see can we get incredibly clear about how does it...

James Mayfield:

Like a potential customer of our data systems, what is their workflow? What is it that they need to do? What are the decisions that they make? What are they even incentivized by, right? Are they incentivized by answering more customer support tickets? Are they incentivized by optimizing a campaign and saving $50,000 of bad spend? Are they incentivized by finding an experiment result and ending the experiment or rolling it out further, if it's good or bad? Figuring out those incentives, figuring out and building that really deep empathy for what someone is trying to achieve in their role, figuring out how data could make that possible. That was always my special value add. And it took sometimes hours or days sitting with people and just observing and trying to figure out what they were trying to do to then take that back as a spec and say, look, you're indeed a leader, there's some things that are just table stakes. You got to have compute, you got to have storage. You got to have just like the nuts and bolts to do basic business reporting. You got to do that.

James Mayfield:

But then when you can pick off and say there's a unique thing we can do with data to make this company even more valuable than it was yesterday, what is that thing? At Airbnb, maybe it's the connection of hosts and travelers, right? If you can make that connection better, you're going to increase the thing that Airbnb is really good at and you're going to make it great and you're going to actually increase the business prospects of that thing. Finding that, finding out how to do that, finding out what it is that the people do who are optimizing that and then supplying them with a data solution to their problem.

James Mayfield:

The thing that I've seen some companies get stuck with and some data organizations get stuck with is they love the tech and then they try to build the tech and then find a use case for it. Not always bad. Sometimes you open up some pretty phenomenal use cases like that. But if I was going to give one piece of advice to data and engineering leaders, it's deeply empathize with the thing that people are trying to do. With the consumers of your products, what are they trying to do? And then make that possible and make it great and delight them beyond what they ever thought was possible for them.

Gio Tropeano:

I love it. Great advice and super insightful today. James, thank you so much for joining us today. I think it was, like I mentioned, insightful and fun. We'd love to actually have you back as you propel Transform into the future. I'm looking forward to learning more about how things progress and some of the challenges that you overcome with real-time analytics in the future. So thanks again and maybe Dhruba-

James Mayfield:

Yeah, thanks for having me. It was great to see you again, Dhruba. And Gio, it has been great getting to know you as we set this up, so thank you all.

Gio Tropeano:

Yeah, absolutely.

Dhruba Borthakur:

Thanks James. Thanks a lot. Good chatting with you.

Gio Tropeano:

All right. And to our listeners, thank you for listening to the episode. You can chat more with James's team by visiting transformdata.io. If you found this insightful, please share the episode, right? You have the opportunity to share these highlights with the world. The Why Wait podcast is brought to you by Rockset. We at Rockset are building a real-time analytics cloud-based platform that can add value to some of the use cases that are similar to those that were discussed today. So check us out at rockset.com. You can try us there for free. Subscribe and comment and thank you again for joining us and stay tuned for our next episode. Cheers.
