Why Wait: The Rise of Real-Time Analytics Podcast

Rockset Podcast Episode 11: Varying Degrees of Real-Time Analytics

What action are you taking from the data? This is the central question that Michel Tricot asks teams to use to guide their latency requirements. At Airbyte, Michel is giving companies the choice of how real-time they want their data, regardless of the data source, and making it easy to access data for analytics.

Listen & Subscribe

Why Wait? is the podcast where we invite engineers defining the future of data and analytics to share their stories. New episodes are published every month, and you can subscribe to the podcast and our blog roundup to catch the latest episodes.

Show Notes

Gio Tropeano:

Welcome to the real-time analytics podcast Why Wait? The Rise of Real-Time Analytics, hosted by Rockset. In our podcast, we invite engineering and business thought leaders and analytics specialists to talk about their world, providing insights into what your peers are doing to improve data and application analytics. I'm your host, Gio Tropeano of Rockset. I'm here with my cohost Dhruba Borthakur, the co-founder and CTO of Rockset. Dhruba, always a pleasure to have you on the podcast. How's San Mateo today?

Dhruba Borthakur:

Oh, doing great. Thank you. Thanks for asking. It's great to be here.

Gio Tropeano:

Thanks for being here. Before I introduce our esteemed guest today: if you have any questions or comments on today's podcast, drop us a line, we'd love to hear from you. You can either tweet at us @rocksetcloud on Twitter, or feel free to join our community at community.rockset.com. Our guest on the podcast today is Mr. Michel Tricot, co-founder of Airbyte. Airbyte is an open-source data integration platform that syncs data from applications, APIs, and databases to data warehouses, lakes, and other destinations. They're based in San Francisco, and you can find more about Airbyte at airbyte.io. Prior to founding Airbyte, Michel founded and was head engineer at rideOS and LiveRamp. He was also an engineer at Rapleaf, Murex, and FactSet. Michel, welcome to the podcast. We're super excited to learn from you today.

Michel Tricot:

Thank you very much for having me.

Gio Tropeano:

Absolutely. Before we jump into real-time analytics, tell me about why you founded Airbyte and what you're working on today.

Michel Tricot:

Yeah. I've been in the data space for the past 15 years, and over all the different experiences that I had, I realized that every single company is rebuilding the same type of data integration and the same type of project to tackle all their data movement issues. And over the past five years, what we've seen with the new modern data warehouses is that teams are starting to shift how they look at data, moving more from an ETL paradigm to an ELT paradigm. At that point, it becomes very, very doable to have common connectors that can be shared across different companies. When we realized that, John, my co-founder, and I started Airbyte to ask: what can we do to address data integration once and for all and ensure that companies don't repeat what other companies are doing? That way they can share the burden of maintaining these integrations and facilitate all their data movement.

Gio Tropeano:

Awesome. Airbyte's focus is basically to get data pipelines running quickly.

Michel Tricot:

Yep, exactly. Very simple.

Gio Tropeano:

What are you hearing from your customers about their needs for real-time data?

Michel Tricot:

Yeah, that is interesting, because when we built Airbyte, what we had in mind was really micro-batches, since every company has a different definition of what real time means. Whenever they ask us whether Airbyte can do real time, we always ask them: okay, what is your definition of real time? Is it microseconds? Is it a second? Is it the next five minutes? What is the interval that you consider to be real time for your business? In general, you realize that there are use cases where you need very, very strict sub-second reactivity from your system. The ones we've seen are generally around finance and fraud detection, things where you need to react very, very quickly to new signals. Now, the thing about analytics... Yeah. So I would say those are the types of use cases that we have been exposed to with Airbyte. We try to tackle these real-time use cases with very, very short-lived micro-batches to get data into warehouses. That's how we've been approaching real time for our existing customers.

Gio Tropeano:

Awesome. One of the things that we love to do is share expertise and advice with our peers through this podcast. What real-time data advice would you give to engineering leaders who are embarking on an analytics initiative? Whether it's a company looking to build analytics into their application or into dashboards, what advice would you give?

Michel Tricot:

I would tell them, make sure you separate your analytics use case from your operational use case, because one will very likely require real time, meaning sub-second or sub-minute latency on your data. For the other one, analytics, it really depends on what action you take from it. If you get information within a second, is it a human being looking at a dashboard or doing BI on top of this data?

Michel Tricot:

Or is it a computer behind it that is going to take action? If it's a human, sub-second latency is not going to be useful for you, and you will have to pay a lot in technical complexity to build this type of real-time system. Now, if it's a machine that is looking at it, that's a very different story. So really think about your use case and what you're trying to solve, because you can very often get by with increasing the latency a little bit, and what you gain behind the scenes in terms of the simplicity of your system will keep you happy for the next few years.

Gio Tropeano:

I feel like real-time analytics is now where the cloud was 15 years ago, right? Moving to the cloud or utilizing the cloud meant something different to every single company leader and IT head that was looking to leverage it. It's a really interesting place to be, and it's great to hear from you on this. We've heard from various engineering teams that running a real-time analytics software stack requires a lot of engineering resources for maintenance. What experience can you share with us around this topic?

Michel Tricot:

Yeah, so I would say the places where I've been the most exposed to real time were LiveRamp and, later, rideOS. At LiveRamp, it was pixel traffic: people browsing online, and we were getting all this information about the people who were browsing. We had to build something that was real-time. We were making operational decisions based on that, but for the analytics, we were giving ourselves more time. So we would have batches running every 30 or 40 minutes to pull the data. It would be buffered somewhere in between when we received it and when it was actually loaded into our data warehouse and ready. At rideOS, it was a bit similar, except that it was more in the IoT space. It was about self-driving cars: collecting sensor data, map data, and traffic data. So there were routing challenges as well.

Michel Tricot:

This one was more complex. We did not provide any type of analytics at the time; it was purely an operational use case, but it was a complex system because you cannot go down when you're doing real time. You need to think about it a lot more than when you're doing micro-batches or batches, where if you have downtime [inaudible 00:08:17]. But when a real-time system goes down, you have a chunk of data that you may no longer be able to access.

Gio Tropeano:

Hmm. Then why is it important for modern organizations to invest in real-time analytics in 2021? Why should they be considering it?

Michel Tricot:

You can make faster decisions. Now, the question is: can your business actually make faster decisions with this new system that is powering all these additional insights? It should really be driven by what kind of business decision you need to make with this information, or by how being real time, actually real time, improves your product. It should not be seen as an engineering feat to do real time, because if you don't need real time for your business, you're going to pay a very high price for doing it. So I would say, think really carefully about what you can do that requires sub-second, sub-minute, or sub-five-minute latency, and what you cannot do if you don't have that system.

Gio Tropeano:

I don't want to say we're in the infancy of real-time analytics. I think we're moving into the up-ramp of people using it more and more, especially with the dawn of data applications and real-time data needs. Why don't current real-time analytics solutions work? What are you seeing out there that's causing some solutions to not really hit the mark?

Michel Tricot:

I think here you can draw a parallel with how batch processing evolved. We used to have Hadoop to do analytics, and all the technology built on top of it was very complex: lots of building blocks that you had to put together to get to your batch analytics. What has happened is the market has evolved toward technology that makes this simpler to use, so that it's not just data engineers and very, very technical people who can actually use it.

Michel Tricot:

With real time, I think the same thing is going to happen. We still have all these different building blocks that you need to put together: you need to have Kafka, you need to have [inaudible], and so on. There are a lot of prerequisites to have a working real-time system. At this point, it's just a matter of the technology evolving to become simpler so that you get more adoption. And the more adoption you have, the more use cases you will discover, and the more important real time will become for organizations.

Dhruba Borthakur:

Yeah. You've touched upon a good point there, about how it's becoming easier for companies or enterprises to use more real-time analytics. You mentioned Kafka, and I think that's definitely where a lot of data in transit happens, right? As far as Airbyte and your personal experiences are concerned, are there any special things that Airbyte does or can do for powering these kinds of real-time analytics, either now or in the future?

Michel Tricot:

Yes, but that will always depend on your definition of real time. Today, imagine that you have a warehouse that can support very fast updates, very fast inserts, et cetera. It's a matter of how quickly Airbyte can get you the data so that it is available in your system. We can do sub-minute incremental replication. It's not going to be sub-millisecond or sub-second real time, but you're getting very close to a system that is always up to date, and in the worst case you have one or two minutes of latency on your data. That is how we're approaching real time from the Airbyte standpoint and how we're helping our users with this type of problem.

Dhruba Borthakur:

Yeah. No, that makes sense. I think what you were saying is that Airbyte can be used to push data to, say, a warehouse, and you are streaming data at high or higher write rates to warehouses. Do you have any anecdotal evidence or stories to share about how many real-time writes you can do to some of the standard warehouses that your customers might be running? Most warehouses typically talk about trickling data in and loading data in batches. How does that match up when data is streaming in at very high write rates?

Michel Tricot:

Yeah. We do what we can, meaning we actually batch and buffer everything before it's loaded into warehouses. That's why you have that latency. It is not possible today to just stream directly. Now, we know that technologies exist to do it; it's a matter of integrating with those types of technologies.

Gio Tropeano:

From your perspective, what would you hope that engineering leaders know about the modern data stack?

Michel Tricot:

I think the modern data stack allows your data team to become a lot more agile. You need to think about whether your team is spending its time on the right things, because you don't want your team to be writing data integrations. You don't want them to be managing them; you want them to be making the data better, so that consumers across the organization can actually get the best out of it. It's more about what you can put in place to shift responsibility toward the consumers of the data and become a platform for the rest of your organization.

Michel Tricot:

And I think that's what the modern data stack is about. It's picking the best of breed in every single category, like Airbyte for ingestion, some warehouse in the middle, dbt for transformation, Great Expectations for quality, and assembling all of that together. Then, because you have a simple system on which you can query and work with the data, you can give that responsibility to other people, who will be able to dig a lot deeper into the insights they want to get. It's really about becoming a platform for your organization.

Gio Tropeano:

So people can actually leverage the data more easily, versus spending all their time building the tools or cobbling them together to work together. Sure.

Michel Tricot:

Exactly.

Gio Tropeano:

Super insightful.

Michel Tricot:

Yeah. And you know, you have a lot of new roles coming up, and some of them are going to be a lot less technical, but they will be extremely good at figuring out what insights they need to help the business. So if you can enable these people to make decisions and get these insights themselves, your organization is going to be doing a lot better and you're going to be a lot more competitive.

Gio Tropeano:

Absolutely. Awesome. Well, Michel, I appreciate you jumping on with us. That's all I have for questions. That'll do it for the episode. Dhruba, thank you for your time as well. For our listeners and our subscribers: if you want to learn more about Airbyte, check them out at airbyte.io. They are hiring aggressively, and you can connect with them via Slack at slack.airbyte.io. If you found today's episode helpful, please share it and help us get these insights out to your peers. The Why Wait? real-time analytics podcast is brought to you by Rockset. We here at Rockset are building a real-time analytics cloud-based platform. Check us out at rockset.com, and you can try this real-time analytics software today for two weeks for free, with an additional $300 in trial credits. Thanks again for joining, and stay tuned for our next episode. Thanks a lot. Bye.

Dhruba Borthakur:

Thank you guys.

Michel Tricot:

Bye. Thank you.

Dhruba Borthakur:

Thanks, Michel. Bye.

