Index Conference 2024

May 16, 2024

Index is a wrap for 2024

Thank you so much for attending. We’ll make the recordings available in the coming weeks.
Get notified when the recordings are ready by filling out the form below.

_speakers;

Venkat Venkataramani

Co-Founder and CEO

Reynold Xin

Cofounder

Shriya Arora

Engineering Manager, Personalization

Julian Jaffe

Staff Software Engineer

Stjepan Buljat

Cofounder and Chief Innovation Officer

Dejan Sijakovic

Chief Architect

Luming Chen

Machine Learning Engineer

Sudeep Das

Head of Machine Learning and AI

Francisco Claude-Faust

Principal Staff Software Engineer

Qian Li

Senior Software Engineer

Ali Dasdan

EVP and CTO

Joel Chou

Senior Principal Architect

Hasmik Sarkezians

Engineering Fellow

Matthijs Douze

Research Scientist, FAIR

Bo Ling

Staff Software Engineer in AI/ML

Jaya Kawale

Vice President of Engineering (Machine Learning)

Shu Zhang

Director of Engineering

Yexi Jiang

Principal Engineer

Nikhil Garg

CEO Fennel
Ex Head of Platform and Infrastructure Quora

Vishal Kathuria

Technical Director in AI Research

Shriniket Kale

Engineering Manager II for Storage

Adithya Reddy

Senior Software Engineer

Stu Hood

Software Engineer

Gary Lin

Cofounder and CEO

Jun Ding

Senior Solutions Engineer

Caleb Grillo

Product

Hope Wang

Developer Advocate

Sofia Jakovcevic

Product Solutions Engineer

Daniel Svonava

Cofounder and CEO

Het Trivedi

Software Engineer

Hakan Tekgul

ML Solutions Architect

Christina Lin

Director of Developer Advocacy

Nick Lee

Solutions Architect

Ellery Addington-White

Founding Engineer

_agenda;

9:00 am

Keynote: The Future of Search and AI Applications

Speakers: Reynold Xin, Cofounder Databricks; Venkat Venkataramani, CEO Rockset

In the age of AI, new indexing technologies like FAISS and HNSW enable knowledgeable and contextually relevant AI applications at scale. Inevitably, similarity indexes cross paths with search indexes to support hybrid search: vector search with metadata filtering. With the number of application indexes dramatically increasing, it is much more complex to tune them, get fast results and design for efficiency at scale. Rockset Cofounder and CEO Venkat Venkataramani examines how search and analytics databases are being disrupted by hybrid search.

Venkat is joined by Databricks Cofounder Reynold Xin to discuss the changing model landscape for large-scale search applications. With an influx of new models including DBRX, Gemini and Claude 3, the challenge facing engineering teams is how to design for scale, taking into consideration high-dimensionality, intelligence and serving costs. Reynold shares how companies are iterating through the age of AI while simultaneously running production applications at scale.

9:35 am

Improving Homepage Personalization at Netflix

Speakers: Shriya Arora, Engineering Manager, Personalization; Julian Jaffe, Staff Software Engineer

With 260+ million members in over 190 countries and a billion hours of streaming each month, Netflix is the world’s leading online entertainment company. The highly personalized experience on Netflix, powered by our world-class algorithms, helps subscribers spend less time looking for what to watch and more time enjoying their favorite content. In this talk we will discuss how we developed infrastructure designed to securely process, store and serve member engagement data, which has resulted in the largest online datastore in production at Netflix. We will share the hybrid batch and streaming approach we adopted to provide both low latency and high accuracy on this data. We will also talk about the challenges we encountered in processing such a high-volume event stream and building a scalable, reliable, and highly available data store.

10:10 am

How Cognism Rearchitected In-App Search

Speakers: Stjepan Buljat, Cofounder and Chief Innovation Officer; Dejan Sijakovic, Chief Architect

Search is key for sales prospecting, helping sales teams uncover the right prospect based on their profile, company demographics, technographic and intent data. In this talk, Stjepan and Dejan discuss how search is integrated throughout Cognism’s sales intelligence platform used by 1800+ sales teams. They share their journey rearchitecting search at Cognism to support frequent updates, joins and multi-tenancy using cloud-native technologies.

10:45 am

How DoorDash Personalizes the Shopping Experience

Speakers: Luming Chen, Machine Learning Engineer; Sudeep Das, Head of Machine Learning and AI

At DoorDash, our mission extends beyond food delivery, embracing a wide range of product verticals including groceries, convenience, and retail. This expansion demands sophisticated machine learning algorithms to ensure seamless consumer experiences at every touchpoint. The DoorDash retail shopping experience mission seeks to combine the best parts of in-person shopping with the power of personalization. By understanding each consumer's purchasing history, dietary restrictions, favorite brands, and other personalized details, we can not only recommend items that reflect a consumer’s unique shopping needs and preferences, but also streamline cart-building. In this talk, we show how we built a personalized shopping experience for our new business vertical stores. Following a high-level overview of how we are leveraging AI in solving consumer problems at DoorDash, we home in on our recommendation systems framework and the various ranking algorithms we have developed. Finally, we describe a few ongoing enhancements to our recommendation system that embrace large language models (LLMs) and other techniques like causal inference.

11:30 am

LinkedIn’s Feed Infrastructure

Speakers: Francisco Claude-Faust, Principal Staff Software Engineer; Qian Li, Senior Software Engineer

We will present the current architecture of the infrastructure powering the main feed of LinkedIn. We will start by covering several layers of the stack at a high level, from the lower stateful candidate selection system to the federation layer. We will then zoom into the current candidate selection system and show how (and why) we have evolved our scoring strategy by decoupling it from candidate selection.

12:05 pm

How We Built Search for Go-to-Market Platforms at ZoomInfo

Speakers: Ali Dasdan, EVP and CTO; Joel Chou, Senior Principal Architect; Hasmik Sarkezians, Engineering Fellow

Search platforms power sales, marketing, and talent search as well as entity resolution, recommendations, and AI at ZoomInfo for tens of thousands of B2B customers. Search is also one of the principal ways that our customers interact with our market-leading data, which is large, heterogeneous, and continuously updated; our customers often look to delve deeply and extract highly targeted search results with high frequency and low latency. In this talk, we will cover how we have customized Apache Lucene and Apache Solr to meet these requirements, including real-time joins. We will also present examples of applications using search, together with a list of enhancements we are planning for the future.

12:35 pm

Lunch & Birds of a Feather

Speakers: Gary Lin, Cofounder and CEO at Explo; Jun Ding, Senior Solutions Engineer at Grafana Labs; Caleb Grillo, Product at WarpStream; Hope Wang, Developer Advocate at Alluxio; Sofia Jakovcevic, Product Solutions Engineer at Rockset; Daniel Svonava, Cofounder and CEO at Superlinked; Het Trivedi, Software Engineer at Baseten; Hakan Tekgul, ML Solutions Architect at Arize; Christina Lin, Director of Developer Advocacy at Redpanda; Nick Lee, Solutions Architect at Tecton; Ellery Addington-White, Founding Engineer at Momento

Birds of a feather are informal lunchtime discussions focused on specific topics, providing engineers a collaborative, open environment to contribute their knowledge, ask questions and share best practices. The term comes from the proverb "birds of a feather flock together," suggesting that engineers with similar interests or challenges will congregate in groups. At Index, we’re facilitating birds of a feather discussions for in-person attendees on the following topics:

Building customer-facing analytics with Gary Lin, Explo
Best practices for performance testing with Jun Ding, Grafana Labs
Serverless architectures for streaming data with Caleb Grillo, WarpStream and Christina Lin, Redpanda
Best data management practices for AI with Hope Wang, Alluxio
Architecting hybrid search with Sofia Jakovcevic, Rockset
Deploying and optimizing ML models for real-time inference with Het Trivedi, Baseten
Taking vectors from POC to prod with Daniel Svonava, Superlinked
Observability and evaluation of LLMs with Hakan Tekgul, Arize
Architecting the stack for ML feature engineering with Nick Lee, Tecton
Async architectures with Ellery Addington-White, Momento

1:35 pm

Vector Search and the FAISS Library

Speaker: Matthijs Douze, Research Scientist, FAIR

FAISS is a library for approximate nearest neighbor (ANN) search, providing indexing methods that are used to search, cluster, compress and transform vector embeddings at scale. Over the years, it has become one of the most popular vector search libraries, powering several production database engines and inspiring many others. FAISS supports trillion-scale indexing and is used for semantic search, recommendation, knowledge base assistant applications and more. In this talk, Matthijs Douze will discuss the tradeoff space of vector search and how different FAISS index implementations strike different operating points in this space. FAISS is designed to scale from a quick tool called in a Python notebook to the core of a production-grade database engine. This scalability is enabled by a clear separation of concerns and an open API design.

2:10 pm

How Uber Eats Built a Recommendation System Using Two Tower Embeddings

Speaker: Bo Ling, Staff Software Engineer in AI/ML

Uber created two tower embeddings to power recommendations for its Uber Eats platform across hundreds of millions of stores within hundreds of milliseconds. The two tower embeddings required infrastructure to support approximate nearest neighbor indexing and spatial indexing, as users order from stores closest to their location. In this talk, Bo discusses the infrastructure required to support both real-time and batch retrieval and ranking. Bo also shares how data preparation and serving were enhanced through a tighter integration between Michelangelo's feature store and Uber’s search platform.

2:55 pm

New Architectural Patterns in Recommendation Systems

Speakers: Jaya Kawale, VP of Engineering Tubi; Shu Zhang, Director of Engineering Pinterest; Yexi Jiang, Principal Engineer Roblox; Nikhil Garg, CEO Fennel; Vishal Kathuria, former Technical Director in AI Research at Meta

Recommendation systems help users discover products, foster engagement and grow their connections on platforms. This panel discusses trends in recommendation systems around making the entire stack real time, incorporating LLMs in both retrieval and ranking, and scaling efficiently with disaggregated architectures. Hear from the following engineering leaders who have scaled recommendation systems:

Jaya Kawale, Vice President of Engineering (Machine Learning) at Tubi, moved Tubi’s personalization engine from batch to real time, helping millions of viewers find relevant, binge-worthy content
Shu Zhang, Director of Engineering at Pinterest, built a large-scale machine learning system that powers personalized ad recommendations for hundreds of millions of users with millisecond latency
Yexi Jiang, Principal Engineer at Roblox, built the recommendation system that powers the homepage, ads, social, avatar and more along with tooling to expedite the iteration process
Nikhil Garg, Cofounder and CEO of Fennel, builds real-time machine learning at Fennel.ai, applying best practices learned scaling ranking and recommendation systems at Meta and Quora
Vishal Kathuria, former Technical Director in AI Research at Meta, led the work on aligning Meta's recommendation and ranking system with core human values of connection, empathy, care, inspiration and information using deep learning models

3:30 pm

RocksDB Meetup: Differential Backups on MyRocks at Uber

Speakers: Shriniket Kale, Engineering Manager II for Storage; Adithya Reddy, Senior Software Engineer

In this talk, we will briefly cover how MyRocks (the RocksDB storage engine for MySQL) is used in Uber's Docstore DB, which stores tens of petabytes of data. We will then explain why and how we take efficient differential backups with MyRocks at petabyte scale while keeping costs low.

4:05 pm

RocksDB Meetup: Enabling Distributed Transactions Across Microservices with RocksDB

Speaker: Stu Hood, Software Engineer

With the rise of cloud-native development, organizations have been faced with a difficult decision: attempt to scale their applications as monoliths (and cope with centralized, finicky, rollback-prone deploys) or break their applications into microservices (and suffer from data/ownership silos that make it difficult to transactionally manage state). They have been forced to make uncomfortable tradeoffs. We believe this is a false choice: we're just missing the right abstractions. We introduce Resemble, a full-stack, “transactional microservices” framework that allows you to write code with the simplicity of a monolith (straightforward function calls composed via transactions), but with the scalability of microservices (encapsulation of data and ownership, support for polyglot components, independent deploys, etc.). In this talk we'll explain our programming model and how RocksDB enabled our implementation. Resemble acts as a middleware that coordinates reads, writes, and transactions across multiple independent instances of RocksDB (we take advantage of RocksDB's support for transactions to simplify our implementation). Resemble also provides a layer over RocksDB that enables reactive primitives, including first-class support for front-end applications written using React to call functions in the backend that are re-executed automatically as data changes ("React for the backend").

_recordings;

Interested in speaking at Index 2025?

Submit a Talk

_startup_expo;

A community of startups contributing advancements to streaming data stacks, ML infrastructure, real-time metrics and visualizations. Meet the startups at Index.
