My New Grad Experience at Rockset
June 18, 2021
I first met Rockset at the 2018 Greylock Techfair. Rockset had a unique approach to attracting interest: handing out printed copies of a C program and offering a job to anyone who could figure out what the program did.
Though I wasn’t able to solve the code puzzle, I had more luck with the interview process. I joined Rockset after graduating from UCLA in 2019. This is my reflection on the past two years, and hopefully I can shed some light on what it’s like to join Rockset as a new grad software engineer.
I’m a software engineer on the backend team responsible for Rockset’s distributed SQL query engine. Our team handles everything involved in the lifetime of a query: the query compiler and optimizer, the execution framework, and the on-disk data formats of our indexes. I didn’t have much experience with query engines or distributed systems before joining Rockset, so onboarding was quite challenging. However, I have learned a ton during my time here, and I am so fortunate to work with an awesome team on hard technical problems.
Here are some highlights from my time here at Rockset:
1. Learning modern, production-grade C++. I mentioned during my interviews that I was most comfortable with C++, since I had learned it in my introductory computer science courses at college and used it in a few other courses. Our team’s codebase is almost entirely C++, except for some Python code that generates more C++ code. To my surprise, I could barely read our codebase when I first joined. std::move()? Curiously recurring template pattern? The language alone gave me a lot to learn.
2. Optimizing distributed aggregations. This is one of the projects I am most proud of. Last year, we vectorized our query execution framework. Vectorized execution means that each stage of query processing operates on a batch of rows at a time, in contrast to tuple-based execution, where each stage processes one row at a time. Vectorized code is composed of tight loops over batches of values; these loops amortize per-row overhead such as function calls and make better use of the CPU and its caches, which results in a performance boost. My part in our vectorization effort was to optimize distributed aggregations. This was pretty exciting because it was my first time working on a performance engineering project. I became intimately acquainted with analyzing CPU profiles, and I also had to brush up on my computer architecture and operating systems fundamentals to understand what would actually improve performance.
3. Building a backwards compatibility test suite for our query engine. As mentioned above, I spent time optimizing our distributed aggregations. The key word here is “distributed”: the computation for a single query happens on multiple machines in parallel. During a code deploy, different machines may be running different versions of our code at the same time. Thus, when making changes to our query engine, we need to make sure they stay compatible with the other versions still running. While working on distributed aggregations, I introduced a bug that broke backwards compatibility and caused a large production incident. I felt bad about introducing this issue, and I wanted to make sure we wouldn’t run into a similar one in the future. To this end, I implemented a test framework for validating the backwards compatibility of our query engine code. This test suite has caught several bugs and is a valuable tool for judging whether a code change is safe to deploy.
4. Debugging core files with GDB. A core file is a snapshot of a process’s memory at the moment it crashed: the stack traces of all threads in that process, global variables, local variables, the contents of the heap, and so forth. Since the process is no longer running, you cannot call functions in GDB on a core file, so much of the challenge comes from manually decoding complex data structures by reading their source code. This seemed like black magic to me at first. However, after two weeks of wandering around in GDB with a core file, I became somewhat proficient and found the root cause of a production bug. Since then, I have done a lot more debugging with core files, because they are invaluable for understanding hard-to-reproduce issues.
5. Serving as primary on-call. The primary on-call is the person who is paged for all alerts in production. This is one of the most stressful things I have ever done, but as a result, it has also been one of the best learning opportunities I’ve had. I was on the primary on-call rotation for one year, and during that time, I became much more comfortable making decisions under pressure. I also strengthened my problem-solving skills and learned more about our system as a whole by seeing it from a different perspective. Not to mention, I now knock on wood quite frequently. :)
6. Being part of an amazing team. Working at a small startup can definitely be challenging and stressful, so having teammates that you enjoy spending time with makes it way easier to ride out the tough times. The photo here is taken from Rockset’s annual Tahoe trip. Since joining Rockset, I’ve also gotten much better at games like One Night Werewolf and Among Us.
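The contrast between tuple-based and vectorized execution in highlight 2 can be illustrated with a small C++ sketch. This is not Rockset’s code: it assumes a single int64 column and an AVG aggregation, and every name in it (PartialAvg, RowIterator, aggregateRowAtATime, aggregateVectorized, mergeShards) is hypothetical. The row-at-a-time path pays one virtual next() call per row, while the vectorized path handles a whole batch per call in a tight loop. AVG is split into a mergeable (sum, count) partial state, which is one common way to distribute an aggregation: each machine aggregates its own shard and a coordinator merges the partials.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical partial state for AVG(x). Each machine computes one of these
// over its own shard; a coordinator merges them and computes sum / count.
struct PartialAvg {
  int64_t sum = 0;
  int64_t count = 0;

  void merge(const PartialAvg& other) {
    sum += other.sum;
    count += other.count;
  }
  double finalize() const {
    return count == 0 ? 0.0 : static_cast<double>(sum) / static_cast<double>(count);
  }
};

// Tuple-at-a-time style: each virtual next() call hands back a single row.
class RowIterator {
 public:
  virtual ~RowIterator() = default;
  virtual std::optional<int64_t> next() = 0;  // std::nullopt when exhausted
};

class VectorRowIterator : public RowIterator {
 public:
  explicit VectorRowIterator(const std::vector<int64_t>& rows) : rows_(rows) {}
  std::optional<int64_t> next() override {
    if (pos_ == rows_.size()) return std::nullopt;
    return rows_[pos_++];
  }

 private:
  const std::vector<int64_t>& rows_;
  std::size_t pos_ = 0;
};

PartialAvg aggregateRowAtATime(RowIterator& it) {
  PartialAvg acc;
  while (std::optional<int64_t> row = it.next()) {
    acc.sum += *row;
    ++acc.count;
  }
  return acc;
}

// Vectorized style: one call processes a whole batch in a tight loop with no
// per-row virtual dispatch, which the compiler is free to unroll or vectorize.
PartialAvg aggregateBatch(const int64_t* values, std::size_t n) {
  PartialAvg acc;
  for (std::size_t i = 0; i < n; ++i) acc.sum += values[i];
  acc.count = static_cast<int64_t>(n);
  return acc;
}

PartialAvg aggregateVectorized(const std::vector<int64_t>& column, std::size_t batchSize) {
  assert(batchSize > 0);  // a zero batch size would never advance
  PartialAvg acc;
  for (std::size_t start = 0; start < column.size(); start += batchSize) {
    std::size_t n = std::min(batchSize, column.size() - start);
    acc.merge(aggregateBatch(column.data() + start, n));
  }
  return acc;
}

// "Coordinator": aggregate each shard independently, then merge the partials.
PartialAvg mergeShards(const std::vector<std::vector<int64_t>>& shards, std::size_t batchSize) {
  PartialAvg total;
  for (const auto& shard : shards) total.merge(aggregateVectorized(shard, batchSize));
  return total;
}
```

For example, mergeShards({{1, 2, 3, 4}, {5, 6, 7, 8, 9, 10}}, 4) yields a sum of 55 and a count of 10, so finalize() returns 5.5. Keeping the partial state mergeable is what lets the same aggregation run on one machine or across many.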
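Highlight 3 mentions a test framework for backwards compatibility. One minimal way to sketch the core idea is to encode a message the way each code version writes it, decode it the way each version reads it, and check every (writer, reader) pairing that can coexist during a deploy. The wire format below (a version byte and a sum, with a count added in version 2) and all of the names (PartialAgg, encode, decode, checkAllVersionPairs) are hypothetical assumptions for illustration, not taken from Rockset’s codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical wire format for a partial aggregate sent between machines:
// [version byte][int64 sum] in version 1, plus [int64 count] in version 2.
// Integers are copied in host byte order to keep the sketch short; a real
// format would pin the byte order.
struct PartialAgg {
  int64_t sum = 0;
  std::optional<int64_t> count;  // present only when both sides know the field
};

void appendInt64(std::string& out, int64_t v) {
  char buf[sizeof(int64_t)];
  std::memcpy(buf, &v, sizeof(int64_t));
  out.append(buf, sizeof(int64_t));
}

int64_t readInt64(const std::string& in, std::size_t offset) {
  assert(in.size() >= offset + sizeof(int64_t));
  int64_t v = 0;
  std::memcpy(&v, in.data() + offset, sizeof(int64_t));
  return v;
}

// Serialize the way a machine running `writerVersion` would.
std::string encode(const PartialAgg& agg, int writerVersion) {
  std::string out(1, static_cast<char>(writerVersion));
  appendInt64(out, agg.sum);
  if (writerVersion >= 2) appendInt64(out, agg.count.value_or(0));
  return out;
}

// Deserialize the way a machine running `readerVersion` would. An old reader
// ignores trailing fields it does not know; a new reader treats a field the
// writer never sent as absent instead of reading past the end.
PartialAgg decode(const std::string& bytes, int readerVersion) {
  int writerVersion = static_cast<unsigned char>(bytes.at(0));
  PartialAgg agg;
  agg.sum = readInt64(bytes, 1);
  if (readerVersion >= 2 && writerVersion >= 2) {
    agg.count = readInt64(bytes, 1 + sizeof(int64_t));
  }
  return agg;
}

// Compatibility matrix: every (writer, reader) pair that can run side by side
// during a deploy must preserve the fields both versions understand.
bool checkAllVersionPairs(const PartialAgg& input, int oldestVersion, int newestVersion) {
  for (int w = oldestVersion; w <= newestVersion; ++w) {
    for (int r = oldestVersion; r <= newestVersion; ++r) {
      PartialAgg out = decode(encode(input, w), r);
      if (out.sum != input.sum) return false;
      bool bothKnowCount = (w >= 2 && r >= 2);
      if (out.count.has_value() != bothKnowCount) return false;
      if (bothKnowCount && *out.count != input.count.value_or(0)) return false;
    }
  }
  return true;
}
```

The (writer 1, reader 2) pairing is the case that catches a new reader expecting a field an old writer never sends, while the (writer 2, reader 1) pairing checks that an old reader tolerates fields added later.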
The last two years have been a period of extensive learning and growth for me. Working in industry is a lot different from being a student, and I personally feel like my onboarding process took over a year and a half. Some things that really helped me grow were diving into different parts of our system to broaden my knowledge, gaining experience by working on incrementally more challenging projects, and finally, trusting the growth process. Rockset is an amazing environment for challenging yourself and growing as an engineer, and I cannot wait to see where the future takes us.