Sign up to receive blog updates in your inbox.

Rockset adds Excel spreadsheet support: Use SQL across XLSX files and join with other JSON, CSV or Parquet data

An incredible amount of business data is floating around in Excel spreadsheets - so data scientists often need to analyze data across multiple worksheets or even multiple spreadsheets using SQL. Additionally, this data may need to be joined with other data sets that are in JSON, CSV or Parquet formats.

Microsoft Excel currently has some basic SQL support in place:

  1. Use SQL for connecting to an external database like Access or SQL Server, parsing field or table contents and importing the data.
  2. Use SQL for reading a worksheet (SELECT * FROM [Sheet1$]) or reading a range (SELECT * FROM MyRange).

However it does not support complex SQL analysis across multiple spreadsheets and other data sets.

Using Rockset to analyze Excel spreadsheets

Rockset's core superpower is the ability to ingest different data formats like JSON, CSV or Parquet from different sources like local desktops, data lakes, streaming sources and online databases - and immediately power fast SQL across all these data sets. We recently added support for Excel spreadsheets (see documentation), which means you can now ingest XLSX files into a Rockset collection and instantly query across them using full-featured SQL with millisecond latency.


Start by creating a new collection, say MyCollection, in Rockset and ingesting your Excel spreadsheets. Your XLSX files can be uploaded from your local host using Rockset's file uploader, or bulk ingested from a data lake like AWS S3. Rockset will automatically parse and index the contents of the spreadsheet so that it is ready to query. We take advantage of Rockset's strong dynamic typing in SQL to achieve this.



Start by using the DESCRIBE command to list the available fields in your collection. Each row in your spreadsheet will correspond to a document in Rockset. You might want to query the first several rows to see the shape of the data:

FROM MyCollection
ORDER BY rownum

If you have other Rockset collections with other spreadsheets or nested JSON, CSV, Parquet data you can now run standard SQL to join and analyze your data sets. We often see examples of interesting data science on nested JSON.


Once you are done with your analysis you can use Rockset as the serving layer for an app or a live dashboard using a visualization tool of your choice.

As an example, here is an interesting analysis of trends in college financial aid using SQL across XLSX and CSV files.

Screen Shot 2019-01-18 at 3.11.44 PM

Real-time SQL on NoSQL. Start today.

Related Posts

Joining Data in DynamoDB and S3 for Live, Ad-Hoc Analysis

Using SQL to join DynamoDB and S3 data, operations teams can perform live, ad-hoc analysis across multiple cloud systems.

Methods for Running SQL on JSON in PostgreSQL, MySQL and Other Relational Databases

We examine various options for running SQL on JSON in relational databases, like PostgreSQL and MySQL, and in Rockset.

Redshift with Rockset: High performance queries for operational analytics

Run high performance queries for operational analytics on data from Redshift tables by continuously ingesting and indexing Redshift data through a Rockset-Redshift integration.