Sign up to receive blog updates in your inbox.

Rockset adds Excel spreadsheet support: Use SQL across XLSX files and join with other JSON, CSV or Parquet data

An incredible amount of business data is floating around in Excel spreadsheets - so data scientists often need to analyze data across multiple worksheets or even multiple spreadsheets using SQL. Additionally, this data may need to be joined with other data sets that are in JSON, CSV or Parquet formats.

Microsoft Excel currently has some basic SQL support in place:

  1. Use SQL for connecting to an external database like Access or SQL Server, parsing field or table contents and importing the data.
  2. Use SQL for reading a worksheet (SELECT * FROM [Sheet1$]) or reading a range (SELECT * FROM MyRange).

However it does not support complex SQL analysis across multiple spreadsheets and other data sets.

Using Rockset to analyze Excel spreadsheets

Rockset's core superpower is the ability to ingest different data formats like JSON, CSV or Parquet from different sources like local desktops, data lakes, streaming sources and online databases - and immediately power fast SQL across all these data sets. We recently added support for Excel spreadsheets (see documentation), which means you can now ingest XLSX files into a Rockset collection and instantly query across them using full-featured SQL with millisecond latency.

Ingest

Start by creating a new collection, say MyCollection, in Rockset and ingesting your Excel spreadsheets. Your XLSX files can be uploaded from your local host using Rockset's file uploader, or bulk ingested from a data lake like AWS S3. Rockset will automatically parse and index the contents of the spreadsheet so that it is ready to query. We take advantage of Rockset's strong dynamic typing in SQL to achieve this.

console-create-xls2

Query

Start by using the DESCRIBE command to list the available fields in your collection. Each row in your spreadsheet will correspond to a document in Rockset. You might want to query the first several rows to see the shape of the data:

SELECT * 
FROM MyCollection
ORDER BY rownum
LIMIT 10

If you have other Rockset collections with other spreadsheets or nested JSON, CSV, Parquet data you can now run standard SQL to join and analyze your data sets. We often see examples of interesting data science on nested JSON.

Build

Once you are done with your analysis you can use Rockset as the serving layer for an app or a live dashboard using a visualization tool of your choice.

As an example, here is an interesting analysis of trends in college financial aid using SQL across XLSX and CSV files.

Screen Shot 2019-01-18 at 3.11.44 PM

Real-time SQL on NoSQL. Start today.

Related Posts

Index Scan: Using Rockset's Search Index to Speed up Range Scans Over a Specific Field

Rockset uses Converged Indexing to make different types of queries run fast. We look at how Rockset's Index Scan uses the search index to accelerate range scans.

Can I Do SQL-Style Joins in Elasticsearch?

While joins are primarily an SQL concept, they are equally important in the NoSQL world as well. We explore how to perform the equivalent of SQL joins when using Elasticsearch.

Joining Data in DynamoDB and S3 for Live, Ad-Hoc Analysis

Using SQL to join DynamoDB and S3 data, operations teams can perform live, ad-hoc analysis across multiple cloud systems.