SQL Guide

This guide helps you get up to speed with writing queries. Rockset has fully-featured SQL support, including aggregations, filtering, windowing, and joins. See the SQL Reference for a comprehensive list. If you've used a SQL database like Postgres before, you should feel at home writing queries with Rockset.

Rockset has several distinct features that we cover in this guide:

Fixed schemas are not required

Rockset ingests your data without the need for a pre-built schema. Whereas every row in a MySQL or Postgres table has the same set of fields with a predefined type for each field, you can think of each Rockset document as a JSON dictionary with its own fields and values. In this sense, Rockset is dynamically typed, while traditional databases are statically typed.

Individual fields in documents are strongly typed: each value for a given field in a given document has one specific type and values of incompatible types don't automatically convert to each other. This means that you still have the same SQL query semantics as in traditional databases. If you were to query for all documents where field x equals the number 10, for example, documents without that field or documents where x was equal to the string '10' would not be returned. (One other note with strong typing: queries would also throw a runtime error if you tried to evaluate an expression by passing in arguments with incompatible types: adding a datetime and a string, for example.)

More about DESCRIBE

DESCRIBE "collection" allows you to see what types exist in a Rockset Collection or field. The difference from a traditional database is that we infer the schema at read time by looking over the contents of your Rockset collection to derive the current shape of your data.

The output of DESCRIBE has the following fields:

  • field: Every distinct field name in the collection.
  • type: The data type of the field.
  • occurrences: The number of documents that have this field in the given type.
  • total: Total number of documents in the collection for top level fields, and total number of documents that have the parent field for nested fields.

See the DESCRIBE documentation for examples of DESCRIBE for different data distributions.

Querying Nested Data

Rockset makes it much easier than usual to query highly-nested data:

  • To query arrays, use square bracket notation ([]). For example, SELECT a[1] selects the first entry in the array field a.
  • To query objects, use dot notation (.). For example, SELECT a.b selects the value of the key b in the object field a.

In any nested field path, Rockset assumes the first token is the collection name, and will return an error if missing. So for example, to access field a.b in collection commons.foo, refer to the field as commons.foo.a.b in the query.

One other special command that's very useful when working with arrays is UNNEST. UNNEST treats each entry in an array value as a separate row. Or, more formally, UNNEST is a row-generating function that takes an array as input and outputs one row per element of the input array. It's different than other functions in that it outputs multiple rows (a relation) rather than one value (a scalar). That means it's used in queries in the same places a collection appears: a FROM or JOIN clause.

Refer to our UNNEST documentation for more details.

Example:

SELECT * FROM UNNEST(array ['Alice', 'Bob'] AS name)
+---------+ | name | |---------| | Alice | | Bob | +---------+

More Advice

  • Remembering which quotes to use is always confusing if you don't write SQL regularly. Single quotes are used for strings, and double quotes are used for field and collection names. A useful mnemonic is: "Single quotes are for Strings, Double quotes are for Database identifiers"

  • If you're writing a complex query, the WITH statement can be very useful in breaking it down into something more readable and modular.

    WITH recent_users AS (
        SELECT ... from giant_user_database ...
    ),
    critical_app_events AS (
        SELECT ... from event_stream ...
    )
    SELECT
        ...
    FROM recent_users
    LEFT JOIN critical_app_events ...
    WHERE ...

  • When certain characters, such as -, are part of collection org field names the entire name must always be wrapped in double quotes. For example, SELECT id, age, "full-name" from "people-list"

  • Rockset is strongly typed, and will evaluate to false if you attempt to compare values of different types (the way the string '10' would not equal the integer 10 in many programming languages). Depending on your goal, you could write a query that casts everything to the same type: SELECT field FROM c where cast(field as int) = 10, or one that only fetches rows that match a specific type SELECT field FROM c WHERE field = '10'

You can inspect what types exist in a collection or field in the collection's console page or with a call to DESCRIBE "collection".

  • As mentioned above, in any nested field path, Rockset assumes the first token is the collection name. To access field a.b in collection commons.foo, refer to the field as commons.foo.a.b in the query.

  • SQL has two different syntaxes for joining tables: an implicit syntax that just uses the FROM clause FROM table, table2, and an explicit syntax of the form FROM table JOIN table2 ON .... Either works, but try not to mix them both in the same query (it will work, but error messages will be confusing.)

  • There is a full list of SQL commands and functions in the SQL Reference