This guide helps you get up to speed with writing queries. Rockset has fully-featured SQL support, including aggregations, filtering, windowing, and joins. See the SQL Reference for a comprehensive list. If you've used a SQL database like Postgres before, you should feel at home writing queries with Rockset.
Rockset has several distinct features that we cover in this guide.
- Rockset does not require your data to have a pre-built schema. We instead infer the schema at read time.
- Rockset has syntax to make it much easier to query nested data.
- There are also some extra tips and answers to common questions at the end.
Fixed schemas are not required
Rockset ingests your data without the need for a pre-built schema. Whereas every row in a MySQL or Postgres table has the same set of fields with a predefined type for each field, you can think of each Rockset document as a JSON dictionary with its own fields and values. In this sense, Rockset is dynamically typed, while traditional databases are statically typed.
Individual fields in documents are strongly typed: each value for a given field in a given document
has one specific type and values of incompatible types don't automatically convert to each other.
This means that you still have the same SQL query semantics as in traditional databases. If you
were to query for all documents where field
x equals the number
10, for example, documents
without that field or documents where
x was equal to the string
'10' would not be returned.
(One other note with strong typing: queries would also throw a runtime error if you tried to evaluate
an expression by passing in arguments with incompatible types: adding a datetime and a string, for example.)
More about DESCRIBE
DESCRIBE "collection" allows you to see what types exist in a Rockset collection or field. The
difference from a traditional database is that we infer the schema at read time by looking over the
contents of your Rockset collection to derive the current shape of your data.
The output of
DESCRIBE has the following fields:
- field: Every distinct field name in the collection.
- type: The data type of the field.
- occurrences: The number of documents that have this field in the given type.
- total: Total number of documents in the collection for top level fields, and total number of documents that have the parent field for nested fields.
DESCRIBE documentation for examples of
DESCRIBE for different data distributions.
Querying Nested Data
Rockset makes it much easier than usual to query highly-nested data:
- To query arrays, use square bracket notation (
). For example,
SELECT aselects the first entry in the array field
- To query objects, use dot notation (
.). For example,
SELECT a.bselects the value of the key
bin the object field
In any nested field path, Rockset assumes the first token is the collection name, and will
return an error if missing. So for example, to access field
a.b in collection
refer to the field as
commons.foo.a.b in the query.
One other special command that's very useful when working with arrays is
each entry in an array value as a separate row. Or, more formally, UNNEST is a row-generating
function that takes an array as input and outputs one row per element of the input array. It's different
than other functions in that it outputs multiple rows (a relation) rather than one value (a scalar). That means
it's used in queries in the same places a collection appears: a
Refer to our
UNNEST documentation for more details.
UNNEST(array ['Alice', 'Bob'] AS name)
+---------+ | name | |---------| | Alice | | Bob | +---------+
Remembering which quotes to use is always confusing if you don't write SQL regularly. Single quotes are used for strings, and double quotes are used for field and collection names. A useful mnemonic is: "Single quotes are for Strings, Double quotes are for Database identifiers"
If you're writing a complex query, the
WITHstatement can be very useful in breaking it down into something more readable and modular.
WITH recent_users AS (
critical_app_events AS (
LEFT JOIN critical_app_events...
When certain characters, such as
-, are part of collection org field names the entire name must always be wrapped in double quotes. For example,
SELECT id, age, "full-name" from "people-list"
Rockset is strongly typed, and will evaluate to false if you attempt to compare values of different types (the way the string '10' would not equal the integer 10 in many programming languages). Depending on your goal, you could write a query that casts everything to the same type:
SELECT field FROM c where cast(field as int) = 10, or one that only fetches rows that match a specific type
SELECT field FROM c WHERE field = '10'
You can inspect what types exist in a collection or field in the collection's console page or with a call to
As mentioned above, in any nested field path, Rockset assumes the first token is the collection name. To access field
commons.foo, refer to the field as
commons.foo.a.bin the query.
SQL has two different syntaxes for joining tables: an implicit syntax that just uses the FROM clause
FROM table, table2, and an explicit syntax of the form
FROM table JOIN table2 ON .... Either works, but try not to mix them both in the same query (it will work, but error messages will be confusing.)
There is a full list of SQL commands and functions in the SQL Reference