Rockset

    Commands

    This page documents all available SQL commands in Rockset.

    In the command templates shown here, all uppercase words are keywords and lowercase words are user-provided values. In practice, all keywords and function names are case insensitive; for example, describe "_events" is equivalent to DESCRIBE "_events".

    DESCRIBE

    DESCRIBE collection_name

    The output of DESCRIBE has the following fields:

    • field: Every distinct field name in the collection
    • type: The data type of the field
    • occurrences: The number of documents that have this field in the given type
    • total: Total number of documents in the collection for top level fields, and total number of documents that have the parent field for nested fields

    Note: We use the delimiter * (glob) in the field name to indicate descending into an array. It can be replaced by an integer index to construct an instance of the field path. If multiple data types appear for a certain field path, DESCRIBE will return one entry for each type.

    Suppose a collection names has the following documents:

    {
        "name": "John",
        "age": 31,
        "city": "New York"
    },
    {
        "name": "Michael",
        "age": "notfound",
        "experiences": [
            {
                "title": "XYZ",
                "years": 4
            }
        ]
    }
    DESCRIBE names
    +--------------------------------------+---------------+---------+-----------+
    | field                                | occurrences   | total   | type      |
    |--------------------------------------+---------------+---------+-----------|
    | ['name']                             | 2             | 2       | string    |
    | ['age']                              | 1             | 2       | string    |
    | ['age']                              | 1             | 2       | int       |
    | ['city']                             | 1             | 2       | string    |
    | ['experiences']                      | 4             | 4       | string    |
    | ['experiences', '*']                 | 1             | 1       | object    |
    | ['experiences', '*', 'title']        | 1             | 1       | string    |
    | ['experiences', '*', 'years']        | 1             | 1       | int       |
    +--------------------------------------+---------------+---------+-----------+

    Let's look at how DESCRIBE works with collections that have documents with each of the following properties:

    • Mixed Types
    • Nested Objects
    • Sparse Fields and Null Values
    • Nested Arrays

    Mixed Types

    Consider the collection zipcodes with the following documents:

    {"zipcode": 94542},
    {"zipcode": "91126"},
    {"zipcode": 94110.0},
    {"zipcode": "94020"}

    Every document in this collection has the same field called zipcode, but its type is different across the collection. This is captured below.

    DESCRIBE zipcodes
    +-------------+-------------+-------+--------+
    | field       | occurrences | total | type   |
    |-------------+-------------+-------+--------|
    | ['zipcode'] | 1           | 4     | int    |
    | ['zipcode'] | 2           | 4     | string |
    | ['zipcode'] | 1           | 4     | float  |
    +-------------+-------------+-------+--------+

    We can use the typeof SQL function to get all zipcodes of type string:

    SELECT
        zipcode
    FROM
        zipcodes
    WHERE
        TYPEOF(zipcode) = 'string'
    +-----------+
    | zipcode   |
    |-----------|
    | 94020     |
    | 91126     |
    +-----------+

    Then to cast field zipcode as string as we retrieve it, we can do the following:

    SELECT
        CAST(zipcode AS string) zipcode
    FROM
        zipcodes
    +-----------+
    | zipcode   |
    |-----------|
    | 94020     |
    | 94110     |
    | 94542     |
    | 91126     |
    +-----------+

    Nested Objects

    Documents can have objects with scalars, nested arrays, or nested objects. Consider a simple collection of documents with names of Turing Award winners, which we will call turing_award_winners.

    {"name": {"first": "John", "last": "HopCroft"}},
    {"name": {"first": "Robert", "last": "Tarjan"}},
    {"name": {"first": "Alan", "last": "Kay"}},
    {"name": {"first":"Edsger", "last": "Dijkstra"}}

    Each document has an object with a field called name that has nested fields first and last. To access nested fields inside objects, we concatenate the field names with a . (dot) as separator. For example, use name.first and name.last to access the first and last names, respectively, in each of these documents. The fields name.first and name.last are present in all documents as scalars of type 'string'. Hence, the occurrences and total for each field in this document will be the same as the total number of documents.

    DESCRIBE turing_award_winners
    +--------------------------------------+---------------+---------+-----------+
    | field                                | occurrences   | total   | type      |
    |--------------------------------------+---------------+---------+-----------|
    | ['name']                             | 4             | 4       | object    |
    | ['name', 'first']                    | 4             | 4       | string    |
    | ['name', 'last']                     | 4             | 4       | string    |
    +--------------------------------------+---------------+---------+-----------+

    Say we want to list all the first and last names from this collection.

    SELECT
        turing_award_winners.name.first,
        turing_award_winners.name.last
    FROM
        turing_award_winners
    +---------+----------+
    | first   | last     |
    |---------+----------|
    | Alan    | Kay      |
    | John    | HopCroft |
    | Robert  | Tarjan   |
    | Edsger  | Dijkstra |
    +---------+----------+

    Sparse Fields and Null Values

    Suppose collection turing_award_winners now has the following documents.

    {"name": {"first": "John", "last": "HopCroft"}},
    {"name": {"first": "Robert", "last": "Tarjan"}},
    {"name": {"first": "Alan", "middle": "Curtis", "last": "Kay"}},
    {"name": {"first": "Edsger", "last": "Dijkstra"}}

    Notice that this is similar to the previous collection with an additional nested field middle added to the third document. The field name.middle is a sparse field which is a string in 1 document. It is treated as undefined in the other 3 documents that it is absent in. Note that this is different from null. This is clarified in further detail in the data-types page. DESCRIBE helps capture such sparse fields as well.

    DESCRIBE turing_award_winners
    +--------------------------------------+---------------+---------+-----------+
    | field                                | occurrences   | total   | type      |
    |--------------------------------------+---------------+---------+-----------|
    | ['name']                             | 4             | 4       | object    |
    | ['name', 'first']                    | 4             | 4       | string    |
    | ['name', 'last']                     | 4             | 4       | string    |
    | ['name', 'middle']                   | 1             | 4       | string    |
    +--------------------------------------+---------------+---------+-----------+

    While fields with type undefined are not captured in the smart schema, those with type null are. Say the last document in the collection was {"name": {"first": "Edsger", "middle": null, "last": "Dijkstra"}} instead.

    DESCRIBE turing_award_winners
    +-------------------+----------------+---------+-----------+
    | field              | occurrences   | total   | type      |
    |------------------------------------+---------------------+
    | ['name']           | 4             | 4       | object    |
    | ['name', 'first']  | 4             | 4       | string    |
    | ['name', 'last']   | 4             | 4       | string    |
    | ['name', 'middle'] | 1             | 4       | null      |
    | ['name', 'middle'] | 1             | 4       | string    |
    +----------------------------------------------+-----------+

    Both null and undefined types behave the same way in almost all situations. The predicate IS NULL will return true for both null and undefined. You can filter the fields that are undefined with a special predicate IS UNDEFINED.

    SELECT
        COUNT(*)
    FROM
        turing_award_winners
    WHERE
        turing_award_winners.name.middle IS NOT NULL
    +----------+
    | ?count   |
    |----------|
    | 1        |
    +----------+
    SELECT
        turing_award_winners.name.middle
    FROM
        turing_award_winners
    WHERE
        turing_award_winners.name.middle IS NOT NULL
    +----------------------+
    | middle               |
    |----------------------|
    | Curtis               |
    +----------------------+

    If you want to also include the fields with a null value, you can use the IS NOT UNDEFINED predicate:

    SELECT
        turing_award_winners.name.middle
    FROM
        turing_award_winners
    WHERE
        turing_award_winners.name.middle IS NOT UNDEFINED
    +----------------------+
    | middle               |
    |----------------------|
    | Curtis               |
    | null                 |
    +----------------------+

    Nested Arrays

    Just like nested objects, documents can also have arrays comprising of nested fields that are scalars, objects, or arrays. Consider the following collection of documents with Turing Award winners by year. Let's call this collection turing_award_winners_by_year.

    {
        "year" : "1972",
        "winners" : [
                        {
                            "first" : "Edsger",
                            "last" : "Dijkstra",
                            "subjects" : [ "Program Verification", "Programming" ]
                        }
                    ]
    },
    {
        "year" : "1986",
        "winners" : [
                        {
                            "first" : "John",
                            "last" : "HopCroft",
                            "subjects" : [ "Analysis of Algorithms", "Data Structures" ]
                        },
                        {
                            "first" : "Robert",
                            "last" : "Tarjan",
                            "subjects" : [ "Analysis of Algorithms", "Data Structures" ]
                        }
                   ]
    },
    {
        "year" : "2003",
        "winners" : [
                        {
                            "first" : "Alan",
                            "last" : "Kay",
                            "subjects" : [ "Personal Computing", "Programming Languages" ]
                        }
                    ]
    }

    This shape of this collection is succinctly captured in the smart schema below.

    DESCRIBE turing_award_winners_by_year
    +--------------------------------------------+---------------+---------+-----------+
    | field                                      | occurrences   | total   | type      |
    |--------------------------------------------+---------------+---------+-----------|
    | ['winners']                                | 3             | 3       | array     |
    | ['winners', '*']                           | 4             | 4       | object    |
    | ['winners', '*', 'first']                  | 4             | 4       | string    |
    | ['winners', '*', 'last']                   | 4             | 4       | string    |
    | ['winners', '*', 'subjects']               | 4             | 4       | array     |
    | ['winners', '*', 'subjects', '*']          | 8             | 8       | string    |
    | ['year']                                   | 3             | 3       | string    |
    +--------------------------------------------+---------------+---------+-----------+

    The occurrence count for each of the nested elements is equal to the number of array elements of the given type, summed across all documents in the collection.

    The total count for each such entry is equal to the total number of array elements at that level of nesting, summed across all documents in the collection. Note that we use the delimiter '*' (glob) in the field name to indicate descending into an array. Further, Rockset treats arrays as virtual collections and allows you to DESCRIBE arrays as well. When using a nested array as a target collection in queries we use the delimiter : (colon) as a separator between the root collection and the nested fields.

    DESCRIBE turing_award_winners_by_year :winners [*]
    +------------------------+---------------+---------+--------+
    | field                  | occurrences   | total   | type   |
    |------------------------+---------------+---------+--------|
    | ['*']                  | 4             | 4       | object |
    | ['*', 'first']         | 4             | 4       | string |
    | ['*', 'last']          | 4             | 4       | string |
    | ['*', 'subjects']      | 4             | 4       | array  |
    | ['*', 'subjects', '*'] | 8             | 8       | string |
    +------------------------+---------------+---------+--------+
    DESCRIBE turing_award_winners_by_year :winners [*].subjects
    +---------+---------------+---------+--------+
    | field   | occurrences   | total   | type   |
    |---------+---------------+---------+--------|
    | ['*']   | 8             | 8       | string |
    +---------+---------------+---------+--------+

    SELECT

    [ WITH with_query [, ...] ]
    SELECT [ ALL | DISTINCT ] select_expr [, ...]
    [ FROM from_item [, ...] ]
    [ WHERE condition ]
    [ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
    [ HAVING condition]
    [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_query ]
    [ ORDER BY expression [ ASC | DESC ] [, ...] ]
    [ LIMIT [ count | ALL ] ]
    [ OFFSET count ]

    Returns result set from one or more collections. May include various other clauses:

    • with_query is of the form alias_name AS select_query.
    • select_expr is a valid SQL expression.
    • from_item is one of:
      • collection_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ] [ TABLESAMPLE BERNOULLI (percentage) ]
        • percentage is the percentage of input to be sampled (ranges from 0 to 100)
      • from_item join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ] where join_type is one of:
        • [ INNER ] JOIN
        • LEFT [ OUTER ] JOIN
        • RIGHT [ OUTER ] JOIN
        • CROSS JOIN (which should not be followed by ON/USING, all other should)
    • condition is a SQL expression that evaluates to a boolean. See the section on conditional operators.
    • grouping_element is one of:
      • ()
      • expression
      • GROUPING SETS ( ( column [, ...] ) [, ...] )
      • CUBE ( column [, ...] )
      • ROLLUP ( column [, ...] )
    • select_query is itself a SELECT command with the same recursive format.
    • expression is a valid SQL expression.
    • count is the number of results to be returned.

    Below we cover common uses some clauses in more detail.

    To return all available fields, instead of listing them out as select expressions, specify a single select expression *.

    SELECT
        *
    FROM
        _events
    +--------------------------------------+---------------+--------+---------------+------+
    | _id                                  | eventTime     | kind   | label         | type |
    +--------------------------------------+---------------+--------+---------------+------+
    | cf0e193d-9ee0-4c6a-9f2b-cab9893e97e3 | 1536164080248 | QUERY  | QUERY_SUCCESS | INFO |
    | 4a5e941e-7683-4db2-b8e0-ee0e4807d6a7 | 1536164079995 | QUERY  | QUERY_SUCCESS | INFO |
    | 6a9c5d49-ee3b-4b78-b653-006c540f70c3 | 1536162113586 | QUERY  | QUERY_ERROR   | INFO |
    +--------------------------------------+---------------+--------+---------------+------+

    To return only unique results, include DISTINCT after SELECT.

    SELECT
        DISTINCT label
    FROM
        _events
    +---------------+
    | label         |
    +---------------+
    | QUERY_SUCCESS |
    | QUERY_ERROR   |
    +---------------+

    WITH Clause

    The WITH clause can be used to chain queries by defining a named subquery that can be then used within the main query.

    Using WITH clauses can improve the readability and maintainability of complex queries.

    WITH x AS (
        SELECT
            _id,
            label
        FROM
            _events
    )
    SELECT
        _id,
        label
    FROM
        x
    +--------------------------------------+---------------+
    | _id                                  | label         |
    |--------------------------------------+---------------|
    | 23658547-1ae5-4be8-97e5-6e043f682c31 | QUERY_SUCCESS |
    | 85e16122-d749-444d-ab63-bb5a0544368b | QUERY_SUCCESS |
    | 972485f2-e3b0-43c4-bb7e-c70b7df3e86c | QUERY_SUCCESS |
    +--------------------------------------+---------------+

    There can be multiple subqueries in a WITH clause, and one can use any of the previous ones.

    WITH x AS (
        SELECT
            _id,
            label,
            kind
        FROM
            _events
    ),
    y AS (
        SELECT
            _id,
            kind AS c
        FROM
            x
    )
    SELECT
        _id,
        c
    FROM
        y
    +--------------------------------------+-------+
    | _id                                  | c     |
    |--------------------------------------+-------|
    | 589eaee9-0c62-46f9-8f7a-0a5a3fcf2e2c | QUERY |
    | 3e2e99b9-896f-4852-9669-ab5d1a1a6ebf | QUERY |
    | a2b5d818-cad6-49b8-8f9d-bcc77146e060 | QUERY |
    +--------------------------------------+-------+

    FROM Clause

    When querying a single collection, the FROM clause consists simply of the collection name. However, the FROM clause can also be used to reshape the data that is being queried.

    UNNEST

    UNNEST is a function that can be used to expand arrays into several rows.

    UNNEST is a row generating function which takes as input an array and outputs one row per element of the input array. Unlike other functions, it outputs multiple rows (a relation) rather than one value (a scalar). That means it appears in queries in the same places a collection appears: a FROM or JOIN clause.

    This query takes an array of 3 numbers, and produces 3 rows. An UNNEST can have two aliases - an outer alias which names the relation, and an inner alias which names the field within the relation.

    SELECT
        numbers.aNumber
    FROM
        UNNEST(array [1, 2, 3] AS aNumber) AS numbers
    +---------+
    | aNumber |
    |---------|
    | 1       |
    | 2       |
    | 3       |
    +---------+

    We can also UNNEST arrays of objects. We can access fields of each object with the usual dot notation.

    SELECT
        myFruits.someFruit.name,
        myFruits.someFruit.color
    FROM
        UNNEST(
            JSON_PARSE(
                '[{"name": "strawberry", "color": "red"}, {"name": "lime", "color": "green"}]'
            ) AS someFruit
        ) AS myFruits
    +------------+---------+
    | name       | color   |
    |------------+---------|
    | strawberry | red     |
    | lime       | green   |
    +------------+---------+

    UNNEST is particularly useful when we have an array field within a collection. Suppose collection companies contains the following documents:

    {
        "make": "Ford",
        "models": ["Focus", "Mustang", "F-150"],
        "offices": [
            {"city": "Dearborn", "state": "MI"},
            {"city": "Denver", "state": "CO"}
        ]
    },
    {
        "make": "Toyota",
        "models": ["Prius", "Highlander", "Camry"],
        "offices": [
            {"city": "Jacksonville", "state": "FL"},
            {"city": "Ann Arbor", "state": "MI"}
        ]
    }

    For each company, we want to generate one row for each model of car. This will let us manipulate the models more easily, for instance by aggregating or filtering them. We do this by joining the companies collection with the output of UNNEST. For each company, UNNEST generates one row per model. We then perform a correlated cross join. It is correlated because the input to the right side of the join depends on the current row being processed on the left side. It is a cross join because we take every output from the right side and join it with the current row from the left. There is no join condition.

    SELECT
        companies.make,
        models.*
    FROM
        companies
        CROSS JOIN UNNEST(companies.models AS model) AS models
    +---------+------------+
    | make    | model      |
    |---------+------------|
    | Ford    | Focus      |
    | Ford    | Mustang    |
    | Ford    | F-150      |
    | Toyota  | Prius      |
    | Toyota  | Highlander |
    | Toyota  | Camry      |
    +---------+------------+

    In SQL, we can cross join two relations implicitly by just listing them with a comma in between them, so the following query is equivalent to the above:

    SELECT
        companies.make,
        models.*
    FROM
        companies,
        UNNEST(companies.models AS model) AS models

    Note: There is an important caveat with this shorthand though. Mixing implicit joins (i.e. SELECT * FROM t1, t2 WHERE t1.x = t2.x) with explicit joins (i.e. SELECT * FROM t1 JOIN t2 WHERE t1.x = t2.x) can cause confusing results. If you want to unnest a field from a collection and join with another collection, perform all joins explicitly:

    SELECT
        companies.*,
        models.model,
        makeInfo.*
    FROM
        companies
        CROSS JOIN UNNEST(companies.models AS model) AS models
        JOIN makeInfo ON companies.make = makeInfo.make

    The index within the original array can also be extracted by appending WITH ORDINALITY AS alias_name inside the UNNEST function.

    SELECT
        models.*
    FROM
        companies,
        UNNEST(
            companies.models AS name WITH ORDINALITY AS index
        ) AS models
    +---------+------------+
    | index   | name       |
    |---------+------------|
    | 1       | Prius      |
    | 2       | Highlander |
    | 3       | Camry      |
    | 1       | Focus      |
    | 2       | Mustang    |
    | 3       | F-150      |
    +---------+------------+

    When the unnested array contains objects, the values returned by the aliased field are objects, and so they can be dereferenced using the usual dot-notation syntax.

    SELECT
        offices.value.state,
        ARRAY_AGG(offices.value.city) AS cities
    FROM
        companies,
        UNNEST(companies.offices AS value) AS offices
    GROUP BY
        offices.value.state
    +---------------------------+---------+
    | cities                    | state   |
    |---------------------------+---------|
    | ['Denver']                | CO      |
    | ['Jacksonville']          | FL      |
    | ['Dearborn', 'Ann Arbor'] | MI      |
    +---------------------------+---------+

    If all elements of an array are objects, you can also omit the UNNEST alias, which allows you to query object's fields directly. Note that this doesn't work for non-object array elements; those will produce empty rows in the UNNESTed table.

    SELECT
        offices.state,
        ARRAY_AGG(offices.city) AS cities
    FROM
        companies,
        UNNEST(companies.offices) AS offices
    GROUP BY
        offices.state
    +---------------------------+---------+
    | cities                    | state   |
    |---------------------------+---------|
    | ['Denver']                | CO      |
    | ['Jacksonville']          | FL      |
    | ['Dearborn', 'Ann Arbor'] | MI      |
    +---------------------------+---------+

    JOIN

    Rockset supports standard SQL joins.

    • An INNER JOIN returns only results where there is a match on both sides.
    • A LEFT (or RIGHT) OUTER JOIN returns all results from the left (or right) side, populating the other side's columns when possible and with null otherwise.
    • A CROSS JOIN returns the Cartesian product of the two sides' results (i.e all combinations of documents in two collections).
    SELECT
        collection1.name,
        collection2.name
    FROM
        collection1
        JOIN collection2 ON collection1._id = collection2._id

    Cross joins can either be specified using the explicit CROSS JOIN syntax or by specifying multiple collections in the FROM clause. The queries below are equivalent.

    SELECT
        collection1.id,
        collection2.id
    FROM
        collection1,
        collection2
    SELECT
        collection1.id,
        collection2.id
    FROM
        collection1
        CROSS JOIN collection2

    TABLESAMPLE

    Rockset only supports BERNOULLI sampling.

    SELECT
        id,
        name
    FROM
        collection1 TABLESAMPLE BERNOULLI (50)

    This returns approximately 50% of the results that would have been otherwise obtained.

    SELECT
        avg(height)
    FROM
        people TABLESAMPLE BERNOULLI (25)

    This returns the average of the height field based on approximately 25% of rows in the table.

    GROUP BY Clause

    Often, interesting results are obtained by grouping records together on some similar attributes.

    GROUP BY is the mechanism by which the result of a SELECT is partitioned based on a particular field.

    Non-aggregated fields can only be included in aggregate functions in select expressions with the presense of a GROUP BY clause.

    A simple GROUP BY clause may contain any expression composed of input columns.

    SELECT
        count(*) AS c,
        label
    FROM
        _events
    GROUP BY
        label
    +--------+------------------------------+
    | c      | label                        |
    |--------+------------------------------|
    | 41     | API_KEY_DELETED              |
    | 35     | API_KEY_CREATED              |
    | 88256  | QUERY_SUCCESS                |
    | 133    | QUERY_INVALID                |
    | 24     | API_KEY_ERROR                |
    | 72     | INGEST_WARNING               |
    | 16     | COLLECTION_DROPPED           |
    | 121988 | INGEST_INFO                  |
    +--------+------------------------------+

    HAVING is used to filter results, based on the output of aggregate functions.

    SELECT
        count(*) AS c,
        label
    FROM
        _events
    GROUP BY
        label
    HAVING
        count(*) < 50
    +--------+------------------------------+
    | c      | label                        |
    |--------+------------------------------|
    | 41     | API_KEY_DELETED              |
    | 35     | API_KEY_CREATED              |
    | 24     | API_KEY_ERROR                |
    | 16     | COLLECTION_DROPPED           |
    +--------+------------------------------+

    For more complex aggregations, Rockset supports GROUPING SETS, CUBE and ROLLUP. These can be used to perform aggregations on multiple fields in a single query.

    Functions are not supported in GROUP BY clause. Consider using a subquery instead.

    GROUPING SETS

    Grouping sets allow users to specify multiple lists of fields to group on. The fields that are not part of a given sublist of grouping fields are set to null. It can logically be thought of as the UNION of a single SELECT query with a simple GROUP BY for each GROUPING SET.

    SELECT
        label,
        kind
    FROM
        _events
    GROUP BY
        GROUPING SETS ((label), (label, kind))
    +--------------+------------------------------+
    | kind         | label                        |
    |--------------+------------------------------|
    | null         | API_KEY_DELETED              |
    | API_KEY      | API_KEY_DELETED              |
    | COLLECTION   | COLLECTION_CREATED           |
    | null         | COLLECTION_CREATED           |
    | QUERY        | QUERY_SUCCESS                |
    | QUERY        | QUERY_COLLECTION_NOT_READY   |
    | INGEST       | INGEST_INFO                  |
    | INGEST       | INGEST_INITIALIZED           |
    +--------------+------------------------------+

    CUBE

    CUBE is a higher level operator that can be used to generate GROUPING SETS. It generates all possible GROUPING SETS given a set of fields.

    SELECT
        label,
        kind
    FROM
        _events
    GROUP BY
        CUBE (label, kind)

    This is equivalent to:

    SELECT
        label,
        kind
    FROM
        _events
    GROUP BY
        GROUPING SETS (
            (label, kind),
            (label),
            (kind),
            ()
        )

    ROLLUP

    The ROLLUP operator is similar to CUBE in that it's a higher order abstraction over GROUPING SETS. It generates all possible starting subsequences for a given set of fields.

    SELECT
        label,
        kind
    FROM
        _events
    GROUP BY
        ROLLUP (label, kind)

    This is equivalent to:

    SELECT
        label,
        kind
    FROM
        _events
    GROUP BY
        GROUPING SETS ((label, kind), (label), ())

    ORDER BY Clause

    The ORDER BY clause is used to sort a result set by one or more output expressions. By default ordering for an expression is ascending. Use DESC to get result set in descending order.

    SELECT
        *
    FROM
        users
    ORDER BY
        users.username DESC

    Use multiple expressions, each with a different ordering, by separating with commas.

    SELECT
        *
    FROM
        users
    ORDER BY
        users.first_name ASC,
        users.last_name DESC

    LIMIT Clause

    The LIMIT clause restricts the number of rows in the result.

    The following query limits the result set to contain a maximum of 10 rows. The result set represents the top 10 customers based on customer rating.

    SELECT
        *
    FROM
        customers
    ORDER BY
        customers.customer_rating DESC
    LIMIT
        10

    OFFSET Clause

    The OFFSET clause skips the first n rows of results.

    The following query selects the 6th through the 15th rows.

    SELECT
        *
    FROM
        customers
    ORDER BY
        customers.customer_rating DESC
    LIMIT
        10 OFFSET 5

    CLUSTER BY Clause

    The CLUSTER BY clause is used in ingest transformations to configure data clustering on a collection. It is not allowed during normal query execution.

    Similar to the GROUP BY clause, you can reference fields according to their ordinal (1, 2, etc.) or directly by name. Unlike the GROUP BY clause, you may not cluster by expressions, but can only directly reference fields from the SELECT clause.

    For example,

    SELECT
        email,
        country,
        occupation,
        age,
        income
    FROM
        _input CLUSTER BY country,
        occupation

    would define clustering on fields (country, occupation) in that order, which would make queries that have predicates on either country, or both country and occupation (but not just occupation!) much faster. The equivalent ordinal-based transformation would be

    SELECT
        email,
        country,
        occupation,
        age,
        income
    FROM
        _input CLUSTER BY 2,
        3

    Combining Result Sets

    UNION returns the combination of results of two SELECT queries.

    SELECT
        id,
        name
    FROM
        collection1
    UNION
    SELECT
        userId,
        name
    FROM
        collection2

    INTERSECT returns only distinct common results of two SELECT queries.

    SELECT
        id,
        name
    FROM
        collection1
    INTERSECT
    SELECT
        userId,
        name
    FROM
        collection2

    EXCEPT returns only distinct results of the first query that are not in the second query.

    SELECT
        id,
        name
    FROM
        collection1
    EXCEPT
    SELECT
        userId,
        name
    FROM
        collection2

    INSERT INTO

    INSERT INTO collection_name
    [ (column_alias [, ...]) ]
    select_query

    Inserts result of a query into a collection.

    • collection_name is the collection into which query results are ingested.
    • column_alias renames fields in the result rows. If specified, this list must contain as many items as the number of fields selected by the select_query
    • select_query is a valid SELECT command.

    Note that ingestion of query results into the specified collection begins after SELECT command finishes successfully. The ingestion happens in the background and for a query that finished successfully it is guaranteed to finish eventually. Query results are guaranteed to be ingested completely into the specified collection eventually.

    Below we cover common uses of INSERT INTO.

    To update some documents in a collection. Note that following queries select _id column, so a document with specified id gets updated.

    INSERT INTO
        collection1
    SELECT
        _id,
        price + 10 AS price
    FROM
        collection1
    where
        price < 5
    INSERT INTO
        collection1
    SELECT
        _id,
        NULL AS age
    FROM
        collection1

    To replicate a collection:

    INSERT INTO
        collection2
    SELECT
        *
    FROM
        collection1