Rockset

    Aggregate Functions

    This page covers functions that can be used in aggregation queries.

    All aggregate functions take arguments which are expressions potentially involving an identifier. The functions operate on a number of values and return a single result. All single argument functions support a DISTINCT option, as in COUNT(DISTINCT x), where all the duplicate values are removed from the input set.

    SELECT
        COUNT(is_even)
    FROM
        numbers -- collection with is_even values [true, true, false, false, true]
    5
    SELECT
        COUNT(DISTINCT is_even)
    FROM
        numbers
    2

    General Aggregations

    ARRAY_AGG

    ARRAY_AGG(x)

    Returns an array created from all the elements in x.

    This function accepts a set of values and returns an array in which each value in the set is assigned to an element of the array.

    Use LIMIT to enforce the maximum number of elements in the array. Use ORDER BY to specify how elements should be ordered in the array. Use OFFSET to add an offset to the ordered array result.

    WITH example AS (
        SELECT
            'red' AS color,
            'primary' AS type
        UNION ALL
        SELECT
            'blue',
            'primary'
        UNION ALL
        SELECT
            'yellow',
            'secondary'
        UNION ALL
        SELECT
            'green',
            'primary'
        UNION ALL
        SELECT
            'white',
            'mix'
        UNION ALL
        SELECT
            'black',
            null
    )
    SELECT
        t.type,
        ARRAY_AGG(t.color) AS all_colors
    FROM
        example t
    GROUP BY
        t.type
    +-----------+--------------------------+
    | type      | all_colors               |
    |-----------+--------------------------|
    | primary   | ["red", "blue", "green"] |
    | mix       | ["white"]                |
    | secondary | ["yellow"]               |
    | null      | ["black"]                |
    +-----------+--------------------------+
    WITH example AS (
        SELECT
            'blue' AS color,
            'primary' AS type,
            undefined AS weight,
        UNION ALL
        SELECT
            'red',
            'primary',
            5
        UNION ALL
        SELECT
            'yellow',
            'secondary',
            8
        UNION ALL
        SELECT
            'green',
            'primary',
            1
        UNION ALL
        SELECT
            'white',
            'mix',
            9
        UNION ALL
        SELECT
            'purple',
            'mix',
            10
        UNION ALL
        SELECT
            'black',
            undefined,
            3
    )
    SELECT
        ARRAY_AGG(
            color
            ORDER BY
                type ASC NULLS FIRST,
                weight DESC
            LIMIT
                5 OFFSET 1
        ) AS all_colors
    FROM
        example
    +---------------------------------------------+
    | all_colors                                  |
    |---------------------------------------------|
    | ["purple", "white", "blue", "red", "green"] |
    +---------------------------------------------+

    MAP_AGG

    MAP_AGG(keys, values)

    Creates an object where the keys are from the first input and the values are from the second input.

    WITH fruits AS (
        SELECT
            'apple' as key,
            'red' as value
        UNION ALL
        SELECT
            'banana' AS key,
            'yellow' as value
    )
    SELECT
        MAP_AGG(key, value)
    FROM
        fruits
    {"apple":"red","banana":"yellow"}

    ARBITRARY

    ARBITRARY(x)

    Returns an arbitrary non-null element from the input. Returns null if the input is empty or all null.

    This function returns an arbitrary element. It will return some non-null element from the input. It only returns null if there are no inputs or if all inputs are null.

    WITH example AS (
        SELECT
            ['red'] AS color
        UNION ALL
        SELECT
            ['blue']
        UNION ALL
        SELECT
            ['yellow']
        UNION ALL
        SELECT
            ['green']
        UNION ALL
        SELECT
            ['white']
        UNION ALL
        SELECT
            ['black']
    )
    SELECT
        arbitrary(t.color) AS arbitrary_color
    FROM
        example t
    ["red"]

    COUNT

    COUNT(*)

    Returns the number of input rows.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        COUNT(name)
    FROM
        examples
    4
    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        COUNT(*)
    FROM
        examples
    5
    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Aaron'
        UNION ALL
        SELECT
            'james'
    )
    SELECT
        UPPER(SUBSTR(name, 1, 1)) first_letter,
        COUNT(*) count
    FROM
        examples
    GROUP BY
        first_letter
    +--------------+-----------+
    | first_letter | count     |
    |--------------+-----------|
    | J            | 3         |
    | A            | 2         |
    +--------------+-----------+

    APPROX_DISTINCT

    APPROX_DISTINCT(x[, e])

    Returns the approximate number of distinct elements with a non-null value for field x.

    This function calcuates an approximate count of the number of distinct values. It is similar to COUNT(DISTINCT x), but it consumes a bounded amount of memory while DISTINCT consumes memory linear in the number of distinct items being counted.

    Note: The result it yields is not guaranteed to be exact. If the parameter e is specified, the standard error of the estimate will be no more than e. If it is not specified, e will default to 1%. e must be in the range [0.0011, 0.26]. APPROX_DISTINCT is implemented with the improved estimator for HyperLogLog proposed by Otmar Ertl.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'John'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            'Angela'
    )
    SELECT
        APPROX_DISTINCT(name)
    FROM
        examples
    2

    GROUPING

    GROUPING(col1, col1, ... col_n)

    Returns the grouping mask, which is a bitmask associating one bit with every column (the first column in the list of arguments corresponds to the most significant bit in the result).

    A column's bit is 0 if the column is included in the current grouping and 1 if it is not included.

    Most useful with GROUPING SETS, ROLLUP, or CUBE.

    /*
    Using GROUPING_SETS((), (x), (y), (x,y)) leads to 4 aggregations,
    one by x, one by y, one by the pair (x, y) and one by () which means
    all rows. The groupId will identify which of these aggregations the
    sum belongs to. The groupIds are:
      3– in binary '11' -> x=1, y=1 -> grouped by ()
      2- in binary '10' -> x=1, y=0 -> grouped by y
      1- in binary '01' -> x=0, y=1 -> grouped by x
      0- in binary '00' -> x=0, y=0 -> grouped by (x, y)
    */
    WITH examples AS (
        SELECT
            1 AS x,
            'a' AS y
        UNION ALL
        SELECT
            2,
            'b'
        UNION ALL
        SELECT
            1,
            'c'
        UNION ALL
        SELECT
            5,
            'a'
    )
    SELECT
        GROUPING(x, y) group_id,
        SUM(x) sum
    FROM
        examples
    GROUP BY
        GROUPING SETS((), (x), (y), (x, y))
    ORDER BY
        group_id DESC
    +-----------+-------+
    | group_id  | sum   |
    |-----------+-------|
    | 3         | 9     |
    | 2         | 1     |
    | 2         | 6     |
    | 2         | 2     |
    | 1         | 2     |
    | 1         | 2     |
    | 1         | 5     |
    | 0         | 2     |
    | 0         | 1     |
    | 0         | 5     |
    | 0         | 1     |
    +-----------+-------+
    /*
    This is equivalent to the example above but
    uses the command CUBE instead of GROUPING SETS
    */
    WITH examples AS (
        SELECT
            1 AS x,
            'a' AS y
        UNION ALL
        SELECT
            2,
            'b'
        UNION ALL
        SELECT
            1,
            'c'
        UNION ALL
        SELECT
            5,
            'a'
    )
    SELECT
        GROUPING(x, y) group_id,
        SUM(x) sum
    FROM
        examples
    GROUP BY
        CUBE(x, y)
    ORDER BY
        group_id DESC
    +-----------+-------+
    | group_id  | sum   |
    |-----------+-------|
    | 3         | 9     |
    | 2         | 1     |
    | 2         | 6     |
    | 2         | 2     |
    | 1         | 2     |
    | 1         | 2     |
    | 1         | 5     |
    | 0         | 2     |
    | 0         | 1     |
    | 0         | 5     |
    | 0         | 1     |
    +-----------+-------+
    /*
    Almost identical to the previous two examples, but
    there is no group (y) since ROLLUP only groups by
    subsequences of the provided columns, so in this case
    (x, y), (x), and ().
    */
    WITH examples AS (
        SELECT
            1 AS x,
            'a' AS y
        UNION ALL
        SELECT
            2,
            'b'
        UNION ALL
        SELECT
            1,
            'c'
        UNION ALL
        SELECT
            5,
            'a'
    )
    SELECT
        GROUPING(x, y) group_id,
        SUM(x) sum
    FROM
        examples
    GROUP BY
        ROLLUP(x, y)
    ORDER BY
        group_id DESC
    +-----------+-------+
    | group_id  | sum   |
    |-----------+-------|
    | 3         | 9     |
    | 1         | 2     |
    | 1         | 2     |
    | 1         | 5     |
    | 0         | 2     |
    | 0         | 1     |
    | 0         | 5     |
    | 0         | 1     |
    +-----------+-------+

    Logical Aggregations

    BITWISE_AND_AGG

    BITWISE_AND_AGG(x)

    Returns the bitwise AND of all input values in 2's complement representation.

    WITH examples AS (
        SELECT
            1 AS x -- 00001
        UNION ALL
        SELECT
            2 -- 00010
        UNION ALL
        SELECT
            4 -- 00100
        UNION ALL
        SELECT
            8 -- 01000
        UNION ALL
        SELECT
            16 -- 10000
    )
    SELECT
        BITWISE_AND_AGG(x)
    FROM
        examples
    0

    BITWISE_OR_AGG

    BITWISE_OR_AGG(x)

    Returns the bitwise OR of all input values in 2's complement representation.

    WITH examples AS (
        SELECT
            1 AS x -- 00001
        UNION ALL
        SELECT
            2 -- 00010
        UNION ALL
        SELECT
            4 -- 00100
        UNION ALL
        SELECT
            8 -- 01000
        UNION ALL
        SELECT
            16 -- 10000
    )
    SELECT
        BITWISE_OR_AGG(x)
    FROM
        examples
    31

    BOOL_AND

    BOOL_AND(x)

    Returns true if every value in the input is true, false otherwise. Return value and all arguments are boolean.

    WITH examples AS (
        SELECT
            true AS x
        UNION ALL
        SELECT
            true
        UNION ALL
        SELECT
            true
    )
    SELECT
        BOOL_AND(x)
    FROM
        examples
    true
    WITH examples AS (
        SELECT
            true AS x
        UNION ALL
        SELECT
            true
        UNION ALL
        SELECT
            false
    )
    SELECT
        BOOL_AND(x)
    FROM
        examples
    false

    BOOL_OR

    BOOL_OR(x)

    Returns true if one value in the input is true, false otherwise. Return value and all arguments are boolean.

    WITH examples AS (
        SELECT
            false AS x
        UNION ALL
        SELECT
            false
        UNION ALL
        SELECT
            false
    )
    SELECT
        BOOL_OR(x)
    FROM
        examples
    false
    WITH examples AS (
        SELECT
            false AS x
        UNION ALL
        SELECT
            false
        UNION ALL
        SELECT
            true
    )
    SELECT
        BOOL_OR(x)
    FROM
        examples
    true

    COUNT_IF

    COUNT_IF(x)

    Returns the number of elements in x which are true.

    WITH examples AS (
        SELECT
            false AS x
        UNION ALL
        SELECT
            false
        UNION ALL
        SELECT
            true
    )
    SELECT
        COUNT_IF(x)
    FROM
        examples
    1

    EVERY

    EVERY(x)

    An alias of BOOL_AND function.

    Mathematical Aggregations

    AVG

    AVG(x)

    Returns an average of all the elements in x. All elements are implicitly promoted to float. Return value is of type float.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            5
        UNION ALL
        SELECT
            10
    )
    SELECT
        AVG(x)
    FROM
        examples
    6.0
    WITH examples AS (
        SELECT
            3 AS x,
            true AS y
        UNION ALL
        SELECT
            5,
            false
        UNION ALL
        SELECT
            10,
            true
        UNION ALL
        SELECT
            1,
            false
    )
    SELECT
        y,
        AVG(x) avg
    FROM
        examples
    GROUP BY
        y
    +-------+------+
    | y     | avg  |
    |-------+------|
    | True  | 6.5  |
    | False | 3.0  |
    +-------+------+

    MAX

    MAX(x)

    Returns the maximum value of all elements in x.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            10
    )
    SELECT
        MAX(x)
    FROM
        examples
    10
    WITH examples AS (
        SELECT
            3 AS x,
            true AS y
        UNION ALL
        SELECT
            5,
            false
        UNION ALL
        SELECT
            10,
            true
        UNION ALL
        SELECT
            1,
            false
    )
    SELECT
        y,
        MAX(x) max
    FROM
        examples
    GROUP BY
        y
    +-------+------+
    | y     | max  |
    |-------+------|
    | True  | 10   |
    | False | 5    |
    +-------+------+

    MAX_BY

    MAX_BY(x, y)

    Returns the value of column x associated with the maximum value of column y.

    WITH examples AS (
        SELECT
            3 AS x,
            5 AS y
        UNION ALL
        SELECT
            1,
            10
        UNION ALL
        SELECT
            10,
            2
    )
    SELECT
        MAX_BY(x, y)
    FROM
        examples
    1

    MIN

    MIN(x)

    Returns the minimum value of all elements in x.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            10
    )
    SELECT
        MIN(x)
    FROM
        examples
    3
    WITH examples AS (
        SELECT
            3 AS x,
            true AS y
        UNION ALL
        SELECT
            5,
            false
        UNION ALL
        SELECT
            10,
            true
        UNION ALL
        SELECT
            1,
            false
    )
    SELECT
        y,
        MIN(x) min
    FROM
        examples
    GROUP BY
        y
    +-------+------+
    | y     | min  |
    |-------+------|
    | True  | 3    |
    | False | 1    |
    +-------+------+

    MIN_BY

    MIN_BY(x, y)

    Returns the value of column x associated with the minimum value of column y.

    WITH examples AS (
        SELECT
            3 AS x,
            5 AS y
        UNION ALL
        SELECT
            1,
            10
        UNION ALL
        SELECT
            10,
            2
    )
    SELECT
        MIN_BY(x, y)
    FROM
        examples
    10

    STDDEV_SAMP

    STDDEV_SAMP(x)

    Returns the sample standard deviation of all input values.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            null -- ignored in STDEV calculation
        UNION ALL
        SELECT
            10
    )
    SELECT
        STDDEV_SAMP(x)
    FROM
        examples
    4.949747468305833

    SUM

    SUM(x)

    Returns the sum of all elements in x. Returns a value of type int if all of the input elements are int, float otherwise.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            null -- ignored in SUM calculation
        UNION ALL
        SELECT
            10
    )
    SELECT
        SUM(x)
    FROM
        examples
    13
    WITH examples AS (
        SELECT
            3 AS x,
            true AS y
        UNION ALL
        SELECT
            5,
            false
        UNION ALL
        SELECT
            10,
            true
        UNION ALL
        SELECT
            1,
            false
    )
    SELECT
        y,
        SUM(x) sum
    FROM
        examples
    GROUP BY
        y
    +-------+------+
    | y     | sum  |
    |-------+------|
    | True  | 13   |
    | False | 6    |
    +-------+------+

    Histogram Map Aggregations

    Note: Histogram map aggregations can only be used in rollups.

    A histogram map aggregate consists of two components:

    1. An object which maps from keys to values
    2. An object which maps from value hash to the number of times the value appears in the key-value map

    Therefore, a histogram map aggregate is a regular key-value map supplemented with a value histogram. This means that in addition to key lookups, histogram map aggregates provide the ability to efficiently look up the number of times a given value appears in the key-value map.

    HMAP_AGG

    HMAP_AGG(key, value, op)

    Creates a histogram map aggregate. For each input document, the op argument specifies the update semantics for the given key value pair.

    The op argument accepts the following string values (case-insensitive):

    • 'UPSERT': Adds the key value pair to the object if the key does not already exist in the object. Overwrites the value for the key otherwise.
    • 'INSERT': Adds the key value pair to the object if the key does not already exist in the object. Does not overwrite the value if the key already exists in the object.
    • 'UPDATE': Updates the value for the key if the key already exists in the object. Does not add the key value pair if the key does not already exist in the object.
    • 'DELETE': Deletes the key from the object. If the key does not already exist in the object, this is a no-op.
    • 'IGNORE': Ignore this update completely.

    The op argument is optional. If it is omitted or set to null or undefined, 'UPSERT' semantics are used.

    Suppose we have the following input documents for our rollup collection fruits:

    {"fruit": "apple", "color": "green", "op": "INSERT"},
    {"fruit": "banana", "color": "yellow", "op": "UPSERT"},
    {"fruit": "grape", "color": "purple"},
    {"fruit": "orange", "color": "orange", "op": "DELETE"},
    {"fruit": "apple", "color": "pink"},
    {"fruit": "apple", "color": "red", "op": "INSERT"},
    {"fruit": "pear", "color": "green", "op": "UPDATE"},
    {"fruit": "grape", "color": "green", "op": "DELETE"},
    {"fruit": "banana", "color": "brown", "op": "UPDATE"},
    {"fruit": "pear", "color": "brown", "op": "INSERT"}

    Given the rollup query:

    SELECT
        HMAP_AGG(fruit, color, op) hmap
    FROM
        _input

    The key-value map component of the created histogram map aggregate will be as follows:

    {
      {"fruit": "apple", "color": "pink"},
      {"fruit": "banana", "color": "brown"},
      {"fruit": "pear", "color": "brown"}
    }

    The value histogram component of the created histogram map aggregate can be queried to see that the value 'pink' occurs once and the value 'brown' occurs twice in the key-value map.

    HMAP_ELEMENT_AT

    HMAP_ELEMENT_AT(hmap, k)

    Returns the value associated with key k in the histogram map aggregate hmap.

    Using the same rollup example as above,

    SELECT
        HMAP_ELEMENT_AT(hmap, 'apple')
    FROM
        fruits
    'pink'
    SELECT
        HMAP_ELEMENT_AT(hmap, 'strawberry')
    FROM
        fruits
    { "__rockset_type": "undefined" }

    HMAP_VALUE_COUNT

    HMAP_VALUE_COUNT(hmap, v)

    Returns the number of times value v occurs in the histogram map aggregate hmap.

    Using the same rollup example as above,

    SELECT
        HMAP_VALUE_COUNT(hmap, 'brown')
    FROM
        fruits
    2
    SELECT
        HMAP_VALUE_COUNT(hmap, 'blue')
    FROM
        fruits
    0

    HMAP_CONTAINS_VALUE

    HMAP_CONTAINS_VALUE(hmap, v)

    Returns whether value v occurs in the histogram map aggregate hmap.

    Using the same rollup example as above,

    SELECT
        HMAP_CONTAINS_VALUE(hmap, 'pink')
    FROM
        fruits
    true
    SELECT
        HMAP_CONTAINS_VALUE(hmap, 'orange')
    FROM
        fruits
    false