Rockset

    Aggregate Functions

    This page covers functions that can be used in aggregation queries.

    All aggregate functions take arguments which are expressions potentially involving an identifier. The functions operate on a number of values and return a single result. All single argument functions support a DISTINCT option, as in COUNT(DISTINCT x), where all the duplicate values are removed from the input set.

    SELECT
        COUNT(is_even)
    FROM
        numbers -- collection with is_even values [true, true, false, false, true]
    5
    SELECT
        COUNT(DISTINCT is_even)
    FROM
        numbers
    2

    General Aggregations

    ARRAY_AGG

    ARRAY_AGG(x)

    Returns an array created from all the elements in x.

    This function accepts a set of values and returns an array in which each value in the set is assigned to an element of the array.

    WITH example AS (
        SELECT
            'red' AS color,
            'primary' AS type
        UNION ALL
        SELECT
            'blue',
            'primary'
        UNION ALL
        SELECT
            'yellow',
            'secondary'
        UNION ALL
        SELECT
            'green',
            'primary'
        UNION ALL
        SELECT
            'white',
            'mix'
        UNION ALL
        SELECT
            'black',
            null
    )
    SELECT
        t.type,
        ARRAY_AGG(t.color) AS all_colors
    FROM
        example t
    GROUP BY
        t.type
    +-----------+--------------------------+
    | type      | all_colors               |
    |-----------+--------------------------|
    | primary   | ["red", "blue", "green"] |
    | mix       | ["white"]                |
    | secondary | ["yellow"]               |
    | null      | ["black"]                |
    +-----------+--------------------------+

    MAP_AGG

    MAP_AGG(keys, values)

    Creates an object where the keys are from the first input and the values are from the second input.

    WITH fruits AS (
        SELECT
            'apple' as key,
            'red' as value
        UNION ALL
        SELECT
            'banana' AS key,
            'yellow' as value
    )
    SELECT
        MAP_AGG(key, value)
    FROM
        fruits
    {"apple":"red","banana":"yellow"}

    ARBITRARY

    ARBITRARY(x)

    Returns an arbitrary non-null element from the input. Returns null if the input is empty or all null.

    This function returns an arbitrary element. It will return some non-null element from the input. It only returns null if there are no inputs or if all inputs are null.

    WITH example AS (
        SELECT
            ['red'] AS color
        UNION ALL
        SELECT
            ['blue']
        UNION ALL
        SELECT
            ['yellow']
        UNION ALL
        SELECT
            ['green']
        UNION ALL
        SELECT
            ['white']
        UNION ALL
        SELECT
            ['black']
    )
    SELECT
        arbitrary(t.color) AS arbitrary_color
    FROM
        example t
    ["red"]

    COUNT

    COUNT(*)

    Returns the number of input rows.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        COUNT(name)
    FROM
        examples
    4
    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        COUNT(*)
    FROM
        examples
    5
    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Aaron'
        UNION ALL
        SELECT
            'james'
    )
    SELECT
        UPPER(SUBSTR(name, 1, 1)) first_letter,
        COUNT(*) count
    FROM
        examples
    GROUP BY
        first_letter
    +--------------+-----------+
    | first_letter | count     |
    |--------------+-----------|
    | J            | 3         |
    | A            | 2         |
    +--------------+-----------+

    APPROX_DISTINCT

    APPROX_DISTINCT(x[, e])

    Returns the approximate number of distinct elements with a non-null value for field x.

    This function calcuates an approximate count of the number of distinct values. It is similar to COUNT(DISTINCT x), but it consumes a bounded amount of memory while DISTINCT consumes memory linear in the number of distinct items being counted.

    Note: The result it yields is not guaranteed to be exact. If the parameter e is specified, the standard error of the estimate will be no more than e. If it is not specified, e will default to 1%. e must be in the range [0.0011, 0.26]. APPROX_DISTINCT is implemented with the improved estimator for HyperLogLog proposed by Otmar Ertl.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'John'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            'Angela'
    )
    SELECT
        APPROX_DISTINCT(name)
    FROM
        examples
    2

    GROUPING

    GROUPING(col1, col1, ... col_n)

    Returns the grouping mask, which is a bitmask associating one bit with every column (the first column in the list of arguments corresponds to the most significant bit in the result).

    A column's bit is 0 if the column is included in the current grouping and 1 if it is not included.

    Most useful with GROUPING SETS, ROLLUP, or CUBE.

    /*
    Using GROUPING_SETS((), (x), (y), (x,y)) leads to 4 aggregations,
    one by x, one by y, one by the pair (x, y) and one by () which means
    all rows. The groupId will identify which of these aggregations the
    sum belongs to. The groupIds are:
      3– in binary '11' -> x=1, y=1 -> grouped by ()
      2- in binary '10' -> x=1, y=0 -> grouped by y
      1- in binary '01' -> x=0, y=1 -> grouped by x
      0- in binary '00' -> x=0, y=0 -> grouped by (x, y)
    */
    WITH examples AS (
        SELECT
            1 AS x,
            'a' AS y
        UNION ALL
        SELECT
            2,
            'b'
        UNION ALL
        SELECT
            1,
            'c'
        UNION ALL
        SELECT
            5,
            'a'
    )
    SELECT
        GROUPING(x, y) group_id,
        SUM(x) sum
    FROM
        examples
    GROUP BY
        GROUPING SETS((), (x), (y), (x, y))
    ORDER BY
        group_id DESC
    +-----------+-------+
    | group_id  | sum   |
    |-----------+-------|
    | 3         | 9     |
    | 2         | 1     |
    | 2         | 6     |
    | 2         | 2     |
    | 1         | 2     |
    | 1         | 2     |
    | 1         | 5     |
    | 0         | 2     |
    | 0         | 1     |
    | 0         | 5     |
    | 0         | 1     |
    +-----------+-------+
    /*
    This is equivalent to the example above but
    uses the command CUBE instead of GROUPING SETS
    */
    WITH examples AS (
        SELECT
            1 AS x,
            'a' AS y
        UNION ALL
        SELECT
            2,
            'b'
        UNION ALL
        SELECT
            1,
            'c'
        UNION ALL
        SELECT
            5,
            'a'
    )
    SELECT
        GROUPING(x, y) group_id,
        SUM(x) sum
    FROM
        examples
    GROUP BY
        CUBE(x, y)
    ORDER BY
        group_id DESC
    +-----------+-------+
    | group_id  | sum   |
    |-----------+-------|
    | 3         | 9     |
    | 2         | 1     |
    | 2         | 6     |
    | 2         | 2     |
    | 1         | 2     |
    | 1         | 2     |
    | 1         | 5     |
    | 0         | 2     |
    | 0         | 1     |
    | 0         | 5     |
    | 0         | 1     |
    +-----------+-------+
    /*
    Almost identical to the previous two examples, but
    there is no group (y) since ROLLUP only groups by
    subsequences of the provided columns, so in this case
    (x, y), (x), and ().
    */
    WITH examples AS (
        SELECT
            1 AS x,
            'a' AS y
        UNION ALL
        SELECT
            2,
            'b'
        UNION ALL
        SELECT
            1,
            'c'
        UNION ALL
        SELECT
            5,
            'a'
    )
    SELECT
        GROUPING(x, y) group_id,
        SUM(x) sum
    FROM
        examples
    GROUP BY
        ROLLUP(x, y)
    ORDER BY
        group_id DESC
    +-----------+-------+
    | group_id  | sum   |
    |-----------+-------|
    | 3         | 9     |
    | 1         | 2     |
    | 1         | 2     |
    | 1         | 5     |
    | 0         | 2     |
    | 0         | 1     |
    | 0         | 5     |
    | 0         | 1     |
    +-----------+-------+

    Logical Aggregations

    BITWISE_AND_AGG

    BITWISE_AND_AGG(x)

    Returns the bitwise AND of all input values in 2's complement representation.

    WITH examples AS (
        SELECT
            1 AS x -- 00001
        UNION ALL
        SELECT
            2 -- 00010
        UNION ALL
        SELECT
            4 -- 00100
        UNION ALL
        SELECT
            8 -- 01000
        UNION ALL
        SELECT
            16 -- 10000
    )
    SELECT
        BITWISE_AND_AGG(x)
    FROM
        examples
    0

    BITWISE_OR_AGG

    BITWISE_OR_AGG(x)

    Returns the bitwise OR of all input values in 2's complement representation.

    WITH examples AS (
        SELECT
            1 AS x -- 00001
        UNION ALL
        SELECT
            2 -- 00010
        UNION ALL
        SELECT
            4 -- 00100
        UNION ALL
        SELECT
            8 -- 01000
        UNION ALL
        SELECT
            16 -- 10000
    )
    SELECT
        BITWISE_OR_AGG(x)
    FROM
        examples
    31

    BOOL_AND

    BOOL_AND(x)

    Returns true if every value in the input is true, false otherwise. Return value and all arguments are boolean.

    WITH examples AS (
        SELECT
            true AS x
        UNION ALL
        SELECT
            true
        UNION ALL
        SELECT
            true
    )
    SELECT
        BOOL_AND(x)
    FROM
        examples
    true
    WITH examples AS (
        SELECT
            true AS x
        UNION ALL
        SELECT
            true
        UNION ALL
        SELECT
            false
    )
    SELECT
        BOOL_AND(x)
    FROM
        examples
    false

    BOOL_OR

    BOOL_OR(x)

    Returns true if one value in the input is true, false otherwise. Return value and all arguments are boolean.

    WITH examples AS (
        SELECT
            false AS x
        UNION ALL
        SELECT
            false
        UNION ALL
        SELECT
            false
    )
    SELECT
        BOOL_OR(x)
    FROM
        examples
    false
    WITH examples AS (
        SELECT
            false AS x
        UNION ALL
        SELECT
            false
        UNION ALL
        SELECT
            true
    )
    SELECT
        BOOL_OR(x)
    FROM
        examples
    true

    COUNT_IF

    COUNT_IF(x)

    Returns the number of elements in x which are true.

    WITH examples AS (
        SELECT
            false AS x
        UNION ALL
        SELECT
            false
        UNION ALL
        SELECT
            true
    )
    SELECT
        COUNT_IF(x)
    FROM
        examples
    1

    EVERY

    EVERY(x)

    An alias of BOOL_AND function.

    Mathematical Aggregations

    AVG

    AVG(x)

    Returns an average of all the elements in x. All elements are implicitly promoted to float. Return value is of type float.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            5
        UNION ALL
        SELECT
            10
    )
    SELECT
        AVG(x)
    FROM
        examples
    6.0
    WITH examples AS (
        SELECT
            3 AS x,
            true AS y
        UNION ALL
        SELECT
            5,
            false
        UNION ALL
        SELECT
            10,
            true
        UNION ALL
        SELECT
            1,
            false
    )
    SELECT
        y,
        AVG(x) avg
    FROM
        examples
    GROUP BY
        y
    +-------+------+
    | y     | avg  |
    |-------+------|
    | True  | 6.5  |
    | False | 3.0  |
    +-------+------+

    MAX

    MAX(x)

    Returns the maximum value of all elements in x.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            10
    )
    SELECT
        MAX(x)
    FROM
        examples
    10
    WITH examples AS (
        SELECT
            3 AS x,
            true AS y
        UNION ALL
        SELECT
            5,
            false
        UNION ALL
        SELECT
            10,
            true
        UNION ALL
        SELECT
            1,
            false
    )
    SELECT
        y,
        MAX(x) max
    FROM
        examples
    GROUP BY
        y
    +-------+------+
    | y     | max  |
    |-------+------|
    | True  | 10   |
    | False | 5    |
    +-------+------+

    MAX_BY

    MAX_BY(x, y)

    Returns the value of column x associated with the maximum value of column y.

    WITH examples AS (
        SELECT
            3 AS x,
            5 AS y
        UNION ALL
        SELECT
            1,
            10
        UNION ALL
        SELECT
            10,
            2
    )
    SELECT
        MAX_BY(x, y)
    FROM
        examples
    1

    MIN

    MIN(x)

    Returns the minimum value of all elements in x.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            null
        UNION ALL
        SELECT
            10
    )
    SELECT
        MIN(x)
    FROM
        examples
    3
    WITH examples AS (
        SELECT
            3 AS x,
            true AS y
        UNION ALL
        SELECT
            5,
            false
        UNION ALL
        SELECT
            10,
            true
        UNION ALL
        SELECT
            1,
            false
    )
    SELECT
        y,
        MIN(x) min
    FROM
        examples
    GROUP BY
        y
    +-------+------+
    | y     | min  |
    |-------+------|
    | True  | 3    |
    | False | 1    |
    +-------+------+

    MIN_BY

    MIN_BY(x, y)

    Returns the value of column x associated with the minimum value of column y.

    WITH examples AS (
        SELECT
            3 AS x,
            5 AS y
        UNION ALL
        SELECT
            1,
            10
        UNION ALL
        SELECT
            10,
            2
    )
    SELECT
        MIN_BY(x, y)
    FROM
        examples
    10

    STDDEV_SAMP

    STDDEV_SAMP(x)

    Returns the sample standard deviation of all input values.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            null -- ignored in STDEV calculation
        UNION ALL
        SELECT
            10
    )
    SELECT
        STDDEV_SAMP(x)
    FROM
        examples
    4.949747468305833

    SUM

    SUM(x)

    Returns the sum of all elements in x. Returns a value of type int if all of the input elements are int, float otherwise.

    WITH examples AS (
        SELECT
            3 AS x
        UNION ALL
        SELECT
            null -- ignored in SUM calculation
        UNION ALL
        SELECT
            10
    )
    SELECT
        SUM(x)
    FROM
        examples
    13
    WITH examples AS (
        SELECT
            3 AS x,
            true AS y
        UNION ALL
        SELECT
            5,
            false
        UNION ALL
        SELECT
            10,
            true
        UNION ALL
        SELECT
            1,
            false
    )
    SELECT
        y,
        SUM(x) sum
    FROM
        examples
    GROUP BY
        y
    +-------+------+
    | y     | sum  |
    |-------+------|
    | True  | 13   |
    | False | 6    |
    +-------+------+