Rockset

    Window Functions

    This page covers functions that can be used in window queries.

    Window functions are similar to aggregate functions, except that they return one row per input row, instead of aggregating all of the rows into a single result. Window functions operate over partititons in the input data, specified by an OVER() clause. Within a partition, window frames can be defined, and the window function can be applied in a given order to these window frames.

    Window function calls are permitted only in the SELECT list and the ORDER BY clause of the query.

    WITH numbers AS (
        SELECT
            1 AS number
        UNION ALL
        SELECT
            2
        UNION ALL
        SELECT
            3
        UNION ALL
        SELECT
            4
    )
    SELECT
        ROW_NUMBER() OVER()
    FROM
        numbers
    +--------------+
    | row_number   |
    |--------------+
    | 1            |
    | 2            |
    | 3            |
    | 4            |
    +--------------+

    Over

    To execute a window function, an OVER() clause is required. An OVER() clause can contain information about the partitions a window function should be applied on, the order in which a window function will operate over a given partition, as well as the window frames over which aggregation or certain window functions should be computed on.

    PARTITION BY

    The PARTITION BY option groups the rows of the query into partitions, which the window function processes separately. Without PARTITION BY, the query is treated as having only one partititon.

    /*
    In this example, we end up with three partitions.
    One partition includes all rows with 1 as the number, one
    includes only the row with 3 as the number value, and one
    with only 4 as the number value.
    */
    WITH numbers AS (
        SELECT
            1 AS number
        UNION ALL
        SELECT
            1
        UNION ALL
        SELECT
            3
        UNION ALL
        SELECT
            4
    )
    SELECT
        number,
        ROW_NUMBER() OVER(PARTITION BY number)
    FROM
        numbers
    +-----------+------------+
    | number    | row_number |
    |-----------+------------|
    | 1         | 1          |
    | 1         | 2          |
    | 3         | 1          |
    | 4         | 1          |
    +-----------+------------+

    ORDER BY

    The ORDER BY option determines the order in which the rows of a partition are processed. It works similarly to a query-level ORDER BY clause. Without ORDER BY, rows are processed in an unspecified order. Note that ORDER BY inside of a window function may not affect the row ordering in the output columns. The only thing ORDER BY guarantees is the order in which the window function is applied across a partition.

    /*
    In this example, we are applying the sum function
    in descending order relative to the number column.
    Note that both rows with number value 1 are treated
    as peers, and are therefore aggregated together.
    */
    WITH numbers AS (
        SELECT
            1 AS number
        UNION ALL
        SELECT
            1
        UNION ALL
        SELECT
            3
        UNION ALL
        SELECT
            4
    )
    SELECT
        number,
        SUM(number) OVER(
            ORDER BY
                number DESC
        )
    FROM
        numbers
    +-----------+-----+
    | number    | sum |
    |-----------+-----|
    | 4         | 4   |
    | 3         | 7   |
    | 1         | 9   |
    | 1         | 9   |
    +-----------+-----+

    RANGE/ROWS BETWEEN

    The window frame specification for the current partition. This is either RANGE or ROWS. This defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up to the current row's last peer. Without ORDER BY, all rows of the partition are included in the window frame. Restrictions are that frame_start cannot be UNBOUNDED FOLLOWING, frame_end cannot be UNBOUNDED PRECEDING, and the frame_end choice cannot appear earlier in the above list than the frame_start choice — for example RANGE BETWEEN CURRENT ROW AND 1 PRECEDING is not allowed.

    /*
    Here, our window frame is rows between 1 preceding and 1 following.
    The COUNT(*) function operates over an independent window frame for
    each row it processes.
    */
    WITH numbers AS (
        SELECT
            1 AS number
        UNION ALL
        SELECT
            1
        UNION ALL
        SELECT
            3
        UNION ALL
        SELECT
            4
    )
    SELECT
        number,
        COUNT(*) OVER(
            ORDER BY
                number ROWS BETWEEN 1 PRECEDING
                AND 1 FOLLOWING
        )
    FROM
        numbers
    +-----------+-------+
    | number    | count |
    |-----------+-------|
    | 1         | 2     |
    | 1         | 3     |
    | 3         | 3     |
    | 4         | 2     |
    +-----------+-------+
    /*
    Here, our window frame is everything in the range
    between the current row and all rows after. Note
    that both rows with number value 1 are treated as
    peer rows, and recieve the same value for count.
    */
    WITH numbers AS (
        SELECT
            1 AS number
        UNION ALL
        SELECT
            1
        UNION ALL
        SELECT
            3
        UNION ALL
        SELECT
            4
    )
    SELECT
        number,
        COUNT(*) OVER(
            ORDER BY
                number RANGE BETWEEN CURRENT ROW
                AND UNBOUNDED FOLLOWING
        )
    FROM
        numbers
    +-----------+-------+
    | number    | count |
    |-----------+-------|
    | 1         | 4     |
    | 1         | 4     |
    | 3         | 2     |
    | 4         | 1     |
    +-----------+-------+
    /*
    Here, our window frame is between 1 preceding
    and implicitly the current row. Note that the
    rows with 1 as the number value are not treated
    as peers.
    */
    WITH numbers AS (
        SELECT
            1 AS number
        UNION ALL
        SELECT
            1
        UNION ALL
        SELECT
            3
        UNION ALL
        SELECT
            4
    )
    SELECT
        number,
        COUNT(*) OVER(
            ORDER BY
                number ROWS 1 PRECEDING
        )
    FROM
        numbers
    +-----------+-------+
    | number    | count |
    |-----------+-------|
    | 1         | 1     |
    | 1         | 2     |
    | 3         | 2     |
    | 4         | 2     |
    +-----------+-------+
    /*
    Here, our window frame is only peer rows to the current row.
    */
    WITH numbers AS (
        SELECT
            1 AS number
        UNION ALL
        SELECT
            1
        UNION ALL
        SELECT
            3
        UNION ALL
        SELECT
            4
    )
    SELECT
        number,
        COUNT(*) OVER(
            ORDER BY
                number RANGE CURRENT ROW
        )
    FROM
        numbers
    +-----------+-------+
    | number    | count |
    |-----------+-------|
    | 1         | 2     |
    | 1         | 2     |
    | 3         | 1     |
    | 4         | 1     |
    +-----------+-------+

    Aggregate Functions

    All aggregate functions can also be used as window functions. When an aggregate function is used as a window function, it aggregates over the rows within the current row's window frame. An aggregate used with ORDER BY and the default window frame definition produces a "running sum" behavior. To obtain aggregation over a row's entire partition, omit the ORDER BY clause entirely or use ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to capture the entire partition.

    Window Functions

    ROW_NUMBER

    ROW_NUMBER()

    Number of the current row within its partition, counting from 1.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        ROW_NUMBER() OVER()
    FROM
        examples
    +--------------+
    | row_number   |
    |--------------+
    | 1            |
    | 2            |
    | 3            |
    | 4            |
    +--------------+

    RANK

    RANK()

    Rank of the current row with gaps; same as row_number of its first peer.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        RANK() OVER(
            ORDER BY
                name
        )
    FROM
        examples
    +-----------+------+
    | name      | rank |
    |-----------+------|
    | Angela    | 1    |
    | James     | 2    |
    | Jane      | 3    |
    | John      | 4    |
    +-----------+------+

    DENSE_RANK

    DENSE_RANK()

    Rank of the current row without gaps; this function counts peer groups.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        DENSE_RANK() OVER(
            ORDER BY
                name
        )
    FROM
        examples
    +-----------+------------+
    | name      | dense_rank |
    |-----------+------------|
    | Angela    | 1          |
    | Angela    | 1          |
    | James     | 2          |
    | Jane      | 3          |
    | John      | 4          |
    +-----------+------------+

    PERCENT_RANK

    PERCENT_RANK()

    Relative rank of the current row: (rank - 1) / (total rows - 1).

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        PERCENT_RANK() OVER(
            ORDER BY
                name
        )
    FROM
        examples
    +-----------+--------------+
    | name      | percent_rank |
    |-----------+--------------|
    | Angela    | 0            |
    | Angela    | 0            |
    | James     | 0.5          |
    | Jane      | 0.75         |
    | John      | 1            |
    +-----------+--------------+

    CUME_DIST

    CUME_DIST()

    Relative rank of the current row: (number of rows preceding or peer with current row) / (total rows).

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        CUME_DIST() OVER(
            ORDER BY
                name
        )
    FROM
        examples
    +-----------+------------+
    | name      | cume_dist  |
    |-----------+------------|
    | Angela    | 0.4        |
    | Angela    | 0.4        |
    | James     | 0.6        |
    | Jane      | 0.8        |
    | John      | 1          |
    +-----------+------------+

    NTILE

    NTILE(num_buckets)

    Integer ranging from 1 to the argument value, dividing the partition as equally as possible.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        NTILE(2) OVER(
            ORDER BY
                name
        )
    FROM
        examples
    +-----------+-------+
    | name      | ntile |
    |-----------+-------|
    | Angela    | 1     |
    | Angela    | 1     |
    | James     | 1     |
    | Jane      | 2     |
    | John      | 2     |
    +-----------+-------+

    LAG

    LAG(value [, offset [, default ]])

    Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        LAG(name, 2) OVER(
            ORDER BY
                name
        )
    FROM
        examples
    +-----------+------------+
    | name      | lag        |
    |-----------+------------|
    | Angela    | null       |
    | Angela    | null       |
    | James     | Angela     |
    | Jane      | Angela     |
    | John      | James      |
    +-----------+------------+

    LEAD

    LEAD(value [, offset [, default ]])

    Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default. Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        LEAD(name, 2) OVER(
            ORDER BY
                name
        )
    FROM
        examples
    +-----------+------------+
    | name      | lead       |
    |-----------+------------|
    | Angela    | James      |
    | Angela    | Jane       |
    | James     | John       |
    | Jane      | null       |
    | John      | null       |
    +-----------+------------+

    FIRST_VALUE

    FIRST_VALUE(value)

    Returns value evaluated at the row that is the first row of the window frame.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        FIRST_VALUE(name) OVER(
            ORDER BY
                name ROWS BETWEEN 1 PRECEDING
                AND 1 FOLLOWING
        )
    FROM
        examples
    +-----------+-------------+
    | name      | first_value |
    |-----------+-------------|
    | Angela    | Angela      |
    | Angela    | Angela      |
    | James     | Angela      |
    | Jane      | James       |
    | John      | Jane        |
    +-----------+-------------+

    LAST_VALUE

    LAST_VALUE(value)

    Returns value evaluated at the row that is the last row of the window frame.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        LAST_VALUE(name) OVER(
            ORDER BY
                name ROWS BETWEEN 1 PRECEDING
                AND 1 FOLLOWING
        )
    FROM
        examples
    +-----------+-------------+
    | name      | first_value |
    |-----------+-------------|
    | Angela    | Angela      |
    | Angela    | James       |
    | James     | Jane        |
    | Jane      | John        |
    | John      | John        |
    +-----------+-------------+

    NTH_VALUE

    NTH_VALUE(value, nth)

    Returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row.

    WITH examples AS (
        SELECT
            'John' AS name
        UNION ALL
        SELECT
            'Jane'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'Angela'
        UNION ALL
        SELECT
            'James'
    )
    SELECT
        name,
        NTH_VALUE(name, 2) OVER(
            ORDER BY
                name ROWS BETWEEN 1 PRECEDING
                AND 1 FOLLOWING
        )
    FROM
        examples
    +-----------+-------------+
    | name      | first_value |
    |-----------+-------------|
    | Angela    | Angela      |
    | Angela    | Angela      |
    | James     | James       |
    | Jane      | Jane        |
    | John      | John        |
    +-----------+-------------+

    All of the functions listed above depend on the sort ordering specified by the ORDER BY clause of the associated window definition. Rows that are not distinct in the ORDER BY ordering are peers; the four ranking functions are defined so that they give the same answer for any two peers.

    Note that FIRST_VALUE, LAST_VALUE, and NTH_VALUE consider only the rows within the window frame.