Aggregate Functions
This page covers functions that can be used in aggregation queries.
All aggregate functions take arguments which are expressions potentially involving an identifier.
The functions operate on a number of values and return a single result. All single argument
functions support a DISTINCT
option, as in COUNT(DISTINCT x)
, where all the duplicate values are
removed from the input set.
SELECT
COUNT(is_even)
FROM
numbers -- collection with is_even values [true, true, false, false, true]
5
SELECT
COUNT(DISTINCT is_even)
FROM
numbers
2
General Aggregations
ARRAY_AGG
ARRAY_AGG(x)
Returns an array created from all the elements in x
.
This function accepts a set of values and returns an array in which each value in the set is assigned to an element of the array.
Use LIMIT
to enforce the maximum number of elements in the array. Use ORDER BY
to specify
how elements should be ordered in the array. Use OFFSET
to add an offset to the ordered array
result.
WITH example AS (
SELECT
'red' AS color,
'primary' AS type
UNION ALL
SELECT
'blue',
'primary'
UNION ALL
SELECT
'yellow',
'secondary'
UNION ALL
SELECT
'green',
'primary'
UNION ALL
SELECT
'white',
'mix'
UNION ALL
SELECT
'black',
null
)
SELECT
t.type,
ARRAY_AGG(t.color) AS all_colors
FROM
example t
GROUP BY
t.type
+-----------+--------------------------+
| type | all_colors |
|-----------+--------------------------|
| primary | ["red", "blue", "green"] |
| mix | ["white"] |
| secondary | ["yellow"] |
| null | ["black"] |
+-----------+--------------------------+
WITH example AS (
SELECT
'blue' AS color,
'primary' AS type,
undefined AS weight,
UNION ALL
SELECT
'red',
'primary',
5
UNION ALL
SELECT
'yellow',
'secondary',
8
UNION ALL
SELECT
'green',
'primary',
1
UNION ALL
SELECT
'white',
'mix',
9
UNION ALL
SELECT
'purple',
'mix',
10
UNION ALL
SELECT
'black',
undefined,
3
)
SELECT
ARRAY_AGG(
color
ORDER BY
type ASC NULLS FIRST,
weight DESC
LIMIT
5 OFFSET 1
) AS all_colors
FROM
example
+---------------------------------------------+
| all_colors |
|---------------------------------------------|
| ["purple", "white", "blue", "red", "green"] |
+---------------------------------------------+
MAP_AGG
MAP_AGG(keys, values)
Creates an object where the keys are from the first input and the values are from the second input.
WITH fruits AS (
SELECT
'apple' as key,
'red' as value
UNION ALL
SELECT
'banana' AS key,
'yellow' as value
)
SELECT
MAP_AGG(key, value)
FROM
fruits
{"apple":"red","banana":"yellow"}
MAP_AGG can also be used as a Counter function in the example below.
WITH fruits AS (
SELECT
'apple' AS key,
UNION ALL
SELECT
'banana'
UNION ALL
SELECT
'apple'
)
SELECT
MAP_AGG(x.key, x.counter)
FROM
(
SELECT
key,
COUNT(key) AS counter
FROM
fruits
GROUP BY
1
) x
{"apple":2,"banana":1}
ARBITRARY
ARBITRARY(x)
Returns an arbitrary non-null element from the input. Returns null if the input is empty or all null.
This function returns an arbitrary element. It will return some non-null element from the input. It only returns null if there are no inputs or if all inputs are null.
WITH example AS (
SELECT
['red'] AS color
UNION ALL
SELECT
['blue']
UNION ALL
SELECT
['yellow']
UNION ALL
SELECT
['green']
UNION ALL
SELECT
['white']
UNION ALL
SELECT
['black']
)
SELECT
arbitrary(t.color) AS arbitrary_color
FROM
example t
["red"]
COUNT
COUNT(*)
Returns the number of input rows.
WITH examples AS (
SELECT
'John' AS name
UNION ALL
SELECT
'Jane'
UNION ALL
SELECT
'Angela'
UNION ALL
SELECT
null
UNION ALL
SELECT
'James'
)
SELECT
COUNT(name)
FROM
examples
4
WITH examples AS (
SELECT
'John' AS name
UNION ALL
SELECT
'Jane'
UNION ALL
SELECT
'Angela'
UNION ALL
SELECT
null
UNION ALL
SELECT
'James'
)
SELECT
COUNT(*)
FROM
examples
5
WITH examples AS (
SELECT
'John' AS name
UNION ALL
SELECT
'Jane'
UNION ALL
SELECT
'Angela'
UNION ALL
SELECT
'Aaron'
UNION ALL
SELECT
'james'
)
SELECT
UPPER(SUBSTR(name, 1, 1)) first_letter,
COUNT(*) count
FROM
examples
GROUP BY
first_letter
+--------------+-----------+
| first_letter | count |
|--------------+-----------|
| J | 3 |
| A | 2 |
+--------------+-----------+
APPROX_DISTINCT
APPROX_DISTINCT(x[, e])
Returns the approximate number of distinct elements with a non-null value for field x
.
This function calcuates an approximate count of the number of distinct values. It is similar to
COUNT(DISTINCT x)
, but it consumes a bounded amount of memory while DISTINCT
consumes memory
linear in the number of distinct items being counted.
Note: The result it yields is not guaranteed to be exact. If the parameter
e
is specified, the standard error of the estimate will be no more thane
. If it is not specified,e
will default to 1%.e
must be in the range[0.0011, 0.26]
. APPROX_DISTINCT is implemented with the improved estimator for HyperLogLog proposed by Otmar Ertl.
WITH examples AS (
SELECT
'John' AS name
UNION ALL
SELECT
'John'
UNION ALL
SELECT
'Angela'
UNION ALL
SELECT
null
UNION ALL
SELECT
'Angela'
)
SELECT
APPROX_DISTINCT(name)
FROM
examples
2
GROUPING
GROUPING(col1, col1, ... col_n)
Returns the grouping mask, which is a bitmask associating one bit with every column (the first column in the list of arguments corresponds to the most significant bit in the result).
A column's bit is 0 if the column is included in the current grouping and 1 if it is not included.
Most useful with GROUPING SETS
, ROLLUP
, or
CUBE
.
/*
Using GROUPING_SETS((), (x), (y), (x,y)) leads to 4 aggregations,
one by x, one by y, one by the pair (x, y) and one by () which means
all rows. The groupId will identify which of these aggregations the
sum belongs to. The groupIds are:
3– in binary '11' -> x=1, y=1 -> grouped by ()
2- in binary '10' -> x=1, y=0 -> grouped by y
1- in binary '01' -> x=0, y=1 -> grouped by x
0- in binary '00' -> x=0, y=0 -> grouped by (x, y)
*/
WITH examples AS (
SELECT
1 AS x,
'a' AS y
UNION ALL
SELECT
2,
'b'
UNION ALL
SELECT
1,
'c'
UNION ALL
SELECT
5,
'a'
)
SELECT
GROUPING(x, y) group_id,
SUM(x) sum
FROM
examples
GROUP BY
GROUPING SETS((), (x), (y), (x, y))
ORDER BY
group_id DESC
+-----------+-------+
| group_id | sum |
|-----------+-------|
| 3 | 9 |
| 2 | 1 |
| 2 | 6 |
| 2 | 2 |
| 1 | 2 |
| 1 | 2 |
| 1 | 5 |
| 0 | 2 |
| 0 | 1 |
| 0 | 5 |
| 0 | 1 |
+-----------+-------+
/*
This is equivalent to the example above but
uses the command CUBE instead of GROUPING SETS
*/
WITH examples AS (
SELECT
1 AS x,
'a' AS y
UNION ALL
SELECT
2,
'b'
UNION ALL
SELECT
1,
'c'
UNION ALL
SELECT
5,
'a'
)
SELECT
GROUPING(x, y) group_id,
SUM(x) sum
FROM
examples
GROUP BY
CUBE(x, y)
ORDER BY
group_id DESC
+-----------+-------+
| group_id | sum |
|-----------+-------|
| 3 | 9 |
| 2 | 1 |
| 2 | 6 |
| 2 | 2 |
| 1 | 2 |
| 1 | 2 |
| 1 | 5 |
| 0 | 2 |
| 0 | 1 |
| 0 | 5 |
| 0 | 1 |
+-----------+-------+
/*
Almost identical to the previous two examples, but
there is no group (y) since ROLLUP only groups by
subsequences of the provided columns, so in this case
(x, y), (x), and ().
*/
WITH examples AS (
SELECT
1 AS x,
'a' AS y
UNION ALL
SELECT
2,
'b'
UNION ALL
SELECT
1,
'c'
UNION ALL
SELECT
5,
'a'
)
SELECT
GROUPING(x, y) group_id,
SUM(x) sum
FROM
examples
GROUP BY
ROLLUP(x, y)
ORDER BY
group_id DESC
+-----------+-------+
| group_id | sum |
|-----------+-------|
| 3 | 9 |
| 1 | 2 |
| 1 | 2 |
| 1 | 5 |
| 0 | 2 |
| 0 | 1 |
| 0 | 5 |
| 0 | 1 |
+-----------+-------+
Logical Aggregations
BITWISE_AND_AGG
BITWISE_AND_AGG(x)
Returns the bitwise AND of all input values in 2's complement representation.
WITH examples AS (
SELECT
1 AS x -- 00001
UNION ALL
SELECT
2 -- 00010
UNION ALL
SELECT
4 -- 00100
UNION ALL
SELECT
8 -- 01000
UNION ALL
SELECT
16 -- 10000
)
SELECT
BITWISE_AND_AGG(x)
FROM
examples
0
BITWISE_OR_AGG
BITWISE_OR_AGG(x)
Returns the bitwise OR of all input values in 2's complement representation.
WITH examples AS (
SELECT
1 AS x -- 00001
UNION ALL
SELECT
2 -- 00010
UNION ALL
SELECT
4 -- 00100
UNION ALL
SELECT
8 -- 01000
UNION ALL
SELECT
16 -- 10000
)
SELECT
BITWISE_OR_AGG(x)
FROM
examples
31
BOOL_AND
BOOL_AND(x)
Returns true
if every value in the input is true
, false
otherwise. Return value and all arguments are boolean
.
WITH examples AS (
SELECT
true AS x
UNION ALL
SELECT
true
UNION ALL
SELECT
true
)
SELECT
BOOL_AND(x)
FROM
examples
true
WITH examples AS (
SELECT
true AS x
UNION ALL
SELECT
true
UNION ALL
SELECT
false
)
SELECT
BOOL_AND(x)
FROM
examples
false
BOOL_OR
BOOL_OR(x)
Returns true
if one value in the input is true
, false
otherwise. Return value and all arguments are boolean
.
WITH examples AS (
SELECT
false AS x
UNION ALL
SELECT
false
UNION ALL
SELECT
false
)
SELECT
BOOL_OR(x)
FROM
examples
false
WITH examples AS (
SELECT
false AS x
UNION ALL
SELECT
false
UNION ALL
SELECT
true
)
SELECT
BOOL_OR(x)
FROM
examples
true
COUNT_IF
COUNT_IF(x)
Returns the number of elements in x
which are true
.
WITH examples AS (
SELECT
false AS x
UNION ALL
SELECT
false
UNION ALL
SELECT
true
)
SELECT
COUNT_IF(x)
FROM
examples
1
EVERY
EVERY(x)
An alias of BOOL_AND
function.
Mathematical Aggregations
AVG
AVG(x)
Returns an average of all the elements in x
. All elements are implicitly promoted to float
. Return value is of type float
.
WITH examples AS (
SELECT
3 AS x
UNION ALL
SELECT
5
UNION ALL
SELECT
10
)
SELECT
AVG(x)
FROM
examples
6.0
WITH examples AS (
SELECT
3 AS x,
true AS y
UNION ALL
SELECT
5,
false
UNION ALL
SELECT
10,
true
UNION ALL
SELECT
1,
false
)
SELECT
y,
AVG(x) avg
FROM
examples
GROUP BY
y
+-------+------+
| y | avg |
|-------+------|
| True | 6.5 |
| False | 3.0 |
+-------+------+
MAX
MAX(x)
Returns the maximum value of all elements in x
.
WITH examples AS (
SELECT
3 AS x
UNION ALL
SELECT
null
UNION ALL
SELECT
10
)
SELECT
MAX(x)
FROM
examples
10
WITH examples AS (
SELECT
3 AS x,
true AS y
UNION ALL
SELECT
5,
false
UNION ALL
SELECT
10,
true
UNION ALL
SELECT
1,
false
)
SELECT
y,
MAX(x) max
FROM
examples
GROUP BY
y
+-------+------+
| y | max |
|-------+------|
| True | 10 |
| False | 5 |
+-------+------+
MAX_BY
MAX_BY(x, y)
Returns the value of column x
associated with the maximum value of column y
.
WITH examples AS (
SELECT
3 AS x,
5 AS y
UNION ALL
SELECT
1,
10
UNION ALL
SELECT
10,
2
)
SELECT
MAX_BY(x, y)
FROM
examples
1
MIN
MIN(x)
Returns the minimum value of all elements in x
.
WITH examples AS (
SELECT
3 AS x
UNION ALL
SELECT
null
UNION ALL
SELECT
10
)
SELECT
MIN(x)
FROM
examples
3
WITH examples AS (
SELECT
3 AS x,
true AS y
UNION ALL
SELECT
5,
false
UNION ALL
SELECT
10,
true
UNION ALL
SELECT
1,
false
)
SELECT
y,
MIN(x) min
FROM
examples
GROUP BY
y
+-------+------+
| y | min |
|-------+------|
| True | 3 |
| False | 1 |
+-------+------+
MIN_BY
MIN_BY(x, y)
Returns the value of column x
associated with the minimum value of column y
.
WITH examples AS (
SELECT
3 AS x,
5 AS y
UNION ALL
SELECT
1,
10
UNION ALL
SELECT
10,
2
)
SELECT
MIN_BY(x, y)
FROM
examples
10
STDDEV_SAMP
STDDEV_SAMP(x)
Returns the sample standard deviation of all input values.
WITH examples AS (
SELECT
3 AS x
UNION ALL
SELECT
null -- ignored in STDEV calculation
UNION ALL
SELECT
10
)
SELECT
STDDEV_SAMP(x)
FROM
examples
4.949747468305833
SUM
SUM(x)
Returns the sum of all elements in x
. Returns a value of type int
if all of the input elements are int
, float
otherwise.
WITH examples AS (
SELECT
3 AS x
UNION ALL
SELECT
null -- ignored in SUM calculation
UNION ALL
SELECT
10
)
SELECT
SUM(x)
FROM
examples
13
WITH examples AS (
SELECT
3 AS x,
true AS y
UNION ALL
SELECT
5,
false
UNION ALL
SELECT
10,
true
UNION ALL
SELECT
1,
false
)
SELECT
y,
SUM(x) sum
FROM
examples
GROUP BY
y
+-------+------+
| y | sum |
|-------+------|
| True | 13 |
| False | 6 |
+-------+------+
Histogram Map Aggregations
Note: Histogram map aggregations can only be used in rollups.
A histogram map aggregate consists of two components:
- An object which maps from keys to values
- An object which maps from value hash to the number of times the value appears in the key-value map
Therefore, a histogram map aggregate is a regular key-value map supplemented with a value histogram. This means that in addition to key lookups, histogram map aggregates provide the ability to efficiently look up the number of times a given value appears in the key-value map.
HMAP_AGG
HMAP_AGG(key, value, op)
Creates a histogram map aggregate. For each input document, the op
argument specifies the update semantics for the given key
value
pair.
The op
argument accepts the following string values (case-insensitive):
'UPSERT'
: Adds the key value pair to the object if the key does not already exist in the object. Overwrites the value for the key otherwise.'INSERT'
: Adds the key value pair to the object if the key does not already exist in the object. Does not overwrite the value if the key already exists in the object.'UPDATE'
: Updates the value for the key if the key already exists in the object. Does not add the key value pair if the key does not already exist in the object.'DELETE'
: Deletes the key from the object. If the key does not already exist in the object, this is a no-op.'IGNORE'
: Ignore this update completely.
The op
argument is optional. If it is omitted or set to null or undefined, 'UPSERT'
semantics are used.
Suppose we have the following input documents for our rollup collection fruits
:
{"fruit": "apple", "color": "green", "op": "INSERT"},
{"fruit": "banana", "color": "yellow", "op": "UPSERT"},
{"fruit": "grape", "color": "purple"},
{"fruit": "orange", "color": "orange", "op": "DELETE"},
{"fruit": "apple", "color": "pink"},
{"fruit": "apple", "color": "red", "op": "INSERT"},
{"fruit": "pear", "color": "green", "op": "UPDATE"},
{"fruit": "grape", "color": "green", "op": "DELETE"},
{"fruit": "banana", "color": "brown", "op": "UPDATE"},
{"fruit": "pear", "color": "brown", "op": "INSERT"}
Given the rollup query:
SELECT
HMAP_AGG(fruit, color, op) hmap
FROM
_input
The key-value map component of the created histogram map aggregate will be as follows:
{
{"fruit": "apple", "color": "pink"},
{"fruit": "banana", "color": "brown"},
{"fruit": "pear", "color": "brown"}
}
The value histogram component of the created histogram map aggregate can be queried to see that the value 'pink'
occurs once and the value 'brown'
occurs twice in the key-value map.
HMAP_ELEMENT_AT
HMAP_ELEMENT_AT(hmap, k)
Returns the value associated with key k
in the histogram map aggregate hmap
.
Using the same rollup example as above,
SELECT
HMAP_ELEMENT_AT(hmap, 'apple')
FROM
fruits
'pink'
SELECT
HMAP_ELEMENT_AT(hmap, 'strawberry')
FROM
fruits
{ "__rockset_type": "undefined" }
HMAP_VALUE_COUNT
HMAP_VALUE_COUNT(hmap, v)
Returns the number of times value v
occurs in the histogram map aggregate hmap
.
Using the same rollup example as above,
SELECT
HMAP_VALUE_COUNT(hmap, 'brown')
FROM
fruits
2
SELECT
HMAP_VALUE_COUNT(hmap, 'blue')
FROM
fruits
0
HMAP_CONTAINS_VALUE
HMAP_CONTAINS_VALUE(hmap, v)
Returns whether value v
occurs in the histogram map aggregate hmap
.
Using the same rollup example as above,
SELECT
HMAP_CONTAINS_VALUE(hmap, 'pink')
FROM
fruits
true
SELECT
HMAP_CONTAINS_VALUE(hmap, 'orange')
FROM
fruits
false