String Functions

This page covers functions to manipulate strings in Rockset.

String functions can be used in any part of a query where a string value is in scope, for example in the SELECT, WHERE, and HAVING clauses to name a few.

List of functions defined in this section:

Function	Description
`ASCII(string)`	Returns the Unicode code point value of the first character in `string`, or zero if the string is empty.
`CHR(n)`	Returns a single character string made up of the character whose Unicode code point is `n`.
`CONCAT(x, ...)`	Returns the concatenation of the strings given in the arguments. Supports variable number of arguments. `null` arguments are ignored.
`DEC128_FORMAT(low, high)`	Converts from the IEEE 754 decimal128 floating point format to a string using scientific notation. `low` and `high` should be integers holding the bottom and top 64 bits of the binary representation, respectively.
`FORMAT(format_str, ...)`	Formats the arguments according to the format string. Will work with integers, floating point numbers, and strings. Follows the format string syntax of `fmt`: https://fmt.dev/latest/syntax.html
`FROM_BASE64(s)`	Decodes the base64 string `s` into a `bytes` value.
`FROM_HEX(s)`	Decodes the hex string `s` into a `bytes` value.
`FROM_UTF8(b)`	If `b` is a `bytes` value that represents a valid UTF-8 string, return it as a `string`. Otherwise, raise an error.
`JSON_FORMAT(x)`	Converts given JSON to string.
`JSON_PARSE(x)`	Parses given string as JSON.
`LENGTH(obj)`	Returns number of elements in `obj`.
`LOWER(x)`	Returns locale-independent lowercase string `x`.
`LPAD(string, targetLength, padString)`	Pads `string` from the left to `targetLength` using `padString`. If `targetLength` is less than the length of `string`, the result will be truncated to the `targetLength`.
`LTRIM(string [, characters])`	Remove the longest string containing only characters from `characters` (a space by default) from the start of `string`
`NGRAMS(string, n_length)`	Returns an array of all ngrams of `string` of length `n_length`. The max length allowed for the input `string` is 100,000.
`NGRAMS(string, min_length, max_length)`	Returns an array of all ngrams of `string` that are between `min_length` and `max_length` in size. The max length allowed for the input `string` is 100,000.
`NORMALIZE(string[, form])`	Returns Unicode-normalized form of `string`. `form` is an identifier and must be one of `NFC`, `NFD`, `NFKC`, `NFKD`, which are the four Unicode normalization methods; `NFC` is default.
`PG_ARRAY_PARSE(x, typestr)`	Parses a string encoded using Postgres array output syntax. `typestr` should be a string holding the Postgres array type, such as `'integer[]'` or `'text[]'`. Int, float, and string arrays of any depth can be decoded.
`PREFIXES(string, max_length)`	Returns an array of all prefixes of `string` whose length is less than or equal to `max_length`. If not specified, the default `max_length` used is 100. The max length allowed for the input `string` is 100,000.
`REGEXP_EXTRACT(string, pattern[, group])`	Returns the first match of `pattern` in `string`, or `null` if the pattern does not match. If `group` is specified and greater than zero, returns the `group`th capturing group; if `group` is not specified or is zero, returns the full match.
`REGEXP_EXTRACT_ALL(string, pattern[, group])`	Returns an array containing all matches of `pattern` in `string`. If `group` is specified and greater than zero, extracts the `group`th capturing group from each match; if `group` is not specified or is zero, returns the full matches.
`REGEXP_LIKE(string, pattern)`	Returns `true` if `string` matches the regular expression `pattern`, `false` otherwise.
`REGEXP_REPLACE(string, pattern[, replacement])`	Returns `string` with all places where `pattern` matches replaced with `replacement` (or erased if `replacement` is not specified). `replacement` may use `\1` .. `\9` escape sequences to refer to capturing groups, or `\0` to refer to the entire match.
`REGEXP_SPLIT(string, pattern)`	Returns an array with the components of `string` when split by `pattern`.
`REPLACE(string, search[, replacement])`	Returns a string with all instances of `search` replaced with `replacement` in `string`. `replacement` is optional, which if not specified removes all instance of `search` from `string`. When `search` is an empty string, `string` is not modified in any way.
`REVERSE(string)`	Returns `string` with its characters in reverse order.
`RPAD(string, targetLength, padString)`	Pads `string` from the right to `targetLength` using `padString`. If `targetLength` is less than the length of `string`, the result will be truncated to the `targetLength`.
`RTRIM(string[, characters])`	Remove the longest string containing only characters from `characters` (a space by default) from the end of `string`
`SPLIT(string, delimiter[, limit])`	Splits `string` on `delimiter` and returns an array. With `limit`, only the first `limit - 1` delimiters are split upon, thereby returning an array of size at most `limit`. The last element in the array always contains everything left in the string in the case where there are >= `limit` occurrences of the delimiter in the string. `limit` must be a positive number.
`SPLIT_PART(string, delimiter, index)`	Splits `string` on `delimiter` and returns the field `index`. Field indexes start with `1`. If the index is larger than than the number of fields, then `null` is returned.
`STRPOS(string, substring)`	Returns the starting position of the first instance of `substring` in `string`. Positions start with 1. If not found, 0 is returned.
`SUBSTR(string, start[, length])`	Returns substring of `string` starting at character at index given by `start` (1-based index) and of length `length`. If `length` is not given, returns the substring starting at `start` until the end of `string`. If `start` is negative, it rolls over to the end of the string and counts backwards from there.
`SUFFIXES(string, truncate_length)`	Returns an array of all suffixes of `string` where each returned suffix is truncated to `truncate_length`. If not specified, the default `truncate_length` used is 100. The max length allowed for the input `string` is 100,000.
`TOKENIZE(text[, locale])`	Tokenize `text` into array of strings, normalized and lower-cased. `text` is interpreted as text in the language specified by `locale`. `locale` is specified using the ICU format. The default is `en_US_POSIX`. This function is usually used in an ingest transformation to convert a text field into array of strings at the time of document ingestion. Once an input text field is converted to array of strings, you can use `SEARCH` function in a SQL query to perform text search using this array of strings.
`TO_BASE64(b)`	Encodes the `bytes` value `b` into a base64 `string` representation.
`TO_HEX(b)`	Encodes the `bytes` value `b` into a hex `string` representation.
`TO_UTF8(s)`	Return the `bytes` UTF-8 representation of the `string` value `s`.
`LTRIM(string [, characters])`	Remove the longest string containing only characters from `characters` (a space by default) from the start of `string`
`UPPER(x)`	Returns locale-independent uppercase string `x`.
`URL_ENCODE(value)`	Encodes the string into a percent-encoded ASCII text representation
`URL_ENCODE(value)`	Encodes the string into a percent-encoded ASCII text representation