String Functions

This page covers functions to manipulate strings in Rockset.

String functions can be used in any part of a query where a string value is in scope, for example in the SELECT, WHERE, and HAVING clauses.
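As a minimal sketch of string functions appearing in different clauses, the query below uses UPPER in SELECT, STRPOS in WHERE, and LENGTH in HAVING. The collection `commons.users` and its fields `email` and `country` are hypothetical names chosen only for illustration.

```sql
-- Hypothetical collection commons.users with string fields email and country.
SELECT
    UPPER(u.country) AS country_code,        -- string function in SELECT
    COUNT(*) AS num_users
FROM
    commons.users u
WHERE
    STRPOS(u.email, '@example.com') > 0      -- string function in WHERE
GROUP BY
    UPPER(u.country)
HAVING
    LENGTH(UPPER(u.country)) = 2             -- string function in HAVING
```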

List of functions defined in this section:

| Function | Description |
| --- | --- |
| `ASCII(string)` | Returns the Unicode code point value of the first character in `string`, or zero if `string` is empty. |
| `CHR(n)` | Returns a single-character string consisting of the character whose Unicode code point is `n`. |
| `CONCAT(x, ...)` | Returns the concatenation of the strings given in the arguments. Supports a variable number of arguments. `null` arguments are ignored. |
| `DEC128_FORMAT(low, high)` | Converts from the IEEE 754 decimal128 floating point format to a string using scientific notation. `low` and `high` should be integers holding the bottom and top 64 bits of the binary representation, respectively. |
| `FORMAT(format_str, ...)` | Formats the arguments according to the format string. Works with integers, floating point numbers, and strings. Follows the format string syntax of fmt: https://fmt.dev/latest/syntax.html |
| `FROM_BASE64(s)` | Decodes the base64 string `s` into a bytes value. |
| `FROM_HEX(s)` | Decodes the hex string `s` into a bytes value. |
| `FROM_UTF8(b)` | If `b` is a bytes value that represents a valid UTF-8 string, returns it as a string. Otherwise, raises an error. |
| `JSON_FORMAT(x)` | Converts the given JSON value to a string. |
| `JSON_PARSE(x)` | Parses the given string as JSON. |
| `LENGTH(obj)` | Returns the number of elements in `obj`. |
| `LOWER(x)` | Returns the string `x` converted to lowercase, independent of locale. |
| `LPAD(string, targetLength, padString)` | Pads `string` from the left to `targetLength` using `padString`. If `targetLength` is less than the length of `string`, the result is truncated to `targetLength`. |
| `LTRIM(string [, characters])` | Removes the longest string containing only characters from `characters` (a space by default) from the start of `string`. |
| `NGRAMS(string, n_length)` | Returns an array of all ngrams of `string` of length `n_length`. The maximum length allowed for the input string is 100,000. |
| `NGRAMS(string, min_length, max_length)` | Returns an array of all ngrams of `string` that are between `min_length` and `max_length` in size. The maximum length allowed for the input string is 100,000. |
| `NORMALIZE(string[, form])` | Returns the Unicode-normalized form of `string`. `form` is an identifier and must be one of `NFC`, `NFD`, `NFKC`, or `NFKD`, the four Unicode normalization forms; `NFC` is the default. |
| `PG_ARRAY_PARSE(x, typestr)` | Parses a string encoded using Postgres array output syntax. `typestr` should be a string holding the Postgres array type, such as `'integer[]'` or `'text[]'`. Int, float, and string arrays of any depth can be decoded. |
| `PREFIXES(string, max_length)` | Returns an array of all prefixes of `string` whose length is less than or equal to `max_length`. If not specified, the default `max_length` used is 100. The maximum length allowed for the input string is 100,000. |
| `REGEXP_EXTRACT(string, pattern[, group])` | Returns the first match of `pattern` in `string`, or `null` if the pattern does not match. If `group` is specified and greater than zero, returns the `group`th capturing group; if `group` is not specified or is zero, returns the full match. |
| `REGEXP_EXTRACT_ALL(string, pattern[, group])` | Returns an array containing all matches of `pattern` in `string`. If `group` is specified and greater than zero, extracts the `group`th capturing group from each match; if `group` is not specified or is zero, returns the full matches. |
| `REGEXP_LIKE(string, pattern)` | Returns true if `string` matches the regular expression `pattern`, false otherwise. |
| `REGEXP_REPLACE(string, pattern[, replacement])` | Returns `string` with all places where `pattern` matches replaced with `replacement` (or erased if `replacement` is not specified). `replacement` may use `\1` .. `\9` escape sequences to refer to capturing groups, or `\0` to refer to the entire match. |
| `REGEXP_SPLIT(string, pattern)` | Returns an array with the components of `string` when split by `pattern`. |
| `REPLACE(string, search[, replacement])` | Returns a string with all instances of `search` in `string` replaced with `replacement`. `replacement` is optional; if it is not specified, all instances of `search` are removed from `string`. When `search` is an empty string, `string` is returned unmodified. |
| `REVERSE(string)` | Returns `string` with its characters in reverse order. |
| `RPAD(string, targetLength, padString)` | Pads `string` from the right to `targetLength` using `padString`. If `targetLength` is less than the length of `string`, the result is truncated to `targetLength`. |
| `RTRIM(string[, characters])` | Removes the longest string containing only characters from `characters` (a space by default) from the end of `string`. |
| `SPLIT(string, delimiter[, limit])` | Splits `string` on `delimiter` and returns an array. With `limit`, only the first `limit - 1` delimiters are split upon, so the returned array has at most `limit` elements. When `string` contains `limit` or more occurrences of `delimiter`, the last element contains everything left in the string. `limit` must be a positive number. |
| `SPLIT_PART(string, delimiter, index)` | Splits `string` on `delimiter` and returns the field at `index`. Field indexes start with 1. If `index` is larger than the number of fields, `null` is returned. |
| `STRPOS(string, substring)` | Returns the starting position of the first instance of `substring` in `string`. Positions start with 1. If not found, 0 is returned. |
| `SUBSTR(string, start[, length])` | Returns the substring of `string` starting at the character at index `start` (1-based) and of length `length`. If `length` is not given, returns the substring from `start` to the end of `string`. If `start` is negative, it rolls over to the end of the string and counts backwards from there. |
| `SUFFIXES(string, truncate_length)` | Returns an array of all suffixes of `string` where each returned suffix is truncated to `truncate_length`. If not specified, the default `truncate_length` used is 100. The maximum length allowed for the input string is 100,000. |
| `TOKENIZE(text[, locale])` | Tokenizes `text` into an array of strings, normalized and lower-cased. `text` is interpreted as text in the language specified by `locale`, which is given in the ICU format; the default is `en_US_POSIX`. This function is usually used in an ingest transformation to convert a text field into an array of strings at document ingestion time. Once a text field has been converted to an array of strings, you can use the `SEARCH` function in a SQL query to perform text search over that array. |
| `TO_BASE64(b)` | Encodes the bytes value `b` into a base64 string representation. |
| `TO_HEX(b)` | Encodes the bytes value `b` into a hex string representation. |
| `TO_UTF8(s)` | Returns the UTF-8 bytes representation of the string value `s`. |
| `UPPER(x)` | Returns the string `x` converted to uppercase, independent of locale. |
| `URL_ENCODE(value)` | Encodes the string `value` into a percent-encoded ASCII text representation. |
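The splitting, position, and substring rules described above can be illustrated with a few literal calls. This is a sketch, not captured query output: the results in the comments are what the descriptions imply, and the column aliases are arbitrary names chosen for the example.

```sql
SELECT
    SPLIT('a,b,c,d', ',')         AS split_all,      -- ['a', 'b', 'c', 'd']
    SPLIT('a,b,c,d', ',', 2)      AS split_limit,    -- ['a', 'b,c,d']: only the first delimiter is split upon
    SPLIT_PART('a,b,c', ',', 2)   AS second_field,   -- 'b' (field indexes start at 1)
    SPLIT_PART('a,b,c', ',', 5)   AS missing_field,  -- null: index exceeds the number of fields
    STRPOS('rockset', 'set')      AS found_at,       -- 5 (positions start at 1)
    STRPOS('rockset', 'xyz')      AS not_found,      -- 0
    SUBSTR('rockset', 2, 3)       AS middle,         -- 'ock'
    SUBSTR('rockset', 5)          AS tail,           -- 'set'
    SUBSTR('rockset', -3)         AS from_end        -- negative start is counted from the end of the string
```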
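For padding, trimming, and replacement, the following sketch shows how the optional arguments change the result. The commented values follow from the descriptions above rather than from a live query. For the truncation case only the result length is shown, because the description does not say which end is cut.

```sql
SELECT
    LPAD('42', 5, '0')               AS left_padded,    -- '00042'
    RPAD('42', 5, '.')               AS right_padded,   -- '42...'
    LENGTH(LPAD('rockset', 3, '0'))  AS truncated_len,  -- 3: targetLength is shorter than the input
    LTRIM('   text')                 AS ltrim_default,  -- 'text' (a space is trimmed by default)
    LTRIM('xxtextxx', 'x')           AS ltrim_chars,    -- 'textxx'
    RTRIM('xxtextxx', 'x')           AS rtrim_chars,    -- 'xxtext'
    REPLACE('banana', 'an', 'AN')    AS replaced,       -- 'bANANa'
    REPLACE('banana', 'an')          AS removed,        -- 'ba': no replacement given, matches are removed
    REPLACE('banana', '', 'x')       AS unchanged,      -- 'banana': empty search leaves the string as is
    CONCAT('a', null, 'b')           AS joined          -- 'ab': null arguments are ignored
```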
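The regular expression functions differ mainly in what they return for the same pattern. The sketch below uses simple character-class patterns to avoid any assumptions about backslash escaping in string literals; the commented results are derived from the descriptions above, not from a live query.

```sql
SELECT
    REGEXP_EXTRACT('order-123-456', '[0-9]+')                 AS first_match,   -- '123'
    REGEXP_EXTRACT('order-123-456', '([a-z]+)-([0-9]+)', 2)   AS second_group,  -- '123'
    REGEXP_EXTRACT('order', '[0-9]+')                         AS no_match,      -- null
    REGEXP_EXTRACT_ALL('order-123-456', '[0-9]+')             AS all_matches,   -- ['123', '456']
    REGEXP_REPLACE('order-123-456', '[0-9]+')                 AS erased,        -- 'order--'
    REGEXP_REPLACE('order-123-456', '[0-9]+', '#')            AS masked,        -- 'order-#-#'
    REGEXP_SPLIT('a1b22c', '[0-9]+')                          AS parts          -- ['a', 'b', 'c']
```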
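The encoding functions operate on bytes values, so a string generally has to pass through TO_UTF8 before encoding and through FROM_UTF8 after decoding. A minimal round-trip sketch, with expected values in comments that follow from the descriptions above and from the standard base64 and hex encodings:

```sql
SELECT
    TO_BASE64(TO_UTF8('hello'))          AS b64,       -- 'aGVsbG8='
    FROM_UTF8(FROM_BASE64('aGVsbG8='))   AS from_b64,  -- 'hello'
    TO_HEX(TO_UTF8('hi'))                AS hex,       -- '6869'
    FROM_UTF8(FROM_HEX('6869'))          AS from_hex,  -- 'hi'
    ASCII('A')                           AS code,      -- 65
    CHR(65)                              AS ch,        -- 'A'
    ASCII('')                            AS empty      -- 0
```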