- Loading Data
When inserting data into Rockset, you can transform the data by providing a single SQL query, that contains all of the desired data transformations. This is referred to as the collection’s field mapping query.
Every ingested document is passed through the field mapping query specified at collection creation time. The result of the field mapping query for an input document is the final document that will be persisted in the collection. In addition, query-based field mappings enable you to drop input documents based on some specified condition so they don’t become persisted in the collection.
Field mappings are often used to apply data transformations in the following use cases:
You can map an input field to a specific advanced data type. For example, mapping an input string field
CAST(last_updated_at as DATETIME)
If your input dataset has PII (personally identifiable information) or PHI (protected health
information) fields, you can use a one-way crypto hash function such as
SHA256() on the input field, so that Rockset only stores the hashed value and not the original
Text fields in the input can be tokenized at data ingestion time so you can more efficiently run text search queries later.
If your application queries involve complex SQL functions which result in slow query processing, you can use a field mapping to pre-compute the result of the expensive SQL expression at data ingestion. Queries on these computed output fields are much faster than executing the SQL expression at query time.
-- extract the domain from the input email address
Importing Geographic Data
-- create a geography from a longitude and latitude field
If you have unused large fields in your input dataset, you can drop them upon data ingestion to save storage space.
You can use field mappings to specify the values of special fields, such as
See the special fields documentation for more information, and when to use field mappings to specify their values.
Creating Field Mappings
The basic structure of a field mapping query is as follows:
< expressions >
< condition > -- optional
Note: As with a normal SQL query, all expressions in the
SELECT clause are executed independently
of one another, and in parallel. Additionally, you cannot change a collection's field mapping
query once the collection has been created.
The following sections provide more detail about the components of a field mapping query.
SELECT clause is where you define:
- The input fields you want to keep or drop in the final document.
- The new expressions you want to add as fields to your final document.
To keep all input fields from the input document by default, you include
SELECT * in your
field mapping query. To drop certain input fields, include something like:
SELECT * EXCEPT (<excluded fields>) in your field mapping query. For example,
SELECT * EXCEPT (a, b) keeps all input fields from the input document by default, except for
If you don’t want to include any input fields in the final document by default, omit the
* field from your field mapping query's
To add a new field to your final document, add
SELECT <expression> AS <field> to your field mapping query.
SELECT CONCAT(a, b) AS ab will add a new field named
ab to your final document, and its value
will be computed from the expression
Note: New fields added to the final document using the
SELECT <expression> AS <field> syntax
will take naming precedence over input fields from the input document. This means that if
SELECT *, CONCAT(a, b) AS a as part of your field mapping query, the final document will
a with value
CONCAT(a, b), not the value of field
a in the input document.
FROM clause does not change for different field mapping queries. It always looks like
FROM _input, and it represents the stream of input documents.
WHERE clause enables you to drop input documents based on some condition. Including this clause in the
field mapping query is optional, unlike the
FROM clauses which are required.
As with a normal SQL query, the
WHERE clause condition can refer to input fields from the input
document. For example, you can include
WHERE a IS NOT NULL AND b IS NOT NULL in your field mapping
query to drop input documents where the field
null or the field
Let’s look at some example field mapping queries.
-- drop some fields
(extra_data, more_extra_data, email_address),
-- type coercion
CAST(last_updated_at AS DATETIME) AS last_updated_at,
-- PII/PHI masking
TO_HEX(SHA256(email_address)) AS email_address_hash,
-- filter input documents
email_address IS NOT NULL
This field mapping query keeps all input fields by default, except for
email_address. It adds two new fields to the final document:
email_address_hash. If there are any input fields named
email_address_hash in the input document, their values in the final document will be the values of
the specified expressions, not their values in the input document. It also filters out all input
-- search tokenization
TOKENIZE(tweet_text, 'en_US') AS tweet_text_tokens,
-- functional index
) AS email_address_domain,
-- native geography type
ST_GEOGPOINT(longitude, latitude) AS geo,
This field mapping query drops all input fields by default and only keeps the input fields
follower_count. It also adds three new fields to our final document:
user IS NOT NULL,
) AS email_address
This field mapping query keeps all input fields by default and adds one new field to the
email_address field will be set to
undefined in the final document.
Setting a field’s value to
undefined in a field mapping query means that the field will not be present
in the final document. Thus,
email_address will only be present in the final document if the input
user field is not
null. See documentation on undefined for more details.
TRY(total_weight / num_items) IS NOT NULL,
total_weight / num_items,
) AS avg_weight
This field mapping query keeps all input fields by default. It adds one new field to the
avg_weight. As with the example above, the
avg_weight field will only be present in
the final document if the expression
TRY(total_weight/num_items) is not
TRY is a special
operator that evaluates an expression and returns
null if an error is encountered during the
expression evaluation (rather than failing the query entirely). Otherwise,
TRY will return the
expression evaluation result. You can use
TRY in a field mapping query so that it only includes a field in
the final document if its expression can be evaluated successfully. See
the documentation for TRY for more details.