Special Fields

This page describes fields with unique roles and behaviors in Rockset documents.

Overview

Special fields are prefixed with an underscore and have important effects on the ingestion and querying behavior of documents in your collections. Some are automatically generated by Rockset during data ingestion, while others can be specified from a source document or SQL transformation.

You can execute the following query on any collection to view some of its special fields:

SELECT
    _id,
    _meta,
    _event_time
FROM
    mycollection
LIMIT
    5;
+---------------------------------------------------------------------------------+----------------+----------------------------------------+
| _meta                                                                           | _event_time    | _id                                    |
|---------------------------------------------------------------------------------+----------------+----------------------------------------|
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '47304-1709397620935'}}  | 1535068823524  | '5d99f42c-44ec-2024-a032-ac14d1cbf44d' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '46242-1709397472272'}}  | 1535068823426  | 'e5d4c136-8d56-ac39-38f5-2f795ae38007' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '46242-1503238554088'}}  | 1535068823326  | '71df712e-04bc-7c9b-0114-97b19741b215' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '47302-1151051235692'}}  | 1535068822852  | '517412d2-1a23-1b49-fc3b-441c630ed4ff' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '45189-1709397393868'}}  | 1535068822527  | 'd16bf924-f758-21db-c832-4a25323b2938' |
+---------------------------------------------------------------------------------+----------------+----------------------------------------+

The sections below describe every special field in Rockset.

_id

Every document in a Rockset collection is uniquely identified by its _id field.

  • If the source document does not already have an _id field, Rockset populates it with an automatically generated UUID.
  • If the document source has _id specified, or a SQL transformation outputs an _id field, its value is preserved. A newly ingested document will overwrite any existing document with the same _id value.

_meta

Metadata regarding each document is stored in a _meta field of object type.

If the source of a document specifies a _meta field, Rockset ignores it. Currently, _meta holds information about the source from which the document was inserted into the collection (such as the bucket name and path in the case of S3). If Rockset is unable to parse a source document, it creates a document that contains none of the source's fields and whose _meta has a nested field named bad.

This field is never populated for rollups.

_event_time

Rockset associates a timestamp with each document in a field named _event_time, recorded as microseconds since the Unix epoch. By default, _event_time is set as the time a document is inserted into a Rockset collection.

Users can specify their own _event_time by including the field in their source records, or by defining a SQL transformation with a mapping for _event_time. User-specified _event_time values must be of either int (microseconds since epoch) or timestamp type; otherwise, ingestion of the document will fail.
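As a rough illustration of the int form, the sketch below converts an ISO 8601 timestamp string to integer microseconds since the Unix epoch and attaches it to a source record before it is sent for ingestion. The helper name to_event_time_micros and the order_id and created_at fields are hypothetical; only the _event_time field name and its microsecond unit come from this page.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_event_time_micros(dt: datetime) -> int:
    """Return integer microseconds since the Unix epoch for an aware datetime."""
    if dt.tzinfo is None:
        # A naive datetime has no defined offset from UTC, so refuse to guess.
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(microseconds=1)

# Hypothetical source record with its own timestamp field.
record = {"order_id": 42, "created_at": "2018-08-24T00:00:23.524000+00:00"}
record["_event_time"] = to_event_time_micros(
    datetime.fromisoformat(record["created_at"])
)
```

Doing the conversion with integer arithmetic keeps the value exact, which matters because a float would not pass as an int-typed _event_time.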

There are two important reasons a user would want a custom _event_time definition:

  1. Rockset's optimizer can apply several optimizations when a query has a predicate or projection on _event_time. So, if you have a timestamp field you expect to filter on or select in queries, you will see significant performance improvements by mapping it to _event_time instead of leaving it as a regular timestamp field in the collection.
  2. Rockset's time-based retention feature uses _event_time to determine when a document has fallen outside the retention window and should be removed from a collection. Sometimes using the default document insertion time for retention makes perfect sense, but many use cases may want to trim records according to something else, in which case they need to define their own _event_time.
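The retention check in the second point can be pictured with a small sketch. This is only a model of the idea, not Rockset's implementation: the function name is_outside_retention is hypothetical, and treating a document exactly at the window boundary as still retained is an assumption.

```python
from datetime import timedelta

def is_outside_retention(event_time_micros: int,
                         now_micros: int,
                         retention: timedelta) -> bool:
    """Return True when a document's _event_time is older than the retention window.

    Both timestamps are integer microseconds since the Unix epoch.
    """
    retention_micros = retention // timedelta(microseconds=1)
    # Assumption: a document exactly at the boundary is still retained.
    return now_micros - event_time_micros > retention_micros
```

With the default _event_time, the age being compared is time since insertion; with a custom _event_time, it is time since whatever moment the mapped field records.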

If your collection has rollups and your rollup query does not contain an _event_time mapping, this field is populated with the initial insertion time of the rolled up document. It does not change as more input documents are aggregated into the rolled up document.

_op

The _op field lets you ingest CDC (change data capture) stream control records into a Rockset collection. Each document ingested into Rockset may have an optional _op field that affects its ingestion behavior.

Supported _op values (case insensitive):

  • UPSERT: This is the default, and it applies even if no _op field exists. The document (minus the _op field) is inserted into the Rockset collection. If another document with that _id exists, the top-level fields are merged, with values from the new document taking precedence.
  • DELETE: A delete is issued to the collection for the given _id. If a document with a matching _id exists, it is deleted; otherwise, the operation is a no-op. Specifying _op=DELETE without an _id in the incoming document leads to an ingestion error for that document.

The value of _op can come directly from a source document, or from a SQL transformation. Any value other than the supported ones above will lead to an ingestion error for the document.
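The rules above can be modeled with a small in-memory sketch. It is not Rockset code: the collection is a plain dict keyed by _id, and apply_record and IngestError are hypothetical names standing in for the ingestion step and a per-document ingestion error. The merge follows the UPSERT description above (top-level fields, new values win).

```python
import uuid

class IngestError(Exception):
    """Hypothetical stand-in for a per-document ingestion error."""

def apply_record(collection: dict, record: dict) -> None:
    """Apply one incoming record to an in-memory collection keyed by _id."""
    doc = dict(record)  # copy so the caller's record is not mutated
    # _op is consumed at ingest time and never stored; matching is case insensitive.
    op = str(doc.pop("_op", "UPSERT")).upper()

    if op == "UPSERT":
        # A missing _id gets a generated UUID, as described for the _id field.
        doc_id = doc.setdefault("_id", str(uuid.uuid4()))
        # Merge top-level fields; values from the new document take precedence.
        collection.setdefault(doc_id, {}).update(doc)
    elif op == "DELETE":
        if "_id" not in doc:
            raise IngestError("_op=DELETE requires an _id")
        # Deleting an _id that does not exist is a no-op.
        collection.pop(doc["_id"], None)
    else:
        raise IngestError(f"unsupported _op value: {op!r}")

collection = {}
apply_record(collection, {"_id": "a", "x": 1, "y": 2})
apply_record(collection, {"_id": "a", "y": 3, "_op": "upsert"})
# collection["a"] is now {"_id": "a", "x": 1, "y": 3}
apply_record(collection, {"_id": "a", "_op": "Delete"})
# collection is empty again
```

Note that the stored document never contains _op, which mirrors the point that _op is an ingest-time concept only.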

Not all collections support _op. Namely, it is not supported for:

  • Rollup collections
  • Managed sources that have their own semantics for sending deletes (MongoDB and DynamoDB)

Creating a collection with one of these unsupported configurations and a mapping for _op will lead to an error at collection creation time. If a record being ingested into a collection with an unsupported configuration contains _op from the source, the document will fail with an ingestion error.

Unlike other special fields, _op is purely an ingest-time concept that does not materialize in a Rockset collection and consequently can't be queried.