Special Fields
This page describes fields with unique roles and behaviors in Rockset documents.
Overview
Special fields are prefixed with an underscore and have important effects on the ingestion and querying behavior of documents in your collections. Some are automatically generated by Rockset during data ingestion, while others can be specified from a source document or ingest transformation.
You can execute the following query on any collection to view some of its special fields:
SELECT
    _id,
    _meta,
    _event_time
FROM
    mycollection
LIMIT
    5;
+---------------------------------------------------------------------------------+----------------+----------------------------------------+
| _meta | _event_time | _id |
|---------------------------------------------------------------------------------+----------------+----------------------------------------|
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '47304-1709397620935'}} | 1535068823524 | '5d99f42c-44ec-2024-a032-ac14d1cbf44d' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '46242-1709397472272'}} | 1535068823426 | 'e5d4c136-8d56-ac39-38f5-2f795ae38007' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '46242-1503238554088'}} | 1535068823326 | '71df712e-04bc-7c9b-0114-97b19741b215' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '47302-1151051235692'}} | 1535068822852 | '517412d2-1a23-1b49-fc3b-441c630ed4ff' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '45189-1709397393868'}} | 1535068822527 | 'd16bf924-f758-21db-c832-4a25323b2938' |
+---------------------------------------------------------------------------------+----------------+----------------------------------------+
Below is a list of all special fields in Rockset.
The _id field
Every document in a Rockset collection is uniquely identified by its _id field.
- If the document source does not already have an _id field, Rockset populates it with an automatically generated UUID.
- If the document source has _id specified, or an ingest transformation outputs an _id field, its value is preserved. A newly ingested document will overwrite any existing document with the same _id value.
For collections with rollups, this field is populated by Rockset and cannot be specified by the user.
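The _id rules above can be sketched with a small in-memory model. This is only an illustration, not Rockset's implementation: the `ingest` function and the dict-backed `collection` are hypothetical, and the sketch models plain replacement of a document with the same _id rather than any field-level merging.

```python
import uuid


def ingest(collection, doc):
    """Store doc in a dict keyed by _id (hypothetical model of a collection)."""
    stored = dict(doc)
    # No _id in the source document: fall back to a generated UUID.
    if "_id" not in stored:
        stored["_id"] = str(uuid.uuid4())
    # A document with the same _id replaces the one already stored.
    collection[stored["_id"]] = stored
    return stored["_id"]


collection = {}
generated_id = ingest(collection, {"name": "first"})
ingest(collection, {"_id": "user-1", "name": "second"})
ingest(collection, {"_id": "user-1", "name": "third"})

print(len(collection))               # two documents: one generated _id, one "user-1"
print(collection["user-1"]["name"])  # the later document won
```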
The _meta field
Metadata regarding each document is stored in a _meta field of object type.
If the source of a document specifies a _meta field, Rockset will ignore the field. Currently, _meta holds information about the source from which the document was inserted into the collection (such as the bucket name and path in the case of S3). If Rockset is unable to parse the source of a document, it creates a document that contains none of the source's fields and whose _meta has a nested field named bad.
This field is never populated for collections with rollups.
The _event_time field
Rockset associates a timestamp with each document in a field named _event_time, recorded as microseconds since the Unix epoch. By default, _event_time is set to the time a document is inserted into a Rockset collection.
Users can specify their own _event_time by including the field in their source records, or by defining an ingest transformation with a mapping for _event_time.
User-specified _event_time values must be of either int (microseconds since epoch) or timestamp type; otherwise the ingestion of the document will fail.
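The type rule for _event_time can be illustrated with a short sketch. It is a model of the behavior described above, not Rockset code: `resolve_event_time` and `to_micros` are hypothetical helpers, a Python `datetime` stands in for the timestamp type, and treating naive datetimes as UTC is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(dt):
    """Convert a datetime to integer microseconds since the Unix epoch."""
    if dt.tzinfo is None:
        # Assumption of this sketch: naive datetimes are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return (dt - EPOCH) // timedelta(microseconds=1)


def resolve_event_time(doc, now=None):
    """Return the _event_time for doc, in microseconds since the epoch."""
    if "_event_time" not in doc:
        # Default: the insertion time.
        return to_micros(now or datetime.now(timezone.utc))
    value = doc["_event_time"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return value  # already microseconds since the epoch
    if isinstance(value, datetime):
        return to_micros(value)
    # Any other type makes ingestion of the document fail.
    raise TypeError(f"unsupported _event_time type: {type(value).__name__}")


one_second = datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
print(resolve_event_time({"_event_time": one_second}))  # 1000000
```

Passing a string such as `"2018-08-24"` raises `TypeError` in this sketch, mirroring the ingestion failure described above; in practice such a value would first be converted to a timestamp, for example in an ingest transformation.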
There are two important reasons a user would want a custom _event_time definition:
- Rockset's optimizer can make several optimizations if there is a predicate or projection on _event_time. So, if you have a timestamp field you expect to filter on or select in queries, you will see significant performance improvements by mapping it to _event_time vs. leaving it as a regular timestamp field in the collection.
- Rockset's time-based retention feature uses _event_time to determine when a document has fallen outside the retention window and should be removed from a collection. Sometimes using the default document insertion time for retention makes perfect sense, but many use cases may want to trim records according to something else, in which case they need to define their own _event_time.
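To make the retention point concrete, here is a minimal sketch of a retention check driven by _event_time. The `is_expired` function is hypothetical, and the exact boundary condition (strictly older than the window) is an assumption; it only illustrates why a custom _event_time changes which documents are trimmed.

```python
MICROS_PER_SECOND = 1_000_000
DAY_SECONDS = 86_400


def is_expired(event_time_us, now_us, retention_seconds):
    """True when _event_time is older than the retention window (hypothetical check)."""
    return now_us - event_time_us > retention_seconds * MICROS_PER_SECOND


now_us = 100 * DAY_SECONDS * MICROS_PER_SECOND
retention = 30 * DAY_SECONDS

# Default _event_time: the document was inserted just now, so it is retained.
print(is_expired(now_us, now_us, retention))  # False

# Custom _event_time: the record describes an event from 31 days ago,
# so it falls outside the window even though it was inserted just now.
old_event_us = now_us - 31 * DAY_SECONDS * MICROS_PER_SECOND
print(is_expired(old_event_us, now_us, retention))  # True
```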
If your collection has rollups and your rollup query does not contain an _event_time mapping, this field is populated with the initial insertion time of the rolled up document. It does not change as more input documents are aggregated into the rolled up document.
The _op field
The _op field enables flexible ingestion of CDC stream control records into a Rockset collection.
Each document ingested into Rockset may have an optional _op field that affects its ingestion behavior.
Supported _op values (case insensitive):
- UPSERT: This is the default, even if no _op exists. The document (minus the _op field) will be inserted into the Rockset collection. If another document with that _id exists, the top-level fields will be merged/overwritten, with those from the new document taking precedence.
- DELETE: A delete will be issued to the collection for _id. If a document with a matching _id exists, it will be deleted; if not, the operation is a no-op. Specifying _op=DELETE without an _id in the incoming document will lead to an ingestion error for that document.
The value of _op can come directly from a source document, or from an ingest transformation. Any value other than the supported ones above will lead to an ingestion error for the document.
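The _op semantics described above can be sketched against a dict-backed collection. Everything here is illustrative: `apply_op` and `IngestError` are hypothetical names, and the code is a simplified model of the described behavior, not Rockset's ingestion pipeline.

```python
import uuid


class IngestError(Exception):
    """Raised when a document cannot be ingested (hypothetical error type)."""


def apply_op(collection, doc):
    """Apply one incoming document to a dict keyed by _id."""
    doc = dict(doc)
    # _op is case insensitive and defaults to UPSERT; it is never stored.
    op = str(doc.pop("_op", "UPSERT")).upper()

    if op == "UPSERT":
        if "_id" not in doc:
            doc["_id"] = str(uuid.uuid4())
        # Merge top-level fields; values from the new document take precedence.
        existing = collection.get(doc["_id"], {})
        collection[doc["_id"]] = {**existing, **doc}
    elif op == "DELETE":
        if "_id" not in doc:
            raise IngestError("_op=DELETE requires an _id")
        # Deleting a missing _id is a no-op.
        collection.pop(doc["_id"], None)
    else:
        raise IngestError(f"unsupported _op value: {op}")


collection = {}
apply_op(collection, {"_id": "1", "a": 1, "b": 2})
apply_op(collection, {"_id": "1", "b": 3, "_op": "upsert"})
print(collection["1"])  # {'_id': '1', 'a': 1, 'b': 3}

apply_op(collection, {"_id": "1", "_op": "Delete"})
print(collection)       # {}
```

Note that the stored documents never contain an `_op` key, which matches the statement that _op is an ingest-time concept only.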
Not all collections support _op. Namely, it is not supported for:
- Rollup collections
- Managed sources which have their own semantics for sending deletes (MongoDB and DynamoDB)
Creating a collection that has one of these unsupported configurations and a mapping for _op will lead to an error at collection creation time.
If a record being ingested into a collection with an unsupported configuration contains _op from the source, ingestion of that document will fail with an error.
Unlike other special fields, _op is purely an ingest-time concept that does not materialize in a Rockset collection and consequently can't be queried.