Special Fields
This page describes fields with unique roles and behaviors in Rockset documents.
Overview
Special fields are prefixed with an underscore and have important effects on the ingestion and querying behavior of documents in your collections. Some are automatically generated by Rockset during data ingestion, while others can be specified in a source document or an ingest transformation.
Special fields are immutable. Users should only specify their own special fields if they are certain they will not need to be updated.
You can execute the following query on any collection to view some of its special fields:
SELECT _id, _meta, _event_time
FROM mycollection
LIMIT 5;
+---------------------------------------------------------------------------------+----------------+----------------------------------------+
| _meta | _event_time | _id |
|---------------------------------------------------------------------------------+----------------+----------------------------------------|
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '47304-1709397620935'}} | 1535068823524 | '5d99f42c-44ec-2024-a032-ac14d1cbf44d' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '46242-1709397472272'}} | 1535068823426 | 'e5d4c136-8d56-ac39-38f5-2f795ae38007' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '46242-1503238554088'}} | 1535068823326 | '71df712e-04bc-7c9b-0114-97b19741b215' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '47302-1151051235692'}} | 1535068822852 | '517412d2-1a23-1b49-fc3b-441c630ed4ff' |
| {'s3': {'bucket': 'mycollection', 'offset': 0, 'path': '45189-1709397393868'}} | 1535068822527 | 'd16bf924-f758-21db-c832-4a25323b2938' |
+---------------------------------------------------------------------------------+----------------+----------------------------------------+
The sections below cover every special field in Rockset.
The _id field
Every document in a Rockset collection is uniquely identified by its _id field.
- If the document source does not already have an _id field, Rockset populates it with an automatically generated UUID.
- If the document source specifies _id, or an ingest transformation outputs an _id field, its value is preserved. A newly ingested document is written over any existing document with the same _id value. By default, this follows the UPSERT semantics described in the _op section below, so top-level fields present in the new document replace those in the existing one.
For collections with rollups, this field is populated by Rockset and cannot be specified by the user.
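As a rough mental model, the two _id rules above behave like the Python sketch below. It works on plain dictionaries rather than a real collection. The function name assign_id is hypothetical, and using uuid.uuid4() for the generated value is an assumption, since Rockset's actual UUID generation is not specified here.

```python
import uuid


def assign_id(doc: dict) -> dict:
    """Return a copy of doc that has an _id (hypothetical model, not Rockset code)."""
    out = dict(doc)  # leave the caller's document untouched
    if "_id" not in out:
        # No _id in the source: generate one (assumption: a uuid4 string).
        out["_id"] = str(uuid.uuid4())
    # An _id supplied by the source or an ingest transformation is preserved as-is.
    return out


print(assign_id({"_id": "abc", "x": 1}))  # {'_id': 'abc', 'x': 1}
print("_id" in assign_id({"x": 1}))       # True
```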
The _meta field
Metadata about each document is stored in a _meta field of object type.
If the source of a document specifies a _meta field, Rockset ignores it. Currently, _meta holds information about the source from which the document was inserted into the collection (such as the bucket name and path for S3). If Rockset is unable to parse the source of a document, it creates a document that contains none of the source's fields and whose _meta has a nested field named bad.
This field is never populated for collections with rollups.
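When you read _meta in application code, a small helper can separate parse failures from normal documents. The sketch below assumes rows shaped like the S3 sample output in the Overview, and the helper name describe_meta is hypothetical. The contents of the bad field are not specified here, so the code only checks whether that field is present.

```python
def describe_meta(meta: dict) -> str:
    """Summarize a _meta object from a query result row (hypothetical helper)."""
    if "bad" in meta:
        # Rockset could not parse the source, so the document has none of its fields.
        return "unparseable source document"
    s3 = meta.get("s3")
    if s3 is not None:
        # S3 sources record the bucket and object path the document came from.
        return f"s3://{s3['bucket']}/{s3['path']}"
    return "other source"


row_meta = {"s3": {"bucket": "mycollection", "offset": 0, "path": "47304-1709397620935"}}
print(describe_meta(row_meta))  # s3://mycollection/47304-1709397620935
```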
The _event_time field
Rockset associates a timestamp with each document in a field named _event_time, recorded as microseconds since the Unix epoch. By default, _event_time is set to the time a document is inserted into a Rockset collection.
Users can specify their own _event_time by including the field in their source records or by defining an ingest transformation with a mapping for _event_time.
User-specified _event_time values must be either an int (microseconds since the epoch) or a timestamp. Otherwise, ingestion of the document fails.
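If your source carries timestamps as datetime values, producing the int form means counting microseconds since the Unix epoch. The helper below is a hypothetical client-side sketch of that conversion and of the int-or-timestamp rule. Rejecting naive (timezone-less) datetimes is an assumption, made to avoid ambiguous conversions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICROSECOND = timedelta(microseconds=1)


def to_event_time_micros(value) -> int:
    """Return value as microseconds since the Unix epoch (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError("_event_time must be an int or a timestamp")
    if isinstance(value, int):
        return value  # already microseconds since the epoch
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: require an explicit timezone rather than guessing one.
            raise ValueError("datetime must be timezone-aware")
        # timedelta // timedelta yields an exact int, avoiding float rounding.
        return (value - EPOCH) // ONE_MICROSECOND
    raise TypeError("_event_time must be an int or a timestamp")


print(to_event_time_micros(datetime(2018, 8, 24, tzinfo=timezone.utc)))  # 1535068800000000
```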
Rockset's time-based retention feature uses _event_time to determine when a document has fallen outside the retention window and should be removed from a collection. Using the default document insertion time for retention is often appropriate. However, many use cases need to trim records based on a different time, such as when an event occurred in the source system. In that case, define your own _event_time.
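You can picture the retention check as a comparison against a cutoff. The sketch below is a hypothetical model, not Rockset's retention job. It expresses the retention window as a timedelta and treats a document exactly at the cutoff as retained. Rockset's actual boundary handling and removal timing are not described here.

```python
from datetime import timedelta

ONE_MICROSECOND = timedelta(microseconds=1)


def is_expired(event_time_us: int, now_us: int, retention: timedelta) -> bool:
    """Return True if _event_time is older than the retention window (hypothetical model).

    Assumption: a document exactly at the cutoff is still retained.
    """
    cutoff_us = now_us - retention // ONE_MICROSECOND
    return event_time_us < cutoff_us


DAY_US = timedelta(days=1) // ONE_MICROSECOND
now = 10 * DAY_US
print(is_expired(1 * DAY_US, now, timedelta(days=7)))  # True: older than 7 days
print(is_expired(5 * DAY_US, now, timedelta(days=7)))  # False: within the window
```

This is why a custom _event_time matters for retention. A record ingested today that describes an event from weeks ago is judged by the time stored in _event_time, not by when it arrived.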
If your collection has rollups and your rollup query does not contain an _event_time mapping, this field is populated with the initial insertion time of the rolled-up document. It does not change as more input documents are aggregated into the rolled-up document.
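A toy model makes this rollup behavior concrete. The sketch below assumes a rollup that sums an amount and counts input documents per key, with the caller passing each document's ingestion time explicitly. The function name and the total and count fields are illustrative, not Rockset's.

```python
def apply_to_rollup(rollups: dict, key, ingest_time_us: int, amount: int) -> None:
    """Fold one input document into a rolled-up document (hypothetical model).

    _event_time is set only when the rolled-up document is first created,
    matching the behavior when the rollup query has no _event_time mapping.
    """
    doc = rollups.get(key)
    if doc is None:
        doc = rollups[key] = {"_event_time": ingest_time_us, "total": 0, "count": 0}
    doc["total"] += amount
    doc["count"] += 1  # later inputs update aggregates but not _event_time


rollups = {}
apply_to_rollup(rollups, "a", 100, 5)
apply_to_rollup(rollups, "a", 200, 7)
print(rollups["a"])  # {'_event_time': 100, 'total': 12, 'count': 2}
```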
The _stored field
_stored reduces the hot storage size of your collections by excluding data from certain indexes. More specifically, we exclude _stored and its children from the inverted and range indexes, but we still include them in our columnar and row indexes.
You can explore the Rockset Storage Architecture to learn more about Rockset's Converged Indexing.
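The index placement rule can be summarized as a lookup from a field path to the indexes that contain it. The sketch below is a simplified model of the rule stated above, not Rockset's code. Representing a field path as a tuple of keys is an assumption.

```python
ALL_INDEXES = frozenset({"inverted", "range", "columnar", "row"})


def indexes_for(path: tuple) -> frozenset:
    """Return the indexes that hold a field, per the _stored rule (simplified model)."""
    if path and path[0] == "_stored":
        # _stored and all of its children skip the inverted and range indexes.
        return frozenset({"columnar", "row"})
    return ALL_INDEXES


print(sorted(indexes_for(("_stored", "bar"))))  # ['columnar', 'row']
print(sorted(indexes_for(("foo",))))            # ['columnar', 'inverted', 'range', 'row']
```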
Leveraging _stored can significantly reduce storage sizes by lowering the storage amplification associated with indexes. However, the reduction depends on your data distribution. For certain data distributions, the columnar and row indexes can be much larger than the inverted and range indexes, which limits the relative impact of _stored. We see this pattern with large text fields, because the inverted index only tracks prefixes of these fields rather than their entire contents. Using _stored with large text fields therefore has a limited impact on overall storage size.
You must configure _stored in your ingest transformations. We recommend using _stored as an object, so you can consistently store and reference multiple fields. However, you can also use _stored as a scalar or array. The following examples outline how to use _stored in your ingest transformations.
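-- Keep foo as a regular field and move bar into _stored; other input fields are dropped.
SELECT
    foo,
    { 'bar': bar } _stored
FROM
    _input

-- Keep every input field except bar, and move bar into _stored.
SELECT
    * EXCEPT(bar),
    { 'bar': bar } _stored
FROM
    _input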
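For readers who think in application code, the second transformation reshapes each input document roughly as the Python sketch below does. It is an illustrative analogue only, not how Rockset evaluates ingest transformations. The function name is hypothetical, and omitting fields that are absent from the input is an assumption that may differ from how the SQL treats missing values.

```python
def move_to_stored(doc: dict, fields: list) -> dict:
    """Analogue of `SELECT * EXCEPT(bar), {'bar': bar} _stored FROM _input` (illustrative)."""
    # Every field not being moved stays at the top level.
    out = {k: v for k, v in doc.items() if k not in fields}
    # Group the moved fields under a single _stored object, as recommended above.
    out["_stored"] = {k: doc[k] for k in fields if k in doc}
    return out


print(move_to_stored({"foo": 1, "bar": "long text", "baz": 2}, ["bar"]))
# {'foo': 1, 'baz': 2, '_stored': {'bar': 'long text'}}
```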
Limitations: Because we exclude _stored and its subfields from the inverted index, queries with predicates on _stored must use the columnar index during execution. Therefore, you should not put fields in _stored if you expect to apply selective filters on them, because those queries run much more efficiently when the fields are in the inverted index. You can still efficiently project fields from _stored after applying selective filters on other fields in your collections.
The _op field
The _op field enables flexible ingestion of records into a Rockset collection.
Each document ingested into Rockset may have an optional _op field that affects its ingestion behavior.
The value of _op can come directly from a source document or from an ingest transformation.
Unlike other special fields, _op is purely an ingest-time concept: it is not materialized in a Rockset collection and consequently cannot be queried.
The supported _op values (case insensitive) are listed below. If no _op value is explicitly included in the document, the default operation is UPSERT.
Any value other than the supported ones below causes an ingestion error for the document.
- INSERT: If no document exists with the same _id, insert this document. If a document with this _id already exists, do nothing. _id is optional for this operation and is automatically generated if not specified.
- UPDATE: If a document exists with the same _id, overwrite the top-level fields present in this document, leaving all other fields in the existing document unchanged. If no document exists with the same _id, do nothing. _id is required for this operation, and ingestion errors if it is not specified. The special field _event_time cannot be changed and is ignored if specified.
- UPSERT: If a document exists with the same _id, do an UPDATE. If it does not exist, do an INSERT. _id is optional for this operation and is automatically generated if not specified. This is the default behavior if no _op value is specified.
- DELETE: Delete the document with this _id if it exists. If no such document exists, do nothing. _id is required for this operation, and ingestion errors if it is not specified.
- REPLACE: If a document exists with the same _id, delete the entire existing document and insert this one instead. If no such document exists, do nothing. _id is required for this operation, and ingestion errors if it is not specified. Unlike UPDATE, this changes the _event_time of the document.
- REPSERT: If a document exists with the same _id, do a REPLACE. If no such document exists, do an INSERT. _id is required for this operation, and ingestion errors if it is not specified.
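The per-document checks in the list above can be summarized in a small Python function. It is a hypothetical model of the documented rules, not Rockset's implementation. It covers case-insensitive matching, the UPSERT default, rejection of unsupported values, and which operations require _id. Treating a non-string _op as unsupported is an assumption.

```python
SUPPORTED_OPS = {"INSERT", "UPDATE", "UPSERT", "DELETE", "REPLACE", "REPSERT"}
ID_REQUIRED = {"UPDATE", "DELETE", "REPLACE", "REPSERT"}


def resolve_op(doc: dict) -> str:
    """Return the normalized _op for a document, or raise as ingestion would (hypothetical)."""
    op = doc.get("_op", "UPSERT")  # UPSERT is the default when _op is absent
    if not isinstance(op, str) or op.upper() not in SUPPORTED_OPS:
        raise ValueError(f"unsupported _op value: {op!r}")
    op = op.upper()  # values are case insensitive
    if op in ID_REQUIRED and "_id" not in doc:
        raise ValueError(f"_id is required for {op}")
    return op


print(resolve_op({"x": 1}))                         # UPSERT
print(resolve_op({"_op": "delete", "_id": "abc"}))  # DELETE
```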
Not all collections support _op. Specifically, it is not supported for:
- Rollup collections
- Managed sources which have their own semantics for sending deletes (MongoDB and DynamoDB)
Creating a collection that has one of these unsupported configurations and an ingest transformation with a mapping for _op results in an error at collection creation time.
If a record ingested into a collection with an unsupported configuration contains _op from the source, that document errors during ingestion.
To illustrate the behavior of _op, the sample documents below use various _op types. The comments explain the effect of each document as they are applied sequentially to an initially empty collection.
{"_op": "INSERT", "_id": "abc", "x": 1, "y": "bar"} // _id=abc is inserted with x=1, y=bar
{"_op": "INSERT", "_id": "abc", "x": 2, "y": "baz"} // this is a no-op since _id=abc already exists
{"_op": "UPDATE", "_id": "abc", "x": 2} // now x=2 for _id=abc and y=bar (unchanged)
{"_op": "UPDATE", "_id": "def", "x": 3} // this is a no-op since _id=def does not exist
{"_op": "UPSERT", "_id": "def", "x": 3} // _id=def is inserted with x=3
{"_op": "UPSERT", "_id": "def", "y": "baz"} // now y=baz for _id=def and x=3 (unchanged)
{"_op": "DELETE", "_id": "xyz"} // this is a no-op since _id=xyz does not exist
{"_op": "DELETE", "_id": "def"} // this deletes _id=def leaving only _id=abc in the collection
{"_op": "REPLACE", "_id": "abc", "z": 4} // now _id=abc is exactly {"z": 4} and fields x and y have been removed
{"_op": "REPLACE", "_id": "ghi", "z": 5} // this is a no-op since _id=ghi does not exist
{"_op": "REPSERT", "_id": "ghi", "z": 6} // _id=ghi is inserted with z=6
{"_op": "REPSERT", "_id": "ghi", "y": 7} // now _id=ghi is exactly {"y": 7} and field z has been deleted
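To check your understanding of this sequence, the following Python sketch replays the same sample documents against an in-memory dictionary keyed by _id. It is a simplified, hypothetical model: it ignores _event_time entirely, along with the rollup and managed-source restrictions. It also assumes uuid.uuid4() as the generator when INSERT or UPSERT omits _id.

```python
import uuid

OPS = ("INSERT", "UPDATE", "UPSERT", "DELETE", "REPLACE", "REPSERT")


def apply_op(collection: dict, doc: dict) -> None:
    """Apply one ingested document to an in-memory collection keyed by _id (simplified model)."""
    op = doc.get("_op", "UPSERT").upper()  # case insensitive, UPSERT by default
    if op not in OPS:
        raise ValueError(f"unsupported _op value: {op!r}")
    if "_id" in doc:
        doc_id = doc["_id"]
    elif op in ("INSERT", "UPSERT"):
        doc_id = str(uuid.uuid4())  # assumption: any unique string works here
    else:
        raise ValueError(f"_id is required for {op}")
    # _op is never stored; _id is kept as the dictionary key.
    body = {k: v for k, v in doc.items() if k not in ("_op", "_id")}
    exists = doc_id in collection

    if op == "INSERT" and not exists:
        collection[doc_id] = body
    elif op in ("UPDATE", "UPSERT") and exists:
        collection[doc_id].update(body)  # overwrite only top-level fields present
    elif op == "UPSERT":
        collection[doc_id] = body  # UPSERT on a missing _id behaves like INSERT
    elif op == "DELETE":
        collection.pop(doc_id, None)
    elif op == "REPLACE" and exists:
        collection[doc_id] = body  # the old document is discarded entirely
    elif op == "REPSERT":
        collection[doc_id] = body  # REPLACE if present, INSERT otherwise
    # Other combinations (INSERT on an existing _id, UPDATE or REPLACE on a
    # missing _id) are no-ops.


samples = [
    {"_op": "INSERT", "_id": "abc", "x": 1, "y": "bar"},
    {"_op": "INSERT", "_id": "abc", "x": 2, "y": "baz"},
    {"_op": "UPDATE", "_id": "abc", "x": 2},
    {"_op": "UPDATE", "_id": "def", "x": 3},
    {"_op": "UPSERT", "_id": "def", "x": 3},
    {"_op": "UPSERT", "_id": "def", "y": "baz"},
    {"_op": "DELETE", "_id": "xyz"},
    {"_op": "DELETE", "_id": "def"},
    {"_op": "REPLACE", "_id": "abc", "z": 4},
    {"_op": "REPLACE", "_id": "ghi", "z": 5},
    {"_op": "REPSERT", "_id": "ghi", "z": 6},
    {"_op": "REPSERT", "_id": "ghi", "y": 7},
]
collection = {}
for d in samples:
    apply_op(collection, d)
print(collection)  # {'abc': {'z': 4}, 'ghi': {'y': 7}}
```

The final state matches the comments in the sample sequence. Only _id=abc and _id=ghi remain, each holding exactly the fields from its last REPLACE or REPSERT.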