Monitoring And Alerting

Metrics Endpoint

Beyond the Console Metrics page, additional metrics are available through the metrics endpoint in Prometheus/OpenMetrics format. This format is compatible with monitoring and alerting tools such as Prometheus, Datadog, and AWS CloudWatch, among many others.

$ curl https://$ROCKSET_SERVER/v1/orgs/self/metrics -u {API key}:

# HELP rockset_collections Number of collections.
# TYPE rockset_collections gauge
rockset_collections{virtual_instance_id="30",workspace_name="commons",} 20.0
rockset_collections{virtual_instance_id="30",workspace_name="myWorkspace",} 2.0
rockset_collections{virtual_instance_id="30",workspace_name="myOtherWorkspace",} 1.0
# HELP rockset_collection_size_bytes Collection size in bytes.
# TYPE rockset_collection_size_bytes gauge
rockset_collection_size_bytes{virtual_instance_id="30",workspace_name="commons",collection_name="_events",} 3.74311622E8
...

You can enable the metrics endpoint for your Virtual Instance from the Metrics tab in the Rockset Console.
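
As a rough illustration of consuming this output without a monitoring client, the sketch below fetches the endpoint with HTTP basic auth (API key as the username, empty password, mirroring the curl call above) and parses sample lines into tuples. The function names `fetch_metrics` and `parse_metrics` are hypothetical, and the parser is deliberately minimal: it assumes one sample per line and does not unescape label values.

```python
import base64
import re
import urllib.request

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(server, api_key):
    """Return the raw metrics text (hypothetical helper; mirrors the curl call)."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    request = urllib.request.Request(
        f"https://{server}/v1/orgs/self/metrics",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line
        name, raw_labels, value = match.groups()
        # findall tolerates the trailing comma seen in the sample output
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

For production use, a monitoring client or an established Prometheus parsing library is likely a better choice than a hand-rolled parser.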

You can read more about the three metric types currently used (Gauge, Counter, and Histogram) here:

🚧
Some metric types (e.g. Histogram) are represented through a set of sub-items.
For example, the rockset_query_latency_seconds metric (a Histogram) is represented by several rockset_query_latency_seconds_bucket records along with a rockset_query_latency_seconds_sum record.
Most monitoring clients will handle these complex types automatically on your behalf.
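
To make the bucket representation concrete, here is a minimal sketch of how a percentile could be estimated from cumulative bucket counts using linear interpolation within a bucket, similar in spirit to what Prometheus does for histogram quantiles. The function name `histogram_quantile` and the bucket bounds in the usage note are illustrative assumptions, not values taken from the Rockset endpoint.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs, as carried by
    the `le` label of the *_bucket records; the last bound is expected to be +Inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        raise ValueError("at least one bucket is required")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return float("nan")  # no observations recorded
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # cannot interpolate into the open-ended bucket
            if count == prev_count:
                return bound  # empty first bucket with q == 0
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With hypothetical buckets `[(0.1, 10), (0.5, 60), (1.0, 90), (float("inf"), 100)]`, a call such as `histogram_quantile(0.5, buckets)` interpolates inside the 0.1 to 0.5 second bucket. The estimate can only be as precise as the bucket boundaries allow.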

The following metrics are provided and updated at one-minute intervals:

Virtual Instance Metrics

Metric	Type	Description
`rockset_leaf_cpu_utilization_percentage`	Gauge	Average CPU utilization across the leaves in a Virtual Instance. Leaf nodes store and ingest data. Leaf CPU utilization reflects both data ingestion and query processing.
`rockset_leaf_memory_utilization_percentage`	Gauge	Average memory utilization across the leaves in a Virtual Instance. Leaf nodes store and ingest data. Leaf memory utilization reflects both data ingestion and query processing.
`rockset_leaf_block_cache_utilization_percentage`	Gauge	Percentage of total memory on the Virtual Instance that the block cache is using. The block cache is where Rockset caches data for reads.
`rockset_leaf_block_cache_allocation_percentage`	Gauge	The block cache can use up to this percentage of total memory of the Virtual Instance. The block cache is where Rockset caches data for reads.
`rockset_leaf_block_cache_hit_percentage`	Gauge	The hit rate measures how often the queried data is found in the block cache. It is computed as block cache hits / (block cache hits + block cache misses).
`rockset_leaf_memtable_utilization_percentage`	Gauge	Percentage of total memory on the Virtual Instance that the memtable is using. The memtable is an in-memory data structure that stores recently updated data before flushing it to the on-disk storage (SST). We call this the ingest buffer or tailing buffer in the console.
`rockset_leaf_memtable_allocation_percentage`	Gauge	The memtable can use up to this percentage of total memory of the Virtual Instance. The memtable is an in-memory data structure that stores recently updated data before flushing it to the on-disk storage (SST). We call this the ingest buffer or tailing buffer in the console.
`rockset_leaf_tailing_stopped_timestamp_seconds`	Gauge	This value will show the timestamp of when tailing stopped on your Virtual Instance. If tailing is active, this value is 0. Tailing stops when you exceed the memory limit of your memtable. Periodically the VI will restart to try to recover to a stable state so you may see tailing resume temporarily. However, if the Virtual Instance continues to have insufficient memory, tailing will stop again.

Virtual Instance metrics are useful for monitoring compute usage and alerting when your VI is near the limits of its performance. Query performance and ingest latency may both degrade as these metrics near 100%.
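
A simple check along these lines could look like the sketch below. It flags CPU and memory utilization at or above a threshold, compares memtable utilization against its allocation, and reports a non-zero tailing-stopped timestamp. The function name `check_virtual_instance` and the 90% threshold are assumptions chosen for illustration, not Rockset recommendations.

```python
def check_virtual_instance(values, threshold=90.0):
    """Return warning strings for one Virtual Instance.

    `values` maps metric name -> latest gauge value. The default threshold
    is an illustrative assumption.
    """
    warnings = []
    for name in (
        "rockset_leaf_cpu_utilization_percentage",
        "rockset_leaf_memory_utilization_percentage",
    ):
        value = values.get(name)
        if value is not None and value >= threshold:
            warnings.append(f"{name} is at {value:.1f}%")

    # Memtable utilization is bounded by its allocation, so compare the two.
    used = values.get("rockset_leaf_memtable_utilization_percentage")
    limit = values.get("rockset_leaf_memtable_allocation_percentage")
    if used is not None and limit and 100.0 * used / limit >= threshold:
        warnings.append(
            f"memtable uses {used:.1f}% of memory; its limit is {limit:.1f}%"
        )

    # A value of 0 means tailing is active.
    stopped = values.get("rockset_leaf_tailing_stopped_timestamp_seconds", 0)
    if stopped:
        warnings.append(f"tailing stopped (timestamp {stopped:.0f})")
    return warnings
```

In practice the same conditions would more commonly be expressed as alert rules in your monitoring tool; the function only shows which metrics one might combine.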

Collection Metrics

Metric	Type	Description
`rockset_collections`	Gauge	Number of collections.
`rockset_collection_size_bytes`	Gauge	Collection size in bytes. Note that this size reflects the current storage size and will decrease as documents expire via specified retention duration or are deleted.
`rockset_collection_documents`	Gauge	Number of documents currently in each collection.
`rockset_collection_total_ingest_bytes`	Counter	Number of bytes ingested over the history of each collection. Note that this count only ever increases and is therefore well suited for `increase` and `rate` functions to compute ingest over time.
`rockset_collection_parse_errors`	Counter	Number of parse errors for each collection.
`rockset_collection_data_discovery_latency`	Histogram	The duration (in seconds) from when new or updated data appears in a data source until Rockset first detects it. Elevated values for this metric often reflect configuration issues in the underlying data source (e.g. an inadequate number of RCUs provisioned for DynamoDB sources).
`rockset_collection_data_process_latency`	Histogram	The duration (in seconds) from when new or updated data is first detected by Rockset until the data is fully processed and query-able. Elevated values for this metric can be alleviated by allocating additional compute to your Virtual Instance.
`rockset_collection_memtable_utilization_percentage`	Gauge	Percentage of total memory on the Virtual Instance that the memtable is using to tail this collection. The memtable is an in-memory data structure that stores recently updated data before flushing it to the on-disk storage (SST). We call this the ingest buffer or tailing buffer in the console.
`rockset_data_discovery_latency`	Histogram	Data discovery latency across all collections. Unlike the collection-specific metric, this metric continues to include data from deleted collections.
`rockset_data_process_latency`	Histogram	Data process latency across all collections. Unlike the collection-specific metric, this metric continues to include data from deleted collections.
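
For counters such as `rockset_collection_total_ingest_bytes`, the `increase` and `rate` functions mentioned above can be approximated by hand as in the following sketch. The function names are hypothetical. Treating a drop in the value as a counter reset follows the usual Prometheus convention and is an assumption here, not documented Rockset behavior.

```python
def counter_increase(values):
    """Total increase across chronologically ordered counter readings.

    A drop between consecutive readings is treated as a counter reset, in
    which case the new reading itself counts as the increase since the reset.
    """
    increase = 0.0
    for previous, current in zip(values, values[1:]):
        increase += current - previous if current >= previous else current
    return increase


def ingest_rate_bytes_per_second(readings):
    """Average rate from (timestamp_seconds, counter_value) readings."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    elapsed = readings[-1][0] - readings[0][0]
    if elapsed <= 0:
        raise ValueError("readings must span a positive time interval")
    return counter_increase([value for _, value in readings]) / elapsed
```

Because the endpoint updates at one-minute intervals, readings taken more often than once a minute are unlikely to add information.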

Query Metrics

Metric	Type	Description
`rockset_queries`	Counter	Cumulative count of queries run on this Virtual Instance.
`rockset_query_latency_seconds`	Histogram	Query latency, including admission control duration. Because this metric is exposed as a histogram, you can compute any percentile (PXX) you like, with an accuracy of roughly ±15% in almost all cases.
`rockset_query_admission_latency_seconds`	Histogram	Admission control queue duration per query if admission control is enabled for your account.
`rockset_query_queue_size`	Gauge	Number of queries currently queued (throttled by admission control).
`rockset_query_errors`	Counter	Number of query execution errors, labeled by HTTP error code (e.g. `404`, `500`).
`rockset_query_lambda_queries`	Counter	Number of queries by Query Lambda. Note that the `tag` label is tracked if and only if the execution is specified by tag.
`rockset_query_lambda_latency_seconds`	Histogram	Query latency by Query Lambda. Note that the `tag` label is tracked if and only if the execution is specified by tag.
`rockset_query_lambda_admission_latency_seconds`	Histogram	Query admission latency by Query Lambda. Note that the `tag` label is tracked if and only if the execution is specified by tag.
`rockset_query_lambda_errors`	Counter	Number of query execution errors by Query Lambda. Note that the `tag` label is tracked if and only if the execution is specified by tag.
`rockset_running_queries`	Gauge	Number of queries that are currently running on the Virtual Instance.

Reference Configurations & Templates

📘
You can find reference configurations and templates for Prometheus, Datadog, Grafana and Alertmanager here.

Below is an example entry for the Prometheus scrape_configs section:

  - job_name: Rockset Metrics API
    scrape_interval: 1m
    scrape_timeout: 1m
    honor_timestamps: true
    static_configs:
      - targets:
        - api.usw2a1.rockset.com
    scheme: https
    basic_auth:
      username: <API Key>
      password:
    metrics_path: /v1/orgs/self/metrics
