Async Queries

Queries on Rockset typically must complete within 2 minutes. Use async mode to run longer queries of up to 30 minutes.

The workflow for Async Queries:

  • Send a query request with async enabled.
  • Receive back a query ID.
  • Poll to check the status of the query, which can have the values of QUEUED, RUNNING, COMPLETED, ERROR, or CANCELLED.
  • Fetch results when the query completes.

🚧

Rockset does not recommend using async queries for latency-sensitive workloads due to the overhead of additional network requests.

Sending an Async Query Request

To send the request asynchronously, set the async parameter to true in either a Query Lambda or query request.

Example request:

curl --request POST \
    --url "https://$ROCKSET_SERVER/v1/orgs/self/queries" \
    -H "Authorization: ApiKey $API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
      "sql": {
        "query": "SELECT * FROM foo;"
      },
      "async": true
    }'

The request returns immediately with a query ID that you can use to poll for status and retrieve results.

Example response:

{
  "query_id": "db3044b9-ea4e-43f2-8cd5-138092ab9b96:lii9i5B:0",
  "status": "QUEUED",
  ...
}
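The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names build_async_query_request and submit_async_query are hypothetical, and reading query_id from the top level of the response is an assumption based on the example response above.

```python
import json
import urllib.request


def build_async_query_request(server: str, api_key: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/orgs/self/queries request with async enabled."""
    body = {"sql": {"query": sql}, "async": True}
    return urllib.request.Request(
        url=f"https://{server}/v1/orgs/self/queries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_async_query(server: str, api_key: str, sql: str) -> str:
    """Send the request and return the query ID.

    Assumes query_id sits at the top level of the response, as in the
    example response shown in this section.
    """
    request = build_async_query_request(server, api_key, sql)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["query_id"]
```

Keeping request construction separate from sending makes it possible to inspect the payload without touching the network.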

Polling for Query Status

After submitting an async query request, periodically retrieve the query status to find out if the query has completed.

Example request:

curl --request GET \
    --url "https://$ROCKSET_SERVER/v1/orgs/self/queries/$QUERY_ID" \
    -H "Authorization: ApiKey $API_KEY"

The response contains a status field, which has the possible values of QUEUED, RUNNING, COMPLETED, ERROR, and CANCELLED.

If the status of the query is ERROR, the error will be available in the query_errors field.

Example response:

{
    "data": {
        "query_id": "5139dcbc-5abc-4c8c-ad69-6feff96f17ef:BBQg6lF:0",
        "status": "ERROR",
        ...
        "query_errors": [
            {
                "type": "QUERY_TIMEOUT",
                "message": "Query timeout reached. The resources allocated for your Virtual Instance are not sufficient to run this query. Please upgrade to a larger Virtual Instance or contact Rockset customer support for assistance constructing a more efficient query.",
                "status_code": 408
            }
        ]
    }
}

When the status of the query is COMPLETED, you may retrieve results.
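A polling loop over these statuses might look like the sketch below. The names wait_for_query and QueryFailed are hypothetical, as are the 2-second poll interval and the 30-minute cap (the latter chosen to match the async query limit). The get_status argument stands in for whatever function performs the GET request and returns the parsed JSON; the "data" wrapper is taken from the example response above.

```python
import time
from typing import Callable


class QueryFailed(Exception):
    """Raised when the query ends with a status of ERROR or CANCELLED."""


def wait_for_query(
    get_status: Callable[[], dict],
    poll_interval_s: float = 2.0,    # assumed interval, tune for your workload
    max_wait_s: float = 1800.0,      # assumed cap, matching the 30-minute async limit
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the query reaches a terminal status and return its data on success."""
    waited = 0.0
    while True:
        data = get_status()["data"]
        status = data["status"]
        if status == "COMPLETED":
            return data
        if status in ("ERROR", "CANCELLED"):
            # For ERROR, the details are in the query_errors field.
            raise QueryFailed(f"{status}: {data.get('query_errors', [])}")
        if waited >= max_wait_s:
            raise TimeoutError(f"query still {status} after {waited:.0f}s")
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Passing sleep as a parameter is a design choice that lets the loop be exercised in tests without real delays.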

Retrieving Query Results

Example request for 10,000 documents, starting at offset 290,000:

curl --request GET \
    --url "https://$ROCKSET_SERVER/v1/orgs/self/queries/$QUERY_ID/pages?offset=290000&docs=10000" \
    -H "Authorization: ApiKey $API_KEY"

The docs parameter is optional and defaults to 10,000 documents when omitted. The maximum value for docs is 100,000.

The offset query parameter specifies the offset from the cursor of the first document to be returned. It defaults to 0 when omitted, and its maximum value is 1,000,000,000.
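The limits on docs and offset can be checked on the client before a request is sent. The helper below is a hypothetical sketch (pages_url is not part of any Rockset SDK). The upper bounds come from the text above; the lower bounds (docs of at least 1, offset of at least 0) are assumptions.

```python
from typing import Optional
from urllib.parse import quote, urlencode

MAX_DOCS = 100_000          # documented maximum for docs
MAX_OFFSET = 1_000_000_000  # documented maximum for offset


def pages_url(
    server: str,
    query_id: str,
    docs: Optional[int] = None,
    offset: Optional[int] = None,
) -> str:
    """Build the results-page URL, omitting parameters left at their defaults."""
    # Lower bounds are assumed; only the maximums are documented.
    if docs is not None and not 1 <= docs <= MAX_DOCS:
        raise ValueError(f"docs must be between 1 and {MAX_DOCS}")
    if offset is not None and not 0 <= offset <= MAX_OFFSET:
        raise ValueError(f"offset must be between 0 and {MAX_OFFSET}")
    params = {}
    if offset is not None:
        params["offset"] = offset
    if docs is not None:
        params["docs"] = docs
    # Query IDs contain colons, which are kept unescaped in the path.
    base = f"https://{server}/v1/orgs/self/queries/{quote(query_id, safe=':')}/pages"
    query = urlencode(params)
    return f"{base}?{query}" if query else base
```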

Example response:

{
  "results": [
    {
      "Field1": "value1"
    },
    ...
  ],
  "results_total_doc_count": 10000000,
  "pagination": {
    "current_page_doc_count": 500,
    "next_cursor_offset": 1500,      // This number is the number of documents before the current page.
    "next_cursor": "fds23jurzjsa31"  // This value will be null if there are no more results.
  }
}

If there is more than one page of results, use the next_cursor field returned in the response to request the next page of results. Alternatively, you can use the offset parameter to go back and forth between results pages.

Example request:

curl --request GET \
    --url "https://$ROCKSET_SERVER/v1/orgs/self/queries/$QUERY_ID/pages?cursor=fds23jurzjsa31&docs=10000" \
    -H "Authorization: ApiKey $API_KEY"
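Following next_cursor until it is null can be wrapped in a generator. In this sketch, iter_documents is a hypothetical name, and fetch_page is a caller-supplied function that takes a cursor (None for the first page), issues the GET request shown above, and returns the parsed JSON response.

```python
from typing import Callable, Iterator, Optional


def iter_documents(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every result document, requesting pages until next_cursor is null."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["results"]
        # next_cursor is null (None once parsed) when there are no more results.
        cursor = page["pagination"].get("next_cursor")
        if cursor is None:
            break
```

Because the generator is lazy, a page is only requested once the documents from the previous page have been consumed.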

Advanced Usage

Setting a Client Timeout

To avoid the additional network requests for short queries, you can optionally set async_options.client_timeout_ms. If the query completes before the client timeout, the results will be returned in-band without needing to poll and retrieve results later. For example, if you want queries under 60 seconds to return results in-band, you can make this request:

curl --request POST \
    --url "https://$ROCKSET_SERVER/v1/orgs/self/queries" \
    -H "Authorization: ApiKey $API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
      "sql": {
        "query": "SELECT * FROM foo;"
      },
      "async": true,
      "async_options": {
        "client_timeout_ms": 60000
      }
    }'

Response for a query that completes before the client timeout (same as a non-async query):

{
  "stats": {
        "elapsed_time_s": 1218,
        "throttled_time_micros": 0
  },
  "results": [
    {
      "Field1": "value1"
    },
    ...
  ],
  "status": "COMPLETED"
}

Since the additional overhead to store query results is avoided when a query completes before the client timeout, the query status and query results will not be available in the GET .../queries/{queryId} and GET .../queries/{queryId}/pages APIs described above. Instead, the status and results will be returned in-band, as they are for non-async queries.

Make sure to handle both the short and long-running query cases when setting the client timeout since you will not know when the query will complete. A query that completes before the client timeout will have a status of COMPLETED in the initial response, while a query that continues to run after the client timeout will have a status of QUEUED or RUNNING.
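One way to handle both cases is to branch on the status of the initial response. The function name route_initial_response and the ("results" | "poll") return convention are hypothetical; the sketch also assumes that status, results, and query_id appear at the top level of the initial response, as in the examples in this document.

```python
from typing import Any, Tuple


def route_initial_response(response: dict) -> Tuple[str, Any]:
    """Decide what to do with the first response when a client timeout is set.

    Returns ("results", documents) when the query finished in-band, or
    ("poll", query_id) when it is still queued or running.
    """
    status = response.get("status")
    if status == "COMPLETED":
        # Finished before the client timeout: results are in this response.
        return ("results", response["results"])
    if status in ("QUEUED", "RUNNING"):
        # Still in progress: fall back to polling with the query ID.
        return ("poll", response["query_id"])
    # Any other status is surfaced to the caller rather than guessed at.
    raise RuntimeError(f"unexpected status in initial response: {status!r}")
```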

🚧

Client Timeout Note

Queries with a client timeout set will return a default of 10,000 results in the initial response, with the remaining results to be retrieved through the pagination API.