Troubleshooting Ingest

This section walks through some high-level errors you may encounter when ingesting data into Rockset and explains how to diagnose and fix them.

When you’re ingesting data from a source, you may encounter the following errors:

Parse Errors

Parse Errors occur when data is malformed. When you send a document to Rockset, you sometimes have the option to specify its type:

  • JSON
  • Parquet
  • CSV/TSV
  • XML

You might see a Parse Error if there’s a discrepancy between your file format and the expected file format (e.g. if you specify a CSV file and send a JSON document instead). This error is also returned for an improperly formatted CSV or TSV file.

To fix Parse Errors:

  • Verify that the format specified in your code or in the console matches the document you are ingesting.
  • If you are ingesting a CSV or TSV file, make sure the separator, encoding, header, quote character, and escape character are all correct.
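Before uploading a file, a quick local check can catch the most common causes of Parse Errors. The sketch below is a minimal, hypothetical helper (preflight_check is not part of any Rockset SDK) built on Python's standard json and csv modules. It assumes a JSON payload is a single JSON value (newline-delimited JSON would need to be parsed line by line) and that the first CSV row is a header. The "looks like JSON" test is only a heuristic.

```python
import csv
import io
import json


def preflight_check(payload: str, expected_format: str, delimiter: str = ",") -> list:
    """Return a list of problems found in payload; an empty list means none were found.

    expected_format is "json" or "csv". These names are illustrative and are
    not Rockset parameter values.
    """
    if expected_format not in ("json", "csv"):
        raise ValueError(f"unsupported format: {expected_format!r}")
    problems = []

    if expected_format == "json":
        try:
            json.loads(payload)  # assumes one JSON value, not newline-delimited JSON
        except json.JSONDecodeError as exc:
            problems.append(f"not valid JSON: {exc.msg} (line {exc.lineno})")
        return problems

    # A JSON object or array sent where CSV was declared is a common mix-up.
    if payload.lstrip().startswith(("{", "[")):
        problems.append("payload looks like JSON, but CSV was expected")
        return problems

    # Skip blank lines; csv.reader yields them as empty lists.
    rows = [row for row in csv.reader(io.StringIO(payload), delimiter=delimiter) if row]
    if not rows:
        problems.append("payload is empty")
        return problems

    width = len(rows[0])  # assumes the first row is the header
    if width == 1:
        problems.append(f"header has one column; is the separator really {delimiter!r}?")
    # Numbering counts non-blank CSV records, with the header as record 1.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"record {number} has {len(row)} field(s), header has {width}")
    return problems


if __name__ == "__main__":
    print(preflight_check("id,name\n1,alice\n2\n", "csv"))
```

A single-column header is flagged because it usually means the file uses a different separator than the one you declared, for example semicolons instead of commas.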

Field Malformations During Ingest Transformation

An incorrectly written ingest transformation results in an INGEST_TRANSFORMATION_ERROR, which is logged in the Ingest Logs. Parsing and validation errors (PARSE_ERROR) are also listed in the Ingest Logs and can appear as an error message under _meta.bad in the ingested document. Common reasons for an ingest transformation error include:

  • Mapping a non-string field to _id.
  • Mapping a non-timestamp field to _event_time.

To fix field malformations for an ingest transformation error:

  • Fix the rollup or ingest transformation query so that only string fields are mapped to _id and only timestamp fields are mapped to _event_time.
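To see which source fields are safe to map before you edit the query, you can inspect a few sample documents locally. The sketch below is a hypothetical helper, not Rockset's validation logic. It assumes the samples are JSON documents decoded into Python dicts and that your source stores timestamps as ISO 8601 strings; if it stores epoch numbers or another format, adjust the parsing step.

```python
from datetime import datetime


def check_special_field_mapping(docs, id_field, event_time_field):
    """Flag sample documents whose fields would be unsafe to map to _id or _event_time.

    Hypothetical helper. Assumes docs are dicts decoded from JSON and that
    timestamps are ISO 8601 strings (Python 3.11+ also accepts a trailing "Z").
    """
    problems = []
    for index, doc in enumerate(docs):
        id_value = doc.get(id_field)
        # _id must be a string, so numbers, None, and nested objects are flagged.
        if not isinstance(id_value, str):
            problems.append(f"doc {index}: {id_field}={id_value!r} is not a string (_id)")

        ts_value = doc.get(event_time_field)
        try:
            # Raises TypeError for non-strings and ValueError for unparseable strings.
            datetime.fromisoformat(ts_value)
        except (TypeError, ValueError):
            problems.append(
                f"doc {index}: {event_time_field}={ts_value!r} "
                "is not an ISO 8601 timestamp (_event_time)"
            )
    return problems


if __name__ == "__main__":
    samples = [
        {"user_id": "u-1", "ts": "2023-05-01T12:00:00+00:00"},
        {"user_id": 42, "ts": "yesterday"},
    ]
    for problem in check_special_field_mapping(samples, "user_id", "ts"):
        print(problem)
```

A field flagged for _id usually needs to be cast to a string in the transformation query, and a field flagged for _event_time needs to be converted to a timestamp there.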

If you’re unsure of how to update the query, please reach out to the Rockset Community for help.

Document Size Is Too Large

You can ingest records or documents with sizes up to 40 MiB. Field names are capped at 10 KiB.

If you encounter an error indicating "Size of document exceeds the maximum allowed size of 41943040 bytes", you can drop fields at ingestion time via an Ingest Transformation.
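To choose which fields to drop, it helps to know which top-level fields are largest. The sketch below approximates a document's size as its compact UTF-8 JSON encoding; this is an assumption, and Rockset's internal size accounting may differ. The names encoded_size, largest_fields, and report are hypothetical.

```python
import json

# 40 MiB, the limit quoted in the error message (41943040 bytes).
MAX_DOCUMENT_BYTES = 40 * 1024 * 1024


def encoded_size(value) -> int:
    """Approximate size as compact UTF-8 JSON; Rockset's own accounting may differ."""
    return len(json.dumps(value, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))


def largest_fields(doc: dict, top: int = 3) -> list:
    """Return up to `top` top-level fields as (name, approximate bytes), largest first."""
    sizes = [(name, encoded_size(value)) for name, value in doc.items()]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


def report(doc: dict) -> None:
    """Print the approximate document size and its largest top-level fields."""
    size = encoded_size(doc)
    status = "over" if size > MAX_DOCUMENT_BYTES else "within"
    print(f"~{size} bytes, {status} the {MAX_DOCUMENT_BYTES}-byte limit")
    for name, field_size in largest_fields(doc):
        print(f"  {name}: ~{field_size} bytes")


if __name__ == "__main__":
    report({"id": "a1", "payload": "x" * 1000, "tags": ["red", "blue"]})
```

The fields at the top of the report are the natural candidates to drop in the ingest transformation.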

If you encounter an error when using MongoDB as a source indicating "BSONObj size: (value) is invalid. Size must be between 0 and 16793600(16MB)", the document exceeds the 16 MiB BSON document size limit that MongoDB enforces. You can find more information on the Ingestion Errors page.

Note that the Write API has separate request limits, which are enforced before parsing.

Authentication Errors

Authentication errors occur when a user is not authorized to access a resource. Authentication errors look similar to the following:

AZ_BLOB_STORAGE bucket "jsrocksetblob" does not have list objects permissions set.
Authentication failed for AWS cross-account role integration with Role ARN

To fix authentication errors:

  • Check the documentation page for your data source and ensure that the required permissions have been set.
  • Ensure that you selected the correct source.
  • Once the permissions have been updated, recreate your collection with the relevant integration.

If you have questions, please reach out to the Rockset Community.

Source Errors

Source Errors occur when Rockset has trouble accessing the data source. Some common source-related errors include:

Deleted Table

The following error can occur when the source table is deleted:

DynamoDB resource not found for table `table_name` in region us-east-1

Deleted Kafka Topic

The following error can occur when someone deletes a Kafka topic:

The Kafka topic is not known to the broker: "nucleus.custom_activities.backup.partner.609040244446599c8b47cb39.membership_counts_o3k"

Expired MongoDB Resume Token

The following error can occur when a MongoDB resume token expires:

Resume token has expired

MongoDB stream events are emitted to the oplog on the server. There is one oplog per MongoDB cluster. Each oplog has a size limit, so the oldest entries are removed when the limit is exceeded. If Rockset falls behind while tailing the oplog (for example, because its ingest rate is capped based on the Virtual Instance size), the entry its resume token points to can be removed. Rockset then loses its position in the oplog and cannot continue tailing the stream, so the Rockset collection stops receiving updates.
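The failure mode described above can be illustrated with a toy model. In the sketch below, CappedLog, ResumeTokenExpired, and simulate are hypothetical names; a fixed-size deque stands in for the capped oplog and sequence numbers stand in for resume tokens. It illustrates the idea only and is not how MongoDB implements the oplog or change streams.

```python
from collections import deque


class ResumeTokenExpired(Exception):
    """Raised when the reader's position is no longer in the log."""


class CappedLog:
    """Toy model of a capped oplog that keeps only the newest `capacity` events.

    Sequence numbers stand in for resume tokens. This illustrates the idea
    only; it is not how MongoDB implements the oplog or change streams.
    """

    def __init__(self, capacity: int) -> None:
        self.events = deque(maxlen=capacity)  # the oldest entry drops off when full
        self.next_seq = 0

    def append(self, payload) -> None:
        self.events.append((self.next_seq, payload))
        self.next_seq += 1

    def read_after(self, token: int, limit: int):
        """Return up to `limit` events after `token`, plus the updated token."""
        oldest = self.events[0][0] if self.events else self.next_seq
        # The reader can resume only if the event right after `token` is still held.
        if token + 1 < oldest:
            raise ResumeTokenExpired(f"token {token} is older than oldest entry {oldest}")
        batch = [event for event in self.events if event[0] > token][:limit]
        new_token = batch[-1][0] if batch else token
        return batch, new_token


def simulate(capacity: int, writes_per_step: int, reads_per_step: int, steps: int):
    """Return the step at which the token expires, or None if the reader keeps up."""
    log = CappedLog(capacity)
    token = -1  # -1 means "before the first event"
    for step in range(steps):
        for _ in range(writes_per_step):
            log.append({"step": step})
        try:
            _, token = log.read_after(token, reads_per_step)
        except ResumeTokenExpired:
            return step
    return None


if __name__ == "__main__":
    print(simulate(capacity=5, writes_per_step=3, reads_per_step=2, steps=10))
    print(simulate(capacity=5, writes_per_step=2, reads_per_step=2, steps=10))
```

With three writes and two reads per step, the reader falls one event further behind each step, and its position is lost at step 3, once the unread backlog is larger than the five-event log can hold. When reads keep pace with writes, simulate returns None.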

Fix Source Errors

To fix Source Errors:

  • Ensure that the source tables, collections, Kafka topics, and streams exist.
  • If you see a "Resume token has expired" error, try increasing the Virtual Instance size to raise the ingest rate so that Rockset can keep up with the oplog.
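To judge whether a larger Virtual Instance is likely to help, a rough estimate of how long the resume token survives can be useful. The sketch below assumes steady write and ingest rates (events per second), an oplog that holds a fixed number of events, and that every oplog entry belongs to the tailed collection; real workloads are burstier. Under those assumptions the backlog grows at (write_rate - ingest_rate) events per second, so the time until expiry is (capacity - current_lag) / (write_rate - ingest_rate). The function name is hypothetical.

```python
import math


def seconds_until_token_expires(oplog_capacity_events: float,
                                write_rate: float,
                                ingest_rate: float,
                                current_lag_events: float = 0.0) -> float:
    """Estimate how long until the reader's position falls off a capped oplog.

    Assumes steady rates (events per second), an oplog holding a fixed number
    of events, and that every oplog entry belongs to the tailed collection.
    """
    if current_lag_events >= oplog_capacity_events:
        return 0.0  # the position is already gone
    backlog_growth = write_rate - ingest_rate  # events per second the reader falls behind
    if backlog_growth <= 0:
        return math.inf  # the reader keeps up, so the token should not expire
    return (oplog_capacity_events - current_lag_events) / backlog_growth


if __name__ == "__main__":
    print(seconds_until_token_expires(1_000_000, write_rate=500, ingest_rate=400))
```

For example, with a hypothetical oplog that holds 1,000,000 events, writes arriving at 500 events per second, and ingestion at 400 events per second, the estimate is 1,000,000 / 100 = 10,000 seconds, a little under 3 hours. Raising the ingest rate to at least the write rate removes the backlog growth entirely, which is why a larger Virtual Instance can help.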