Handling Slow Queries in MongoDB - Part 1: Investigation

August 7, 2020

Justin Liu

Product Manager, Rockset

One of the most essential factors of performance in any application is latency. Faster application response times have been proven to increase user interaction and engagement as systems appear more natural and fluid with lower latencies. As data size, query complexity, and application load increase, continuing to deliver the low data and query latencies required by your application can become a serious pain point.

In this blog, we’ll explore a few key ways to understand and address slow queries in MongoDB. We’ll also take a look at some strategies on how to mitigate issues like these from arising in the future.

Identifying Slow Queries using the Database Profiler

The MongoDB Database Profiler is a built-in profiler which collects detailed information (including all CRUD operations and configuration changes) about what operations the database took while executing each your queries and why it chose them. It then stores all of this information inside a capped system collection in the admin database which you can query at anytime.

Configuring the Database Profiler

By default, the profiler is turned off, which means you need to start by turning it on. To check your profiler’s status, you can run the following command:

db.getProfilingStatus()

This will return one of three possible statuses:

Level 0 - The profiler is off and does not collect any data. This is the default profiler level.
Level 1 - The profiler collects data for operations that take longer than the value of slowms.
Level 2 - The profiler collects data for all operations.

You can then use this command to set the profiler to your desired level (in this example, it is set to Level 2):

db.setProfilingLevel(2)

Keep in mind that the profiler does have a (potentially significant) impact on the performance of your database since it has a lot more work to do now with each operation, especially if set to Level 2. Furthermore, the system collection storing your profiler’s findings is capped, meaning that once the size capacity is reached, documents will begin to be deleted steadily beginning with the oldest timestamps. You may want to carefully understand and evaluate the possible implications on your performance before turning this feature on in production.

Analyzing Performance Using the Database Profiler

Now that the profiler is actively collecting data on our database operations, let’s explore a few useful commands we can run on our profiler’s system collection storing all this data to see if we can find which queries are causing high latencies.

I usually like to start by simply finding my top queries taking the longest execution time by running the following command:

db.system.profile
    .find({ op: { $eq: "command" }})
    .sort({ millis: -1 })
    .limit(10)
    .pretty();

We can also use the following command to list all the operations taking longer than a certain amount of time (in this case, 30ms) to execute:

db.system.profile
    .find({ millis: { $gt: 30 }})
    .pretty();

We can also go a level deeper by finding all the queries which are doing operations commonly known to be slow, such as large scans on a significant portion of our data.

This command will return the list of queries performing a full index range scan or full index scan:

db.system.profile
    .find({ "nreturned": { $gt: 1 }})
    .pretty();

This command will return the list of queries performing scans on greater than a specified number (in this case, 100,000 documents) of documents:

db.system.profile
    .find({ "nscanned" : { $gt: 100000 }})
    .pretty();

This command will return the list of queries performing a full collection scan:

db.system.profile
    .find({ "planSummary": { $eq: "COLLSCAN" }, "op": { $eq: "query" }})
    .sort({ millis: -1 })
    .pretty();

If you’re doing real-time analysis on your query performance, the currentOp database method is extremely helpful for diagnosis. To find a list of all operations currently in execution, you can run the following command:

db.currentOp(true)

To see the list of operations that have been running longer than a specified amount of time (in this case, 3 seconds), you can run the following command:

db.currentOp({ "active" : true, "secs_running" : { "$gt" : 3 }})

Breaking Down & Understanding Slow Queries

Now that we’ve narrowed down our list of queries to all the potentially problematic ones, let’s individually investigate each query to understand what’s going on and see if there are any potential areas for improvement. Today, the vast majority of modern databases have their own features for analyzing query execution plans and performance statistics. In the case of MongoDB, this is offered through a suite of EXPLAIN helpers to understand what operations the database is taking to execute each query.

Using MongoDB’s EXPLAIN Methods

MongoDB offers its suite of EXPLAIN helpers through three methods:

The db.collection.explain() Method
The cursor.explain() Method
The explain Command

Each EXPLAIN method takes in verbosity mode which specifies what information will be returned. There are three possible verbosity modes for each command:

“queryPlanner” Verbosity Mode - MongoDB will run its query optimizer to choose the winning plan and return the details on the execution plan without executing it.
“executionStats” Verbosity Mode - MongoDB will choose the winning plan, execute the winning plan, and return statistics describing the execution of the winning plan.
“allPlansExecution” Verbosity Mode - MongoDB will choose the winning plan, execute the winning plan, and return statistics describing the execution of the winning plan. In addition, MongoDB will also return statistics on all other candidate plans evaluated during plan selection.

Depending on which EXPLAIN method you use, one of the three verbosity modes will be used by default (though you can always specify your own). For instance, using the “executionStats” verbosity mode with the db.collection.explain() method on an aggregation query might look like this:

db.collection
    .explain("executionStats")
    .aggregate([
        { $match: { col1: "col1_val" }},
        { $group: { _id: "$id", total: { $sum: "$amount" } } },
        { $sort: { total: -1 } }
    ])

This method would execute the query and then return the chosen query execution plan of the aggregation pipeline.

Executing any EXPLAIN method will return a result with the following sections:

The Query Planner (queryPlanner) section details the plan selected by the query optimizer.
The Execution Statistics (executionStats) section details the execution of the winning plan. This will only be returned if the winning plan was actually executed (i.e. using the “executionStats” or “allPlansExecution” verbosity modes).
The Server Information (serverInfo) section provides general information on the MongoDB instance.

For our purposes, we’ll examine the Query Planner and Execution Statistics sections to learn about what operations our query took and if/how we can improve them.

Understanding and Evaluating Query Execution Plans

When executing a query on a database like MongoDB, we only specify what we want the results to look like, but we don’t always specify what operations MongoDB should take to execute this query. As a result, the database has to come up with some kind of plan for executing this query on its own. MongoDB uses its query optimizer to evaluate a number of candidate plans, and then takes what it believes is the best plan for this particular query. The winning query plan is typically what we’re looking to understand when trying to see if we can improve slow query performance. There are several important factors to consider when understanding and evaluating a query plan.

An easy place to start is to see what operations were taken during the query’s execution. We can do this by looking at the queryPlanner section of our EXPLAIN method from earlier. Results in this section are presented in a tree-like structure of operations, each containing one of several stages.

The following stage descriptions are explicitly documented by MongoDB:

COLLSCAN for a collection scan
IXSCAN for scanning index keys
FETCH for retrieving documents
SHARD_MERGE for merging results from shards
SHARDING_FILTER for filtering out orphan documents from shards

For instance, a winning query plan might look something like this:

"winningPlan" : {
    "stage" : "COUNT",
    ...
    "inputStage" : {
        "stage" : "COLLSCAN",
        ...
    }
}

In this example, our leaf nodes appear to have performed a collection scan on the data before being aggregated by our root node. This indicates that no suitable index was found for this operation, and so the database was forced to scan the entire collection.

Depending on your specific query, there may also be several other factors worth looking into:

queryPlanner.rejectedPlans details all the rejected candidate plans which were considered but not taken by the query optimizer
queryPlanner.indexFilterSet indicates whether or not an index filter set was used during execution
queryPlanner.optimizedPipeline indicates whether or not the entire aggregation pipeline operation was optimized away, and instead, fulfilled by a tree of query plan execution stages
executionStats.nReturned specifies the number of documents that matched the query condition
executionStats.executionTimeMillis specifies how much time the database took to both select and execute the winning plan
executionStats.totalKeysExamined specifies the number of index entries scanned
executionStats.totalDocsExamined specifies the total number of documents examined

Conclusion & Next Steps

By now, you’ve probably identified one or more queries that are your top bottlenecks in improving query performance, and also have a good idea of exactly what parts of the execution are slowing down your response times. Often times, the only way to tackle these is by helping “hint” the database into selecting a better query execution strategy or covering index by rewriting your queries (e.g. using derived tables instead of subqueries or replacing costly window functions). Or, you can always try to redesign your application logic to see if you can avoid those costly operations entirely.

In Handling Slow Queries in MongoDB, Part Two, we’ll go over several other targeted strategies that can improve your query performance under certain circumstances.