
Create APIs for Aggregations and Joins on MongoDB in Under 15 Minutes

Rockset has teamed up with MongoDB so you can build real-time apps with data across MongoDB and other sources. If you haven’t heard of Rockset or don’t know what Rockset does, you will by the end of this guide! We’ll create an API that determines air quality using ClimaCell data on weather and air pollutants.

Air quality has been documented to affect human health (see the resources at the bottom). In particular, levels of particulate matter (PM), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO) are measured in order to suggest a person’s recommended activity level outdoors. What affects PM, SO2, NO2, O3, and CO levels is the subject of extensive study: scientists examine temperature, humidity, traffic congestion, wind gusts, and so on to see how the air quality index (AQI) changes with respect to these pollutants.

It’s important to note that this is a sample app that shows how MongoDB can integrate with Rockset and demos Rockset’s superpowers for building APIs. This is not a scientific project meant to be exhaustive in its conclusions: much more data is needed, and many variables are not taken into consideration in the query. For more on air pollution, see the resources below. To leverage Rockset and MongoDB for your real-time applications (scientific or not), continue reading!

In subsequent tutorials, I can show how this dataset can potentially be used to understand how temperature and humidity affect the AQI of PM and O3. You’re also welcome to do the challenges at the end of the tutorial to earn Rockset credit, swag, and more!

In this guide, we’re going to:

  1. Integrate MongoDB Atlas and Rockset

    • Build collections on MongoDB that will each map to Rockset’s collections
  2. Build a Python script that will continuously get the weather and air quality data from ClimaCell (Powered by ClimaCell) and put it into MongoDB
  3. Use Rockset’s Query Editor to write a query on real-time data coming in from MongoDB
  4. Create an API to the query we wrote
  5. Execute the API in our app and print out a statement about the air quality

Setup

Let’s get started on MongoDB

  1. After you’ve created an account on MongoDB Atlas, go ahead and navigate to Database Access → Custom Roles and click on Add New Custom Role. The image below shows the actions and roles that should be added for rockset-role. This gives Rockset secure read-only access to MongoDB Atlas.

Create a Rockset role

  1. Navigate to the Database Users tab and click on Add New Database User. Remember this password, because we will need it again when we create an integration on Rockset.

Add new database user

  1. Go ahead and create another database user with Atlas admin privileges. We will use this database user in our Python app. You can name this user yourName-admin. Make sure you remember this password as well, because we will use it in our Python app.
  2. Navigate to Network Access, click on Add IP Address, and whitelist these IPs: Embedded content: https://gist.github.com/nfarah86/c6014ea1d60ec6113948d889afb16fdf

Whitelist Rockset's and your IP addresses

  1. Navigate to Clusters, click on Collections, then click on Create Database. We are going to create a weather_pollution_db, and it’s going to have a weather_data collection.

Create the database and weather_data collection on MongoDB
  2. Under the weather_pollution_db, there is going to be a plus sign. Click on the plus sign, and add another collection, air_pollution_data.

Create an air_pollution_data collection

  1. Go back to Clusters, click on Connect, and click on Connect your application. Copy the string, because we will use it in Rockset. In our Rockset account, the username is rockset-user and the password is the one you used when you created rockset-user. In our Python app, the username will be yourName-admin with the password associated with that database user.

Copy the connection string

  1. That’s it for MongoDB! Let’s go ahead and write our Python app!

Let’s build our Python app

  1. Create a project folder, and in that project, create a file .env.
  2. In the .env file add this:

    • MONGO_URI="<uri string>"
    • The "<uri string>" is your connection string from MongoDB. Make sure you replace the username and password in the connection string with yourName-admin and the password you used when you created that database user.
    • It should look something like this: mongodb://nadine-role-admin:password....
  3. If you use virtualenv, go ahead and activate an env for this project. Make sure you’re on Python 3.7 or higher.

    • I personally use Pyenv, but feel free to use whatever you want!
  4. Install python-dotenv:

    • $ pip install python-dotenv
  5. Install pymongo:

    • $ pip install pymongo
  6. Inside our project folder, go ahead and create settings.py. This file should look like this: Embedded content: https://gist.github.com/nfarah86/f87a9d37f1f72bb2d4a73d9b73dc87b4
  7. Create another file in the project folder called mongo_config.py. It should look like this: Embedded content: https://gist.github.com/nfarah86/1fc7bc9987d27edbec0fa9b32be95163
  8. In the project folder, go ahead and create file called script.py. All we’re going to do is make sure our Python app is connecting to MongoDB: Embedded content: https://gist.github.com/nfarah86/4d8e87ff6e70e1da1c017e80b8daeef2
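Before running it, here is roughly what that connectivity check boils down to. This is a hedged sketch, not the contents of the gists: the helper names and the MONGO_URI setting are assumptions, and the settings.py/mongo_config.py from the steps above are the source of truth.

```python
# script.py -- a minimal connectivity check (sketch; see the gists for the real code)
from datetime import datetime, timezone

def build_test_document():
    # A throwaway document so we can confirm writes reach MongoDB.
    return {
        "source": "connectivity-check",
        "ts": datetime.now(timezone.utc).isoformat(),
    }

def insert_to_mongo(document):
    # pymongo is imported lazily so the helper above works even before
    # the driver is installed.
    from pymongo import MongoClient
    from settings import MONGO_URI  # loaded from .env via python-dotenv

    client = MongoClient(MONGO_URI)
    client["weather_pollution_db"]["weather_data"].insert_one(document)

# To run the check: insert_to_mongo(build_test_document())
```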
  9. Under Clusters, click on the collections button. Go to weather_pollution_db and click on weather_data. You should see this:

See JSON data from the Python app

  1. Now that we know we can insert data into MongoDB, let’s go ahead and create a ClimaCell developer account and get an API KEY.

  2. In settings.py go ahead and add this:

    • CLIMACELL_API_KEY = os.environ.get('CLIMACELL_API_KEY')
  3. I chose ClimaCell because they offer real-time data for weather and air pollution. We’re going to work with this API. They have different parameters that can be added to the request; you can explore those here.
  4. In our project folder go ahead and pip install a few libraries:

    • $ pip install requests
    • $ pip install timeloop
  5. In script.py, go ahead and modify the packages we’re going to use: Embedded content: https://gist.github.com/nfarah86/a49cbaa033239c636ef4f3bbe1dca2d0

    • Timeloop is a library that can run jobs at designated intervals.
  6. Keep insert_to_mongo() and add this function in script.py to get the weather data: Embedded content: https://gist.github.com/nfarah86/d2e3cc9236547e2fa630fd368dfee994

    • The lat and lon correspond to Beijing.
  7. Now, we’re going to add this function to get the air quality: Embedded content: https://gist.github.com/nfarah86/c598dbea0274d43215f15c9f01eca672
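Both fetch functions boil down to the same GET request shape. Here is a hedged sketch; the endpoint, parameter names, and field names are assumptions for illustration, so defer to the gists and ClimaCell’s API docs:

```python
# A sketch of the ClimaCell realtime call; the endpoint and field names are assumptions.
CLIMACELL_REALTIME_URL = "https://api.climacell.co/v3/weather/realtime"

def build_params(fields, lat=39.9042, lon=116.4074, api_key="YOUR_KEY"):
    # lat/lon default to Beijing, matching the tutorial.
    return {
        "lat": lat,
        "lon": lon,
        "unit_system": "si",
        "apikey": api_key,
        "fields": ",".join(fields),  # e.g. ["temp", "humidity"] or ["pm10", "o3"]
    }

def get_realtime(fields, **kwargs):
    import requests  # installed earlier with `pip install requests`
    resp = requests.get(CLIMACELL_REALTIME_URL, params=build_params(fields, **kwargs))
    resp.raise_for_status()  # surface HTTP errors instead of inserting bad data
    return resp.json()
```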
  8. We’ll modify insert_to_mongo() to look like this: Embedded content: https://gist.github.com/nfarah86/e43f4ad2d8f7e3ca4b8d761408bc853c
  9. To make sure we’re running continuously, write this: Embedded content: https://gist.github.com/nfarah86/959d875ad5ffcc08e16e3bf25358385a
  10. After, write main() like this: Embedded content: https://gist.github.com/nfarah86/831e295b663aceb93603d9986c815b43
  11. Here's a gist of what your script.py should look like: Embedded content: https://gist.github.com/nfarah86/85caee5b14639e238e34715094cc5436
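Conceptually, the Timeloop job in the gist reduces to a plain run-then-sleep loop. This stdlib sketch is only for intuition (the real script should keep using Timeloop’s @tl.job decorator as the gists show):

```python
import time

INTERVAL_SECONDS = 120  # same cadence as sample_job_every_120s()

def run_forever(job, interval=INTERVAL_SECONDS, max_ticks=None):
    """Run `job` every `interval` seconds; max_ticks lets tests stop the loop."""
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        job()  # e.g. fetch from ClimaCell and insert into MongoDB
        ticks += 1
        if max_ticks is None or ticks < max_ticks:
            time.sleep(interval)
    return ticks
```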
  12. Now, run the script to populate MongoDB:

    • $ python script.py
  13. While the script is running, let’s get started on Rockset.

Let’s get started on Rockset

  1. Log in to Rockset and navigate to the Integrations tab on the left. Click on Add Integration, click on MongoDB, and click on Start:

click on MongoDB

  1. Check the first box, MongoDB Atlas. We’re going to name this integration Rockset-Mongo-Integration. For the username and password, go ahead and put rockset-user and the password you used when you created this database user. Paste the connection string in the next box and click on Save Integration.

    • Each integration can be used to access multiple databases and collections in the same MongoDB cluster.

Save integration

  1. Under Collections, click on Create Collection. Select MongoDB as the source.

Click on MongoDB source

  1. Click on the rockset-mongo-integration.
  2. We’re going to name our new collection on Rockset weather_data_collection; this name is not tied to MongoDB. Go ahead and fill out the rest of the page with the database and collection we created on MongoDB. We’re going to add 2 collections, but let’s start with weather_data from MongoDB.

    • You see, Rockset is able to connect to MongoDB. You can verify what data will be ingested into the Rockset collection on the right-hand side. Once you’ve created a collection and are running a data-driven app in real time, Rockset will continuously sync with MongoDB so your data always has the latest information.
    • Let’s click Create at the bottom.

weather_data

  1. Follow the same steps for air_pollution_data_collection:

Verify data for air_pollution_db

  • Note: permissions can be changed in the MongoDB UI at any time without the need to drop and recreate the integration. The exception is when the username, password, or connection string changes; then you will need to drop and recreate the Rockset integration.

Construct a Query on Rockset

  1. On the left bar, let’s navigate to the Query Editor.
  2. In a new tab, if we write:

    • SELECT * FROM commons.air_pollution_data_collection; we should see some output:

Output of query

    • Go ahead and do the same for the weather_data_collection.
  1. We’re going to write this sample query: Embedded content: https://gist.github.com/nfarah86/2d9c5bc316d55cfd0fcf17b4ded9141f

    • We’re averaging the PM10 data and the weather temperature data, and joining the two collections on the date. If you noticed the timestamp in the JSON, the date is in ISO 8601 format. In order to join on days (and eliminate the minutes, hours, and seconds), we’re going to do an extraction.
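For intuition, the day-level join key is nothing more than the date prefix of the ISO 8601 timestamp; the query does this extraction in SQL, but in Python terms it looks like:

```python
def day_of(iso_ts):
    # ISO 8601 puts the date first, so the first 10 characters are YYYY-MM-DD.
    # "2020-06-01T12:34:56Z" -> "2020-06-01"
    return iso_ts[:10]
```

Because timestamps from both collections reduce to the same day key, rows recorded on the same calendar day join together.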
  2. Run the query.
  3. After we run this query, we want to embed it in our app, so we can notify our users when the levels fluctuate, and potentially predict, based on weather, what PM10 levels may look like the next day.

    • We’re going to need a lot more data than what we have now to predict based on temperature, but this is a start!

Build an API on our query on Rockset

  1. In the top corner, click on Create Query Lambda. A Query Lambda is a way to make an API endpoint for the SQL query you write. In the Python app, we won’t have to write client-side SQL, preventing security vulnerabilities.

Create query lambda

  1. Give your Query Lambda a name and description. Afterwards, you should see some code snippets on the next screen.
  2. Let’s navigate back to the Query Editor and, in a new tab, write another query to get the current weather. Sometimes we may get a null field, so let’s go ahead and write this in the Query Editor: Embedded content: https://gist.github.com/nfarah86/4581c6bc09d30045ae75a5f330a8d72f
  3. Create another new Query Lambda.
  4. If we want to grab the code snippet or URL of the Query Lambdas we just created, navigate to Query Lambdas in the left side menu and click on the lambda you created.

Execute APIs on our app

  1. Once you create a Query Lambda, you’ll see something like this:
    Get Code Snippet to execute API
  2. There are two ways I’ll show how we can execute a lambda:

    • Make an HTTP Request
    • Rockset's Python client (bottom box, where my API key is blocked out)
  3. Make an HTTP Request:

    • Let’s go ahead and make an HTTP request to get the avg_pm10 data. Your code should look like this:
    • Go ahead and set your ROCKSET_API_KEY in your .env file. Import it in settings.py like we did before.
    • On Rockset, navigate to the Query Lambda that has the current_weather query. Copy the Query Lambda endpoint.
    • We’re going to write this function that will make an HTTP request to that endpoint: Embedded content: https://gist.github.com/nfarah86/3a0ef9b1524532247e3ea7c504489d23
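For reference, the request shape looks roughly like this. The URL below is a placeholder (copy the real endpoint from your Query Lambda’s page), and the exact header format should be checked against the code snippet Rockset shows you:

```python
import os

# Placeholder -- copy the real endpoint from the Query Lambda page on Rockset.
LAMBDA_URL = "https://api.rs2.usw2.rockset.com/v1/orgs/self/ws/commons/lambdas/..."

def build_headers(api_key):
    # Rockset's REST API expects an "ApiKey <key>" Authorization header.
    return {
        "Authorization": "ApiKey {}".format(api_key),
        "Content-Type": "application/json",
    }

def execute_query_lambda(url=LAMBDA_URL):
    import requests  # installed earlier with `pip install requests`
    resp = requests.post(
        url,
        headers=build_headers(os.environ["ROCKSET_API_KEY"]),
        json={"parameters": []},  # this lambda takes no parameters
    )
    resp.raise_for_status()
    return resp.json()["results"]
```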
  4. Use the Rockset Client to send a request:

  5. Then, we’re going to display the result: Embedded content: https://gist.github.com/nfarah86/a0d1e15319bc117ef55ce35187fb6480
  6. We’re going to change sample_job_every_120s() to add make_requests() so we can execute the Query Lambdas and display the data: Embedded content: https://gist.github.com/nfarah86/0a54e082c9026aa5c9940b24836d9c65
  7. Write make_requests() so it looks like this: Embedded content: https://gist.github.com/nfarah86/dea06329b25887bb58a0ef74c4a12fb0
  8. After you run the script, you should see this: Embedded content: https://gist.github.com/nfarah86/32b35bd3269fbd1701dc57252fa783e4
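The printed statement itself can be as simple as bucketing the averaged PM10 value. Here is an illustrative sketch; the thresholds follow the EPA’s 24-hour PM10 breakpoints but are an assumption of this sketch, not something the tutorial’s gists prescribe:

```python
def describe_pm10(avg_pm10):
    # Rough AQI-style buckets for 24-hour PM10 in micrograms per cubic meter.
    # Thresholds are illustrative; check an AQI reference before relying on them.
    if avg_pm10 <= 54:
        return "Air quality is good -- enjoy the outdoors!"
    if avg_pm10 <= 154:
        return "Air quality is moderate."
    return "Air quality is unhealthy -- consider limiting time outdoors."
```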
  9. That’s it! This wraps it up for the MongoDB-Rockset Python App!

Project Code

You can find the full project, including the SQL statements, here. If you have questions about the project, Rockset, or MongoDB, you can reach out in our Slack channel.

Complete these challenges to earn credit and swag:

  • 1st challenge: So far, we’ve only considered PM10 in our queries. What would this look like for O3 and weather data?
  • 2nd challenge: If we wanted to see the average humidity and PM10 and O3 levels, what would that look like?
  • 3rd challenge: If you were going to predict how weather and humidity affect PM10 and O3, what would that look like?

Prizes

  • If you complete challenge 1 or 2: $100 in Rockset credit plus a t-shirt
  • If you complete challenges 1 and 2: $250 in Rockset credit plus a t-shirt
  • If you complete challenge 3: $500 in Rockset credit, a hoodie, and a Bose SoundLink
  • If you complete all 3 challenges: $800 in Rockset credit, a hoodie, and a Bose SoundLink
  • Share your challenge in the #mongodb channel in our Slack.

Deadline: June 15, 2020

Check out our GitHub or our Slack channel for the current status of the challenge. The first 3 people to submit will win the corresponding prize.

Resources:



