Building a Serverless Microservice Using Rockset and AWS Lambda

Rockset is a serverless search and analytics engine that makes it easy to develop data-driven apps. This video demo shows an example of what's possible with Rockset. For this exercise, we will build a serverless microservice to discover the stock symbols with the most mentions on Twitter.

e59e6e97b5644ea87b8522a8d0295470

Ingest

Our Twitter stream comes from Amazon Kinesis and is continuously ingested into Rockset. It's a simple process to set up a live integration between Rockset and Kinesis from the Rockset console. Refer to our step-by-step guide for more details, including info on setting up the Twitter Kinesis stream.

We also want to combine the stock mentions from Twitter with information about these stocks from Nasdaq. This information comes from a file in Amazon S3 and is ingested into a second Rockset collection.



demo outline

Query

Rockset automatically infers the schema for the Twitter JSON data in the twitter-firehose collection. We haven't performed any transformation on the data, but we can immediately run SQL queries on it. Examining the results of our SQL query, note how the Twitter data is organized in multiple levels of nesting and arrays.

Screen Shot 2019-01-04 at 11.22.16 AM

In our example, we are specifically focused on tweets that contain stock mentions, which we find under the symbols arrays in the entities field. We gradually explore the data and build out our SQL query, joining tweet data with the Nasdaq company info in the tickers collection, to return the most popular stocks in our data set along with some descriptive info about each stock.

-- unnest tweets with stock ticker symbols from the past 1 day
WITH stock_tweets AS
      (SELECT t.user.name, t.text, upper(sym.text) AS ticker
       FROM   "twitter-firehose" AS t, unnest(t.entities.symbols) AS sym
       WHERE  t.entities.symbols[1] is not null
         AND  t._event_time > current_timestamp() - INTERVAL 1 day),

-- aggregate stock ticker symbol tweet occurrences 
    top_stock_tweets AS
      (SELECT ticker, count(*) AS tweet_count
       FROM   stock_tweets
       GROUP BY ticker),

-- join stock ticker symbol in tweets with NASDAQ company list data
    stock_info_with_tweets AS 
      (SELECT top_stock_tweets.ticker, top_stock_tweets.tweet_count,
              tickers.Name, tickers.Industry, tickers.MarketCap
       FROM top_stock_tweets JOIN tickers
         ON top_stock_tweets.ticker = tickers.Symbol)

-- show top 10 most tweeted stock ticker symbols along with company info
SELECT * 
FROM   stock_info_with_tweets t
ORDER BY t.tweet_count DESC
LIMIT 10

Build

Rockset allows you to export your SQL query and embed it as is into your code.

Screen Shot 2019-01-03 at 2.48.50 PM

For our demo, we've built a Python-based serverless API, using AWS Lambda, that returns the stock symbols occurring most often in tweets.

from rockset import Client, Q
from lambdarest import lambda_handler
from credentials import API_KEY
import json
rs = Client(api_key=API_KEY,
api_server='https://api.rs2.usw2.rockset.com')
def lambda_handler(event, context):
if 'queryStringParameters' in event:
if 'interval' in event["queryStringParameters"]:
interval = event["queryStringParameters"]["interval"]
res = rs.sql(Q(f'''-- unnest tweets with stock ticker symbols from the past 1 day
WITH stock_tweets AS
(SELECT t.user.name, t.text, upper(sym.text) AS ticker
FROM "twitter-firehose" AS t, unnest(t.entities.symbols) AS sym
WHERE t.entities.symbols[1] is not null
AND t._event_time > current_timestamp() - INTERVAL {interval}),
-- aggregate stock ticker symbol tweet occurrences
top_stock_tweets AS
(SELECT ticker, count(*) AS tweet_count
FROM stock_tweets
GROUP BY ticker),
-- join stock ticker symbol in tweets with NASDAQ company list data
stock_info_with_tweets AS
(SELECT top_stock_tweets.ticker, top_stock_tweets.tweet_count,
tickers.Name, tickers.Industry, tickers.MarketCap
FROM top_stock_tweets JOIN tickers
ON top_stock_tweets.ticker = tickers.Symbol)
-- show top 10 most tweeted stock ticker symbols along with their company name, industry and market cap
SELECT *
FROM stock_info_with_tweets t
ORDER BY t.tweet_count DESC
LIMIT 10'''))
return {
"isBase64Encoded": False,
"headers": {
"Access-Control-Allow-Origin" : "*"
},
"statusCode": 200,
"body": json.dumps([x for x in res])
}
if __name__ == '__main__':
print(lambda_handler(None, None))
view raw lambda_function.py hosted with ❤ by GitHub

Once set up, we can serve live queries on raw, real-time Twitter data. In these results, the company Name, Industry, and MarketCap come from the Nasdaq company info.

Screen Shot 2019-01-03 at 3.37.37 PM

We can also build a rudimentary app that calls the API and displays the stock symbols with the most mentions on Twitter for customizable time intervals.

Screen Shot 2019-01-03 at 3.39.27 PM

We've provided the code for the Build steps—the Python Lambda function and the dashboard—in our recipes repository, so you can extend or modify this example for your needs.

There's quite a lot going on in this example. We've taken raw JSON and CSV from streaming and static sources, written SQL queries joining the two data sets, used our final SQL query to create a serverless API, and called the API through our app. You can view more detail on how we implemented this serverless microservice in the video embedded above. Hopefully this demo will spur your imagination as you consider what you can build on Rockset.

Real-time SQL on raw data

Related Posts

Comparing Approaches to Real-Time Analytics on Amazon DynamoDB

Compare and contrast various approaches to analytics on DynamoDB. Learn how best to build real-time apps, create live dashboards, and run fast SQL queries on DynamoDB.

Fynd - How Does a Growing E-Commerce Portal Respond to Consumer Behavior in Real Time?

Fynd uses Rockset to perform fast queries on real-time event streams, so they can react to consumer behavior as it happens.

How to Build a Facebook Messenger Chatbot Powered by Fast SQL on CSV

Build a chatbot that provides instant responses, leveraging fast SQL queries on CSV data.