Case Study: Fleet Management System – An End-to-End Streaming Data Pipeline

April 3, 2020



Fleet operators often suffer business and monetary losses due to a lack of information on the health of their fleet and the inventory it carries. Without real-time data on vehicle or inventory health, operators cannot take preemptive or immediate action. Consider the following scenarios:



  1. A vehicle’s coolant is leaking and the engine temperature is rising. If the problem is not detected and addressed, the vehicle might get stranded. Without preemptive action, repair costs would be higher and inventory delivery would be delayed, causing business loss.
  2. A vehicle’s AC is malfunctioning, causing the temperature inside the vehicle’s storage to rise. Perishable items carried in the vehicle will spoil unless real-time action is taken and the goods are shifted to another vehicle whose AC is functioning properly. Such events also lead to business loss.
  3. If a vehicle gets stranded at a remote location and its exact position is not known, the fleet operator is in no position to offer quick help. This, in turn, reduces the operator’s efficiency.


The proposal is to build a fleet management system for operators to manage their fleet efficiently. The solution will offer a dashboard to:

  • monitor health parameters – engine temperature, fuel pressure, etc. – of the fleet and of individual vehicles
  • monitor location of each vehicle
  • monitor detailed vehicle CPU information and related analytics in real time

This solution would enable operators to make real-time and preemptive decisions to handle scenarios like those described earlier.


The proposed solution template and data pipeline for fleet management are shown in the diagram below.


The components of the architecture, labelled by numbers in the diagram above, are briefly explained below:

Mobile client

The mobile client has been built on top of the sample code provided by AWS. The client simulates the sensor data from a vehicle.

  • It makes use of the AWS IoT APIs to securely publish to MQTT topics.
  • It uses Cognito federated identities in conjunction with AWS IoT to create a client certificate and private key and store them in a local Java KeyStore. This identity is then used to authenticate to AWS IoT.
  • Once a connection to the AWS IoT platform has been established, the sample app presents a simple UI to subscribe over MQTT.
  • The app will use the certificate and private key saved in the local Java KeyStore for future connections.
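As a rough sketch of what the simulated sensor publish could look like in Python, assuming an already-connected MQTT client: the topic layout `fleet/<vehicleId>/telemetry` and the field names are illustrative assumptions, not the sample app’s actual schema.

```python
import json
import time

def build_telemetry(vehicle_id, engine_temp_c, fuel_pressure_kpa,
                    storage_temp_c, lat, lon):
    """Build one JSON telemetry message for a vehicle (hypothetical schema)."""
    return {
        "vehicleId": vehicle_id,
        "timestamp": int(time.time()),
        "engineTempC": engine_temp_c,
        "fuelPressureKpa": fuel_pressure_kpa,
        "storageTempC": storage_temp_c,
        "location": {"lat": lat, "lon": lon},
    }

def publish_telemetry(mqtt_client, payload):
    """Publish a telemetry message over an already-connected MQTT client.

    `mqtt_client` is assumed to expose publish(topic, payload, qos),
    as the AWS IoT device SDK clients do."""
    topic = "fleet/{}/telemetry".format(payload["vehicleId"])
    mqtt_client.publish(topic, json.dumps(payload), 1)  # QoS 1: at least once
    return topic
```

The payload is built separately from the publish call so the message format can be evolved (and unit-tested) without touching connection code.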

Amazon Cognito

The mobile client connects to the AWS IoT platform using Cognito and uploads certificates and policies.

Note: This project makes use of unauthenticated users in the identity pool. This needs improvement and has only been done for the prototype. In production applications, unauthenticated users should typically be given read-only permissions.

AWS IoT Core (MQTT Client)

AWS IoT Core allows you to easily connect devices to the cloud and receive messages using the MQTT protocol, which minimises the code footprint on the device.

In this project, AWS IoT Core acts upon device data on the fly, based on appropriate business rules, and uses Lambda to process the received data.
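A topic rule of this kind might be expressed in the AWS IoT SQL dialect as follows; the topic structure is an assumption carried over from the client sketch above, and the rule’s action would then be configured to invoke the Lambda function:

```sql
SELECT *, topic(2) AS vehicleId
FROM 'fleet/+/telemetry'
```

Here `topic(2)` extracts the second topic segment (the vehicle ID), and the `+` wildcard matches telemetry from any vehicle.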


AWS IAM

The following policies and roles have been configured:

  • Policy to allow Mobile Client access to IoT Core
  • Policy to allow Lambda function to execute and access AWS resources
  • Policy to allow Lambda function to read and write to DynamoDB
  • Policy to allow Lambda function to access SNS
  • User role to allow Rockset to access DynamoDB
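As an illustration, the policy allowing the Lambda function to read and write DynamoDB could look roughly like this; the region, account ID, and table name are placeholders, not the project’s actual values:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem", "dynamodb:GetItem", "dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/VehicleTelemetry"
    }
  ]
}
```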


AWS Lambda

The Lambda function does the following:

  • Handles data sent from IoT Core, processes it, and decides which DynamoDB table the data should be written into
  • Handles out-of-range data by sending an email to the configured email address via SNS
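A minimal sketch of such a handler, with the DynamoDB table and SNS client injected as parameters so the range-check logic stays testable with fakes; the threshold values, field names, and alert topic ARN are all assumptions, not the project’s actual configuration:

```python
ENGINE_TEMP_MAX_C = 105.0   # assumed alert threshold for engine temperature
STORAGE_TEMP_MAX_C = 8.0    # assumed threshold for perishable storage

def handle_telemetry(event, table, sns, alert_topic_arn):
    """Process one telemetry message forwarded by IoT Core.

    `table` mimics a boto3 DynamoDB Table (put_item) and `sns` a boto3
    SNS client (publish); both are injected so the logic can be tested
    without AWS access."""
    # Persist the raw reading.
    table.put_item(Item=event)

    # Collect any out-of-range conditions.
    alerts = []
    if event.get("engineTempC", 0) > ENGINE_TEMP_MAX_C:
        alerts.append("engine temperature {}C out of range".format(event["engineTempC"]))
    if event.get("storageTempC", 0) > STORAGE_TEMP_MAX_C:
        alerts.append("storage temperature {}C out of range".format(event["storageTempC"]))

    # Notify the operator via SNS for each condition found.
    for message in alerts:
        sns.publish(
            TopicArn=alert_topic_arn,
            Subject="Fleet alert: {}".format(event.get("vehicleId", "unknown")),
            Message=message,
        )
    return {"stored": True, "alerts": len(alerts)}
```

Keeping the thresholds as module-level constants (or environment variables in a real deployment) makes it easy to tune the business rules without redeploying logic.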


Amazon DynamoDB

This project uses DynamoDB to store the large volume of data that would be generated in a live environment. Data is stored in the DB in JSON format.
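A stored telemetry item might look like the following; the attribute names are illustrative, not the project’s actual schema:

```json
{
  "vehicleId": "truck-42",
  "timestamp": 1585900800,
  "engineTempC": 92.5,
  "fuelPressureKpa": 310,
  "storageTempC": 4.0,
  "location": {"lat": 12.97, "lon": 77.59}
}
```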


Rockset

This SaaS service allows fast SQL on NoSQL data from varied sources like Kafka, DynamoDB, S3 and more. Rockset has been used to query the JSON data in DynamoDB as business needs require.
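As an illustration, a question such as “average engine temperature per vehicle” could be expressed as plain SQL over the synced collection; the collection name `fleet_telemetry` and the field names are assumptions:

```sql
SELECT vehicleId, AVG(engineTempC) AS avg_engine_temp
FROM commons.fleet_telemetry
GROUP BY vehicleId
ORDER BY avg_engine_temp DESC;
```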


Redash

Redash can connect to and query different data sources and build dashboards to visualize the data. In this project, it connects to Rockset and presents the data on a dashboard for the fleet management operator to consume.


Amazon SNS

This service has been used to send an alert to the configured email address when the data received from a device is out of range.


Challenges

  1. Given the huge number of services and solutions offering similar capabilities, selecting the right service was a tough choice. For example, DynamoDB, Cassandra, or MongoDB could each have met the requirement of handling IoT data at scale.
  2. We had initially selected Amazon MSK to run Kafka alongside Spark, but ran into questions about which interoperable versions of Kafka and Spark to run on the cluster. Amazon MSK then turned out to be redundant: the required processing could be done in the Lambda function itself, and since IoT Core was already taking care of the queuing mechanism, there was no need for another queue.
  3. Plugging the vehicle data into a Kafka producer became a tough challenge, so we began exploring what services AWS provides. That is when we discovered that AWS IoT could be a good replacement.
  4. The processing that was supposed to be done in Spark is instead done by services like Rockset, which run simple SQL queries on the NoSQL DynamoDB data via DynamoDB Streams. While Spark is still an excellent choice for this kind of requirement, it offers far more options than the scope of our project needed.
  5. Selecting a dashboard that would work with DynamoDB Streams and was also easy to set up was a major challenge. There are plenty of options out there, from open-source tools like Apache Superset to commercial options like Tableau and Grafana. Set-up and data visualization through Rockset turned out to be a lot easier and a better fit for the use case in this project.


Learnings

  1. While architecting a cloud-native solution (as opposed to migrating from on-prem to cloud), the most challenging aspect is perhaps choosing which services to use. The selection could be based on parameters like time to market, cost, long-term cost implications, and portability to other cloud vendors.
  2. If time to market is of primary concern, managed services provided by the cloud vendor should be preferred over popular/open-source technologies.
  3. Estimating cost, and planning for future growth and its impact on cost, would be a tough challenge. We would need to improve a lot if we were to architect this solution in the real world.

Originally published at


Santosh Prabhu – Santosh works as a solution architect in IoT product development at KaHa Technologies Pvt. Ltd. He is interested in Big Data engineering and Streaming technologies. He has 15 years of work experience in design and development of devices, apps and products.

Abhijeet Upadhyay – Abhijeet leads the development of IoT products at KaHa Technologies Pvt. Ltd. He is interested in Big Data engineering and Streaming technologies. He has 12 years of work experience in design and development of apps and products.

Image by Capri23auto from Pixabay