How to create and manage MongoDB materialized views using triggers.
MongoDB Atlas now supports triggers. These triggers come in two flavors: scheduled and database. A scheduled trigger runs at its own periodicity, e.g. every 5 minutes. A database trigger watches for events on collections: INSERT, UPDATE, REPLACE and DELETE. When an event occurs, a configured JavaScript function is called with a “ChangeEvent” as its parameter. This function can do anything it wants, including calling the Queen. I was only interested in keeping my materialized view updated. Let me tell you how.
MongoDB now has materialized views.
Recently we hit a performance problem with some reports. All of them ran heavy aggregation logic through a query, and the rate at which the core data was growing was reaching a tipping point where users were losing patience. This pushed us to explore a solution built on pre-processed report data. Luckily for us, MongoDB now offers a semblance of materialized views through the “$merge” aggregation stage, which looked like a very promising starting point.
To validate the performance-gains story, we first took a baseline of the old code and then wrote a maintenance script that populated this materialized view using the $merge aggregation stage.
The $merge aggregation stage must be the last one in the pipeline.
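As a rough sketch of what such a priming script can look like, here is a pipeline builder. The collection names (`orders`, `customers`, `report_orders`) and field names are hypothetical placeholders, not the actual schema from our reports:

```javascript
// Sketch of a view-priming pipeline. All collection and field names
// (orders, customers, report_orders, amount) are hypothetical examples.
function buildPrimingPipeline() {
  return [
    // Join each order with its customer: one of the costly lookups
    // the materialized view is meant to pre-compute.
    { $lookup: {
        from: "customers",
        localField: "customerId",
        foreignField: "_id",
        as: "customer"
    } },
    { $unwind: "$customer" },
    // Pre-aggregate the report figures per customer.
    { $group: {
        _id: "$customer._id",
        customerName: { $first: "$customer.name" },
        totalAmount: { $sum: "$amount" },
        orderCount: { $sum: 1 }
    } },
    // $merge must be the last stage: it upserts the results into the
    // materialized view collection instead of returning them.
    { $merge: {
        into: "report_orders",
        on: "_id",
        whenMatched: "replace",
        whenNotMatched: "insert"
    } }
  ];
}

// In the shell or driver: db.orders.aggregate(buildPrimingPipeline());
```

`whenMatched: "replace"` keeps reruns idempotent: priming the view twice produces the same rows.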
Running the reports against this materialized view gave us 10x performance gains, because many of the costly $lookup stages in the original code were eliminated. This gave us confidence that a materialized view was the way to go.
How to keep this materialized view updated.
These views don’t update by themselves. When the underlying data changes, the view needs to be updated. Rebuilding the whole view was not a practical solution for us, and I am sure it is not practical for any reasonably busy system. So it was important for us to find a solution that can scale.
Solution 1: A Queue + Listener Processor:
This approach has many advantages. The idea is to add small pieces of code that send a notification to a queue whenever relevant data is updated. A listener attached to the queue can then trigger a unit-level update of the materialized view. The advantage here is that you get full control over when your view is updated, which is better than blindly refreshing it on every change to the underlying data. You can also batch the events and update the view only when a minimum threshold of pending transactions is reached.
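The batching idea can be sketched as a small in-process class. In a real deployment the messages would arrive from a queue service, and `refreshView` is a hypothetical callback standing in for the unit-level view update:

```javascript
// Minimal sketch of the batching listener. In production the events
// would come from SQS/RabbitMQ/ActiveMQ; refreshView is a hypothetical
// callback that performs the unit-level materialized view update.
class ViewUpdateBatcher {
  constructor(refreshView, threshold) {
    this.refreshView = refreshView; // called with the pending batch
    this.threshold = threshold;     // minimum pending events before a refresh
    this.pending = [];
  }

  // Called by the queue listener for every notification message.
  onMessage(event) {
    this.pending.push(event);
    if (this.pending.length >= this.threshold) {
      this.flush();
    }
  }

  // Refresh the view for all batched events, then clear the batch.
  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.refreshView(batch);
  }
}
```

A real listener would also flush on a timer, so a trickle of events below the threshold does not leave the view stale forever.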
If your solution runs in a cloud environment like AWS, you will have a queue management system like AWS SQS handy. In other environments, standalone products like RabbitMQ or ActiveMQ fit the bill perfectly. The common factor here is the publish-subscribe mechanism, which gives you easy constructs to deliver messages to queues and consume messages from them.
On the negative side, you need to consider the code complexity involved in wiring the right notification constructs into different parts of your software. If all the data updates you are interested in pass through a common layer, this might be easy to manage. But if your data update touch points are many and are owned by a large team, you may want to consider Solution 2.
Solution 2: MongoDB Triggers (Atlas Realm):
This is not a standard feature of standalone MongoDB. Database triggers are available only if you are using Atlas, the cloud-hosted MongoDB service. It is a paid feature and will cost you once your usage goes above the free tier. It is NOT available if you are running the standard MongoDB Community edition on your own server.
This approach has a few negatives we need to consider as well.
- This costs extra. Each trigger run consumes some processing time, and Atlas will add it to your bill once usage crosses the monthly free tier.
- The trigger event is blind. Some updates to your underlying collection may be harmless in nature and may not require refreshing your materialized view. This intelligence has to be built into the trigger to avoid unnecessary refreshes.
- When you want to add or remove fields from your view, you need to make multiple changes. First you change the initial priming script that re-populates the view per the changed requirements. Then you modify the trigger code with the matching field or logic changes. While the two programs overlap, each runs in a different environment, so you end up maintaining separate files. Since there is no common library defining the shared part of the aggregation, you always run the risk of logic discrepancies. It is technical debt.
- Updating the trigger code can happen only through the Atlas dashboard. This adds a manual step to the CI/CD pipeline.
Other ways to update the materialized view:
You can consider a periodic rebuild of the view using a scheduled task. The scheduled trigger in Atlas will come in handy here for cloud MongoDB users. For others, a crontab-like technique that runs a program to update the view may work. Here the trick is to define the periodicity of the job. You can further optimize the rebuild by saving the state of the previous run. For example, you can introduce an ‘updatedAt’ field into your source collection, then compare the last run time with the ‘updatedAt’ time of source collection records to decide which records need refreshing.
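This incremental idea can be sketched as a pipeline builder. The `updatedAt` field comes from the text above; the view collection name is a placeholder, and the report’s own aggregation stages are elided:

```javascript
// Build a pipeline that refreshes only records touched since lastRun.
// The view name (report_orders) is a hypothetical placeholder.
function buildIncrementalPipeline(lastRun) {
  return [
    // Only records modified since the previous scheduled run.
    { $match: { updatedAt: { $gt: lastRun } } },
    // ... the same aggregation logic as the full priming script ...
    // $merge upserts just the recomputed rows into the view.
    { $merge: { into: "report_orders", on: "_id",
                whenMatched: "replace", whenNotMatched: "insert" } }
  ];
}

// The scheduled job would persist the run timestamp somewhere durable,
// then call: db.orders.aggregate(buildIncrementalPipeline(lastRun));
```

Persist the run timestamp before reading the changes (or use the max `updatedAt` seen) so records updated mid-run are not skipped by the next run.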
The introduction of materialized views in MongoDB helps speed up queries that need costly joins across collections. The lack of a built-in mechanism to update such a view, however, leaves developers vulnerable to risky homegrown alternatives to what should have been a core part of the materialized view feature; it would seem MongoDB missed a trick here. By using the ‘Database Triggers’ available with Atlas cloud clusters, it is possible to manage materialized view updates effectively. Still, the lack of developer tooling to update the triggers leaves engineering flaws that should be fixed for wider adoption.
That being said, Materialized Views + Database Triggers bring home a brave new world for MongoDB users.