67. BullMQ for Reliable Job Scheduling
Status: Accepted
Date: 2025-07-06
Context
Having decided to centralize our scheduling logic (adr://centralized-scheduling), we need to choose the underlying technology to power it. A custom, in-memory solution would be fragile: every pending task would be lost on restart. A database-polling solution (writing desired run times to a table and having a process poll it) can be inefficient and difficult to scale. We need a reliable, persistent, and efficient mechanism for triggering future tasks.
Decision
The Schedulers module will use BullMQ's delayed and repeatable job features as its core scheduling engine.
When a service requests a future task, the Schedulers module will not run the task itself. Instead, it will translate the request into a job and add it to the appropriate BullMQ queue with a specified delay. When the delay elapses, BullMQ will automatically make the job available for a worker to consume. For recurring tasks (like cron jobs), the module will use BullMQ's repeat options.
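The translation step described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual design: the `ScheduleRequest` shape, the `JobQueue` interface, and the function names are all hypothetical. The only BullMQ details assumed are the `add(name, data, opts)` call on a queue and the `delay` (milliseconds) and `repeat.pattern` (cron expression) job options; option names can differ between BullMQ versions, so check them against the version in use.

```typescript
// Hypothetical request shape; the ADR does not define one.
type ScheduleRequest =
  | { kind: "once"; name: string; payload: unknown; runAt: Date }
  | { kind: "recurring"; name: string; payload: unknown; cron: string };

// The subset of BullMQ job options this sketch relies on.
interface SchedulingOptions {
  delay?: number; // milliseconds until the job becomes available to workers
  repeat?: { pattern: string }; // cron expression for recurring jobs
}

// Stand-in for the part of a BullMQ queue that the module needs.
// Declaring it as an interface keeps the sketch runnable without Redis.
interface JobQueue {
  add(name: string, data: unknown, opts?: SchedulingOptions): Promise<unknown>;
}

// Translate "when should this run?" into queue options.
function toJobOptions(
  request: ScheduleRequest,
  now: Date = new Date(),
): SchedulingOptions {
  if (request.kind === "once") {
    // A run time in the past is clamped to "run as soon as possible".
    const delay = Math.max(0, request.runAt.getTime() - now.getTime());
    return { delay };
  }
  return { repeat: { pattern: request.cron } };
}

// The module only enqueues; a worker process executes the job later.
async function schedule(
  queue: JobQueue,
  request: ScheduleRequest,
): Promise<void> {
  await queue.add(request.name, request.payload, toJobOptions(request));
}
```

Keeping `toJobOptions` a pure function means the delay and cron translation can be unit-tested without a Redis connection, while `schedule` stays a thin call into the queue.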
The Schedulers module's primary role is therefore to be an intelligent and domain-aware wrapper around BullMQ's scheduling capabilities.
Consequences
Positive:
- High Reliability & Persistence: By persisting the scheduled jobs in Redis, BullMQ guarantees that scheduled tasks are not lost if the application restarts. This is a massive improvement over in-memory setTimeout.
- Leverages Existing Infrastructure: This decision uses BullMQ and Redis, which are already core components of our stack, introducing no new infrastructure dependencies.
- Efficiency: BullMQ keeps delayed jobs in Redis ordered by their due time, so making due jobs available to workers is cheap and does not involve constantly polling a database table.
- Decoupling of Scheduling from Execution: This reinforces the separation of concerns. The Schedulers module is responsible for when a job should run. The worker processes are responsible for how it runs.
Negative:
- Dependency on BullMQ/Redis: The entire scheduling system is now critically dependent on the health and availability of our BullMQ/Redis infrastructure.
- Abstraction Layer: We are building a custom abstraction (the Schedulers module) on top of BullMQ's existing features. We need to ensure this abstraction adds value (e.g., domain-specific logic, easier-to-use APIs) and doesn't just add unnecessary complexity.
Mitigation:
- High-Availability Redis: Our infrastructure plan already includes a highly available Redis cluster. This risk is understood and accepted.
- Purposeful Abstraction: The Schedulers module's value is in centralization and providing a domain-specific API (e.g., scheduleOrderFillCheck) rather than having every service construct its own generic BullMQ job. It translates business intent into a queueing implementation detail, which is a valuable abstraction.
- Monitoring: We will monitor the health of the BullMQ queues and the scheduler-related jobs specifically, using the built-in dashboards and our Grafana setup.
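To make the "Purposeful Abstraction" point concrete, here is a hedged sketch of what a domain-specific method such as scheduleOrderFillCheck might look like. Only the method name comes from this ADR. The class name, the `order-fill-check` job name, the payload shape, and the `DelayedJobQueue` interface are assumptions made for illustration; the interface is meant to mirror the `add(name, data, opts)` call on a BullMQ queue with its `delay` option.

```typescript
// Stand-in for the part of a BullMQ queue used here (assumed shape).
interface DelayedJobQueue {
  add(
    name: string,
    data: unknown,
    opts?: { delay?: number },
  ): Promise<unknown>;
}

// Hypothetical domain-facing wrapper: callers express business intent
// and never construct generic queue jobs themselves.
class OrderSchedulers {
  private readonly queue: DelayedJobQueue;

  constructor(queue: DelayedJobQueue) {
    this.queue = queue;
  }

  // Ask for a fill check on the given order after `delayMs` milliseconds.
  async scheduleOrderFillCheck(orderId: string, delayMs: number): Promise<void> {
    if (!Number.isFinite(delayMs) || delayMs < 0) {
      throw new RangeError("delayMs must be a non-negative number");
    }
    // Job name and payload shape are illustrative, not an agreed contract.
    await this.queue.add("order-fill-check", { orderId }, { delay: delayMs });
  }
}
```

A calling service would then write something like `await schedulers.scheduleOrderFillCheck(orderId, 30_000)`, leaving queue names, job names, and option details inside the Schedulers module. Because the wrapper depends only on a small interface, it can be exercised in tests with an in-memory fake instead of a live Redis instance.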