67. BullMQ for Reliable Job Scheduling

Status: Accepted
Date: 2025-07-06

Context

Having decided to centralize our scheduling logic (adr://centralized-scheduling), we need to choose the underlying technology to power it. A custom in-memory solution would be fragile: every pending schedule would be lost on restart. A database-polling solution (writing desired run times to a table and having a process poll it) can be inefficient and difficult to scale. We need a reliable, persistent, and efficient mechanism for triggering future tasks.

Decision

The Schedulers module will use BullMQ's delayed and repeatable job features as its core scheduling engine.

When a service requests a future task, the Schedulers module will not run the task itself. Instead, it will translate the request into a job and add it to the appropriate BullMQ queue with a specified delay. When the delay elapses, BullMQ will automatically make the job available for a worker to consume. For recurring tasks (like cron jobs), the module will use BullMQ's repeat options.
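The translation step can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `ScheduleRequest` and `toJobOptions` are hypothetical names, and the option names `delay` (milliseconds) and `repeat.pattern` (a cron expression) are assumed to match the BullMQ version in use, since the repeat option naming has changed between major releases.

```typescript
// Hypothetical request shape used by callers of the Schedulers module.
type ScheduleRequest =
  | { kind: "once"; runAt: Date }
  | { kind: "recurring"; cron: string };

// Subset of BullMQ job options relevant to scheduling (assumed names).
interface SchedulingOptions {
  delay?: number; // milliseconds to wait before the job becomes available
  repeat?: { pattern: string }; // cron expression for recurring jobs
}

// Translate a scheduling request into queue options.
function toJobOptions(req: ScheduleRequest, now: Date = new Date()): SchedulingOptions {
  if (req.kind === "once") {
    // A run time in the past becomes a zero delay, i.e. run as soon as possible.
    return { delay: Math.max(0, req.runAt.getTime() - now.getTime()) };
  }
  return { repeat: { pattern: req.cron } };
}
```

The returned object would be passed as the options argument when the job is added to its queue, for example `queue.add(name, data, toJobOptions(request))`.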

The Schedulers module's primary role is therefore to be an intelligent and domain-aware wrapper around BullMQ's scheduling capabilities.
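A domain-aware wrapper of this kind might look like the sketch below. It is an assumption-laden illustration: the `QueueLike` interface is a hypothetical stand-in that mirrors only the shape of BullMQ's `Queue.add(name, data, opts)`, and the job name `order-fill-check` and the `checkInMs` parameter are invented for the example. Only the method name `scheduleOrderFillCheck` comes from this ADR.

```typescript
// Hypothetical stand-in for the part of a BullMQ Queue this wrapper needs.
interface QueueLike {
  add(name: string, data: unknown, opts?: { delay?: number }): Promise<unknown>;
}

class Schedulers {
  private readonly orderQueue: QueueLike;

  constructor(orderQueue: QueueLike) {
    this.orderQueue = orderQueue;
  }

  // Business intent in, queueing detail out: callers never see job names or options.
  async scheduleOrderFillCheck(orderId: string, checkInMs: number): Promise<void> {
    if (checkInMs < 0) {
      throw new RangeError("checkInMs must be non-negative");
    }
    await this.orderQueue.add("order-fill-check", { orderId }, { delay: checkInMs });
  }
}
```

In production the constructor would receive a real BullMQ queue. Depending on a narrow interface also lets tests substitute an in-memory fake without a running Redis.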

Consequences

Positive:

High Reliability & Persistence: Because scheduled jobs are stored in Redis rather than in process memory, they are not lost when the application restarts. An in-memory setTimeout offers no such protection.
Leverages Existing Infrastructure: This decision uses BullMQ and Redis, which are already core components of our stack, introducing no new infrastructure dependencies.
Efficiency: BullMQ keeps delayed jobs in Redis ordered by due time and promotes them when they come due, so no process has to poll a database table for pending work.
Decoupling of Scheduling from Execution: This reinforces the separation of concerns. The Schedulers module is responsible for when a job should run. The worker processes are responsible for how it runs.
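To illustrate the split between "when" and "how", the worker side might be organized as in the following sketch. `JobLike` is a hypothetical type mirroring only the `name` and `data` fields of a BullMQ job, and the handler name and payload are invented for the example.

```typescript
// Hypothetical view of a job: just the fields a processor needs here.
interface JobLike {
  name: string;
  data: unknown;
}

type Handler = (data: unknown) => Promise<string>;

// Each job name maps to the code that knows how to execute it.
const handlers = new Map<string, Handler>([
  [
    "order-fill-check",
    async (data: unknown) => {
      const { orderId } = data as { orderId: string };
      return `checked fill status for ${orderId}`;
    },
  ],
]);

// Processor: looks up the handler for a job and runs it.
// It contains no timing logic; scheduling happened when the job was enqueued.
async function processJob(job: JobLike): Promise<string> {
  const handler = handlers.get(job.name);
  if (!handler) {
    throw new Error(`No handler registered for job "${job.name}"`);
  }
  return handler(job.data);
}
```

With BullMQ, a function like `processJob` would typically be passed as the processor when constructing a `Worker` for the queue.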

Negative:

Dependency on BullMQ/Redis: The entire scheduling system is now critically dependent on the health and availability of our BullMQ/Redis infrastructure.
Abstraction Layer: We are building a custom abstraction (the Schedulers module) on top of BullMQ's existing features. We need to ensure this abstraction adds value (e.g., domain-specific logic, easier-to-use APIs) and doesn't just add unnecessary complexity.

Mitigation:

High-Availability Redis: Our infrastructure plan already includes a highly available Redis cluster. This risk is understood and accepted.
Purposeful Abstraction: The Schedulers module's value lies in centralization and in providing a domain-specific API (e.g., scheduleOrderFillCheck) rather than having every service construct its own generic BullMQ job. Callers express business intent, and the module alone decides how that intent maps onto queues, job names, and delays.
Monitoring: We will monitor the health of the BullMQ queues and the scheduler-related jobs specifically, using the built-in dashboards and our Grafana setup.
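As one possible shape for the monitoring check, the sketch below evaluates per-state job counts against thresholds. It assumes the counts come from a call such as BullMQ's `queue.getJobCounts("failed", "waiting", "delayed")`. The function name, the `Thresholds` type, and the limit values are hypothetical and would need tuning for real workloads.

```typescript
// Job counts keyed by state, e.g. { failed: 3, waiting: 0, delayed: 10 }.
type JobCounts = Record<string, number>;

// Hypothetical alert limits.
interface Thresholds {
  maxFailed: number;
  maxWaiting: number;
}

// Return a human-readable alert for each threshold that is exceeded.
function findQueueAlerts(counts: JobCounts, limits: Thresholds): string[] {
  const alerts: string[] = [];
  const failed = counts["failed"] ?? 0;
  const waiting = counts["waiting"] ?? 0;
  if (failed > limits.maxFailed) {
    alerts.push(`failed jobs: ${failed} > ${limits.maxFailed}`);
  }
  if (waiting > limits.maxWaiting) {
    alerts.push(`waiting backlog: ${waiting} > ${limits.maxWaiting}`);
  }
  return alerts;
}
```

The resulting alerts, or the raw counts themselves, could be exported as metrics for the Grafana setup mentioned above.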
