48. Redis for Admin Operation State
Status: Accepted
Date: 2025-07-06
Context
When processing administrative tasks asynchronously via a queue, we often need to manage transient state related to the operation. For example, we might need to implement a distributed lock to prevent concurrent modification of the same resource, track the progress of a multi-step operation, or temporarily cache intermediate results. We need a fast and reliable way to store and access this temporary, non-transactional state.
Decision
We will use Redis as the primary storage mechanism for all transient state related to administrative operations. Since Redis is already a core part of our stack (as the backend for BullMQ), using it for this purpose avoids adding a new dependency.
Examples of state to be managed in Redis include:
- Distributed Locks: Using Redis's atomic operations (like SETNX) to ensure that only one worker can process a job for a specific entity at a time.
- Progress Tracking: Storing the progress of a long-running job (e.g., "75% complete") in a Redis key so it can be queried by an admin dashboard.
- Job-Related Caching: Caching data needed for a specific job that doesn't need to be persisted permanently in the main PostgreSQL database.
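The distributed-lock case can be sketched as follows. This is a minimal illustration, not the project's implementation: `LockStore`, `InMemoryLockStore`, `withEntityLock`, and the caller-supplied `ownerToken` are hypothetical names, and the in-memory store exists only so the sketch runs without a Redis server. With a real client, `setIfAbsent` would presumably map to `SET key value NX PX ttl` (the single-command form of SETNX plus an expiry), and `deleteIfEquals` to a small compare-and-delete script.

```typescript
// Hypothetical store contract for this sketch (not a real client API).
// With Redis, these would roughly correspond to:
//   setIfAbsent    -> SET key value NX PX ttlMs
//   deleteIfEquals -> a script that DELs the key only if GET key == value
interface LockStore {
  setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean>;
  deleteIfEquals(key: string, value: string): Promise<boolean>;
}

// In-memory stand-in so the example is runnable without Redis.
class InMemoryLockStore implements LockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean> {
    const now = Date.now();
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > now) return false; // lock still held
    this.entries.set(key, { value, expiresAt: now + ttlMs });
    return true;
  }

  async deleteIfEquals(key: string, value: string): Promise<boolean> {
    const existing = this.entries.get(key);
    if (!existing || existing.value !== value) return false; // not our lock
    this.entries.delete(key);
    return true;
  }
}

// Runs `work` only if the lock for (entity, id) is acquired; returns null otherwise.
// `ownerToken` should be unique per job so one worker cannot release another's lock.
async function withEntityLock<T>(
  store: LockStore,
  entity: string,
  id: string,
  ownerToken: string,
  ttlMs: number,
  work: () => Promise<T>,
): Promise<T | null> {
  const key = `admin:lock:${entity}:${id}`;
  const acquired = await store.setIfAbsent(key, ownerToken, ttlMs);
  if (!acquired) return null;
  try {
    return await work();
  } finally {
    // Release only if we still own the lock (it may have expired and been re-taken).
    await store.deleteIfEquals(key, ownerToken);
  }
}
```

The expiry keeps a crashed worker from holding a lock forever, and the owner check on release keeps a slow worker from deleting a lock that has since passed to another job.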
Consequences
Positive:
- Performance: Redis is an in-memory datastore, making it extremely fast for the kind of rapid read/write operations needed for state management (like acquiring/releasing locks).
- No New Dependencies: Leverages an existing, core component of our infrastructure, simplifying the tech stack.
- Rich Data Structures: Redis provides versatile data structures (Hashes, Sets, Lists) that are well-suited for various state management needs, often simplifying the application logic compared to using a relational database for the same purpose.
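As an illustration of the Hash point, progress for a job could be kept as two fields in a single hash and read back by the dashboard. This is a sketch under assumptions: `HashStore`, `reportProgress`, `readProgress`, and the `admin:progress:<jobId>` key are hypothetical, and the interface is only modeled on the HSET and HGETALL commands rather than on any particular client library.

```typescript
// Hypothetical contract modeled on the Redis HSET / HGETALL commands.
interface HashStore {
  hset(key: string, fields: Record<string, string>): Promise<void>;
  hgetall(key: string): Promise<Record<string, string>>;
}

// In-memory stand-in so the example runs without Redis.
class InMemoryHashStore implements HashStore {
  private hashes = new Map<string, Record<string, string>>();

  async hset(key: string, fields: Record<string, string>): Promise<void> {
    this.hashes.set(key, { ...(this.hashes.get(key) ?? {}), ...fields });
  }

  async hgetall(key: string): Promise<Record<string, string>> {
    // Like HGETALL, a missing key yields an empty set of fields.
    return { ...(this.hashes.get(key) ?? {}) };
  }
}

interface Progress {
  done: number;
  total: number;
  percent: number;
}

// Assumed key shape, following the admin:<kind>:... pattern.
const progressKey = (jobId: string): string => `admin:progress:${jobId}`;

// Called by the worker after each completed step.
async function reportProgress(
  store: HashStore,
  jobId: string,
  done: number,
  total: number,
): Promise<void> {
  await store.hset(progressKey(jobId), { done: String(done), total: String(total) });
}

// Called by the dashboard; returns null when no progress has been recorded.
async function readProgress(store: HashStore, jobId: string): Promise<Progress | null> {
  const fields = await store.hgetall(progressKey(jobId));
  if (!("total" in fields)) return null;
  const done = Number(fields.done);
  const total = Number(fields.total);
  const percent = total > 0 ? Math.floor((done / total) * 100) : 0;
  return { done, total, percent };
}
```

Storing raw counts rather than a preformatted string such as "75% complete" leaves presentation to the dashboard.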
Negative:
- Data is Not Permanent: Redis is not a durable, permanent store by default. In case of a catastrophic Redis failure, this transient state could be lost.
- Increased Redis Load: Storing this state adds load to the same Redis instance that backs the queue, which could degrade queue performance if the load goes unmonitored.
Mitigation:
- State is Transient: The state being stored is, by definition, transient and not the primary source of truth. The loss of this data would be inconvenient (e.g., a progress bar would reset, a lock might be released early), but it would not result in the corruption of core application data stored in PostgreSQL.
- Clear Key Naming Conventions: We will establish and enforce a strict key naming convention (e.g., admin:lock:<entity>:<id>) to keep the Redis keyspace organized and avoid collisions.
- Monitoring: The load on the Redis instance will be actively monitored via Prometheus/Grafana to ensure that it remains within acceptable limits. If necessary, the Redis cluster can be scaled up.
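One way the naming convention might be enforced in code is a single key-builder that every caller goes through. The sketch below is an assumption, not an agreed design: `adminKey`, the allowed `kind` values other than `lock`, and the permitted segment characters are all hypothetical.

```typescript
// Segments may not contain ":" (the separator), whitespace, or be empty.
// The allowed character set here is an assumption for illustration.
const SEGMENT_PATTERN = /^[A-Za-z0-9_-]+$/;

// Hypothetical set of key kinds; only "lock" appears in the convention example.
type AdminKeyKind = "lock" | "progress" | "cache";

// Builds keys of the form admin:<kind>:<segment>[:<segment>...].
function adminKey(kind: AdminKeyKind, ...segments: string[]): string {
  if (segments.length === 0) {
    throw new Error("adminKey requires at least one segment");
  }
  for (const segment of segments) {
    if (!SEGMENT_PATTERN.test(segment)) {
      throw new Error(`invalid key segment: "${segment}"`);
    }
  }
  return ["admin", kind, ...segments].join(":");
}

// Example: the lock key for entity "user" with id "42".
const exampleLockKey = adminKey("lock", "user", "42");
```

Rejecting the separator inside a segment prevents two different (entity, id) pairs from producing the same key.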