
48. Redis for Admin Operation State

Status: Accepted
Date: 2025-07-06

Context

When processing administrative tasks asynchronously via a queue, we often need to manage transient state related to the operation. For example, we might need to implement a distributed lock to prevent concurrent modification of the same resource, track the progress of a multi-step operation, or temporarily cache intermediate results. We need a fast and reliable way to store and access this temporary, non-transactional state.

Decision

We will use Redis as the primary storage mechanism for all transient state related to administrative operations. Since Redis is already a core part of our stack (as the backend for BullMQ), using it for this purpose avoids adding a new dependency.

Examples of state to be managed in Redis include:

  • Distributed Locks: Using Redis's atomic operations (like SETNX, or SET with the NX option plus an expiry so that an abandoned lock times out) to ensure that only one worker can process a job for a specific entity at a time.
  • Progress Tracking: Storing the progress of a long-running job (e.g., "75% complete") in a Redis key so it can be queried by an admin dashboard.
  • Job-Related Caching: Caching data needed for a specific job that doesn't need to be persisted permanently in the main PostgreSQL database.
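
The distributed-lock case above can be sketched as follows. This is a minimal illustration, not the project's implementation: `LockStore`, `InMemoryLockStore`, `withEntityLock`, and the token format are hypothetical names introduced here, and the in-memory store only stands in for Redis so the sketch runs without a server. A Redis-backed adapter would presumably map `setIfAbsent` to `SET key value NX PX ttl` and `deleteIfEquals` to a short Lua script, so that a worker never deletes a lock it no longer owns.

```typescript
// Hypothetical minimal store interface; a real adapter would wrap a Redis client.
interface LockStore {
  // Intended mapping: SET key value NX PX ttlMs (true when the key was created).
  setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean>;
  // Intended mapping: a Lua script (GET, compare, DEL) so check-and-delete is atomic.
  deleteIfEquals(key: string, value: string): Promise<boolean>;
}

// In-memory stand-in (an assumption for this sketch) so it runs without Redis.
class InMemoryLockStore implements LockStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async setIfAbsent(key: string, value: string, ttlMs: number): Promise<boolean> {
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > Date.now()) return false; // still held
    this.entries.set(key, { value, expiresAt: Date.now() + ttlMs });
    return true;
  }

  async deleteIfEquals(key: string, value: string): Promise<boolean> {
    const existing = this.entries.get(key);
    if (!existing || existing.expiresAt <= Date.now()) return false;
    if (existing.value !== value) return false; // owned by someone else
    this.entries.delete(key);
    return true;
  }
}

type LockResult<T> = { acquired: true; result: T } | { acquired: false };

// Runs `work` only if the per-entity lock could be taken, then releases it.
async function withEntityLock<T>(
  store: LockStore,
  entity: string,
  id: string,
  ttlMs: number,
  work: () => Promise<T>,
): Promise<LockResult<T>> {
  const key = `admin:lock:${entity}:${id}`;
  // Unique per attempt; a real worker would want a cryptographically random token.
  const token = `${Date.now().toString(36)}-${Math.random().toString(36).slice(2)}`;

  if (!(await store.setIfAbsent(key, token, ttlMs))) {
    return { acquired: false };
  }
  try {
    return { acquired: true, result: await work() };
  } finally {
    // Release only if we still own the lock (it may have expired meanwhile).
    await store.deleteIfEquals(key, token);
  }
}
```

The TTL is what keeps a crashed worker from holding the lock forever; the trade-off is that `work` must finish within `ttlMs`, or a second worker may acquire the lock while the first is still running.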

Consequences

Positive:

  • Performance: Redis is an in-memory datastore, making it extremely fast for the kind of rapid read/write operations needed for state management (like acquiring/releasing locks).
  • No New Dependencies: Leverages an existing, core component of our infrastructure, simplifying the tech stack.
  • Rich Data Structures: Redis provides versatile data structures (Hashes, Sets, Lists) that are well-suited for various state management needs, often simplifying the application logic compared to using a relational database for the same purpose.
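
As an illustration of the Hash data structure applied to progress tracking, here is a minimal sketch. `HashStore`, `InMemoryHashStore`, `reportProgress`, `readProgress`, the `admin:progress:job:<id>` key, and the 24-hour expiry are all assumptions made for this example; the interface is shaped after the Redis HSET, HGETALL, and EXPIRE commands, and the in-memory class only stands in for Redis so the sketch runs on its own.

```typescript
// Hypothetical store interface shaped after HSET, HGETALL and EXPIRE.
interface HashStore {
  hset(key: string, fields: Record<string, string>): Promise<void>;
  hgetall(key: string): Promise<Record<string, string>>;
  expire(key: string, seconds: number): Promise<void>;
}

// In-memory stand-in (an assumption for this sketch); it ignores expiry.
class InMemoryHashStore implements HashStore {
  private hashes = new Map<string, Record<string, string>>();

  async hset(key: string, fields: Record<string, string>): Promise<void> {
    this.hashes.set(key, { ...(this.hashes.get(key) ?? {}), ...fields });
  }

  async hgetall(key: string): Promise<Record<string, string>> {
    // Like HGETALL, a missing key yields an empty object.
    return { ...(this.hashes.get(key) ?? {}) };
  }

  async expire(_key: string, _seconds: number): Promise<void> {
    // Expiry is not simulated here.
  }
}

interface JobProgress {
  percent: number;
  step: string;
  updatedAt: string;
}

// Assumed key layout, following the admin:<kind>:<entity>:<id> pattern.
const progressKey = (jobId: string): string => `admin:progress:job:${jobId}`;

// Called by the worker after each step of a long-running job.
async function reportProgress(
  store: HashStore,
  jobId: string,
  percent: number,
  step: string,
): Promise<void> {
  const clamped = Math.min(100, Math.max(0, Math.round(percent)));
  await store.hset(progressKey(jobId), {
    percent: String(clamped),
    step,
    updatedAt: new Date().toISOString(),
  });
  // Let stale progress entries disappear on their own (assumed 24 hours).
  await store.expire(progressKey(jobId), 24 * 60 * 60);
}

// Called by the admin dashboard; returns null when nothing is recorded.
async function readProgress(store: HashStore, jobId: string): Promise<JobProgress | null> {
  const raw = await store.hgetall(progressKey(jobId));
  if (raw.percent === undefined) return null;
  return {
    percent: Number(raw.percent),
    step: raw.step ?? "",
    updatedAt: raw.updatedAt ?? "",
  };
}
```

Keeping all fields of one job in a single hash means the dashboard reads a consistent snapshot with one command, and a single expiry covers the whole record.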

Negative:

  • Data is Not Permanent: Redis is not a durable, permanent store by default. In case of a catastrophic Redis failure, this transient state could be lost.
  • Increased Redis Load: This will increase the load on our Redis instance, which could degrade queue performance if the added load goes unmonitored.

Mitigation:

  • State is Transient: The state being stored is, by definition, transient and not the primary source of truth. The loss of this data would be inconvenient (e.g., a progress bar would reset, a lock might be released early), but it would not result in the corruption of core application data stored in PostgreSQL.
  • Clear Key Naming Conventions: We will establish and enforce a strict key naming convention (e.g., admin:lock:<entity>:<id>) to keep the Redis keyspace organized and avoid collisions.
  • Monitoring: The load on the Redis instance will be actively monitored via Prometheus/Grafana to ensure that it remains within acceptable limits. If necessary, the Redis instance can be scaled up.
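
One way to enforce the key naming convention is to route every key through a single builder instead of assembling strings at each call site. The sketch below assumes the `admin:lock:<entity>:<id>` pattern from the mitigation above; the `progress` and `cache` kinds, the `adminKey` and `parseAdminKey` helpers, and the rule that segments may not contain colons or whitespace are hypothetical additions for illustration.

```typescript
// Key kinds: "lock" comes from the convention above; the others are assumed.
type KeyKind = "lock" | "progress" | "cache";

interface AdminKeyParts {
  kind: KeyKind;
  entity: string;
  id: string;
}

const KEY_KINDS: readonly KeyKind[] = ["lock", "progress", "cache"];

// Builds admin:<kind>:<entity>:<id>, rejecting segments that would break the layout.
function adminKey(kind: KeyKind, entity: string, id: string | number): string {
  const segments = [entity, String(id)];
  for (const segment of segments) {
    if (segment.length === 0 || /[:\s]/.test(segment)) {
      throw new Error(`Invalid key segment: "${segment}"`);
    }
  }
  return ["admin", kind, ...segments].join(":");
}

// Splits a key back into its parts; returns null for keys outside the convention.
function parseAdminKey(key: string): AdminKeyParts | null {
  const parts = key.split(":");
  if (parts.length !== 4 || parts[0] !== "admin") return null;
  const kind = KEY_KINDS.find((k) => k === parts[1]);
  if (kind === undefined || parts[2] === "" || parts[3] === "") return null;
  return { kind, entity: parts[2], id: parts[3] };
}
```

For example, `adminKey("lock", "user", 42)` yields `admin:lock:user:42`, while an entity name containing a colon is rejected before it can collide with another key's segments.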