44. Idempotency for Critical Transactions

Status: Accepted

Date: 2025-07-06

Context

In a distributed system like Mercury, network requests can fail or time out. A client (e.g., the core backend logic) might send a request to a service (e.g., an order execution service or an exchange API) and not receive a response. The standard recovery mechanism is to retry the request. However, if the original request was actually processed, a simple retry could lead to dangerous side effects, such as creating duplicate orders, processing the same deposit twice, or incorrectly updating a position. This would lead to incorrect state and potential financial loss.

Decision

All critical, state-changing API endpoints and internal service commands within the Mercury ecosystem must be designed to be idempotent. This will be achieved by requiring the client to generate a unique idempotency key (e.g., a UUID) for each distinct transaction.

The server-side implementation must:

  1. Store a record of the idempotency keys it has successfully processed, together with the response returned for each, within a reasonable time window (e.g., 24 hours).
  2. Before executing a new request, check if the provided idempotency key has already been processed.
  3. If the key has been seen, the server must not re-process the request. Instead, it should return the saved response from the original successful request.
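
The three server-side steps above can be sketched roughly as follows. This is a minimal illustration, not Mercury's actual implementation: the class name `IdempotentExecutor`, the `execute` method, and the example key are hypothetical, an in-memory `Map` stands in for the real key store, and expiry of old keys is left out for brevity.

```typescript
// Hypothetical sketch: names and types are illustrative only.
// A Map stands in for the real key store (e.g., Redis or a database table).
class IdempotentExecutor<T> {
  private readonly savedResponses = new Map<string, T>();

  execute(idempotencyKey: string, operation: () => T): T {
    // Step 2: check whether this key has already been processed.
    if (this.savedResponses.has(idempotencyKey)) {
      // Step 3: do not re-process; return the saved response instead.
      return this.savedResponses.get(idempotencyKey) as T;
    }

    // First time this key is seen: run the operation.
    // If it throws, nothing is recorded, so a later retry executes it again.
    const response = operation();

    // Step 1: record the key with the response of the successful request.
    this.savedResponses.set(idempotencyKey, response);
    return response;
  }
}

// Usage: the retry carries the same key, so only one order is created.
let ordersCreated = 0;
const executor = new IdempotentExecutor<{ orderId: number }>();
const placeOrder = () => ({ orderId: ++ordersCreated });

const first = executor.execute("example-key-1", placeOrder);
const retry = executor.execute("example-key-1", placeOrder);
console.log(first.orderId, retry.orderId, ordersCreated); // 1 1 1
```

The sketch is single-process and synchronous. With a shared store and concurrent requests, the check and the write would need to be atomic so that two in-flight requests carrying the same key cannot both execute.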

Consequences

Positive:

  • Safety and Correctness: Prevents duplicate transactions and ensures that operations can be safely retried, making the system more resilient to network failures.
  • Simplified Client Logic: Client-side error handling is simplified. On a timeout or network error, the client can safely retry the request with the same idempotency key, knowing it won't cause a duplicate operation.
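
On the client side, the retry behavior described above might look like the sketch below. The `sendWithRetry` function and the `Send` transport type are assumptions made for illustration. A production client would typically also add backoff between attempts and retry only on errors known to be transient.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical transport signature: sends the request together with its
// idempotency key, resolves with the response, and rejects on a timeout or
// network error.
type Send<Req, Res> = (idempotencyKey: string, request: Req) => Promise<Res>;

async function sendWithRetry<Req, Res>(
  send: Send<Req, Res>,
  request: Req,
  maxAttempts = 3,
): Promise<Res> {
  // One key per distinct transaction: generated once, reused on every retry.
  const idempotencyKey = randomUUID();
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(idempotencyKey, request);
    } catch (error) {
      // The request may or may not have been processed. Retrying with the
      // same key is safe because the server deduplicates on it.
      lastError = error;
    }
  }
  throw lastError;
}
```

The key is created outside the retry loop. Generating a fresh key per attempt would make every retry look like a new transaction and defeat the deduplication.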

Negative:

  • Increased Server-Side Complexity: The server must implement logic to track and manage idempotency keys, which adds complexity to every state-changing endpoint.
  • Storage Overhead: Storing idempotency keys requires additional storage (e.g., in Redis or a database table).
  • Performance Overhead: Each request incurs a small performance penalty, because the idempotency key must be looked up before the operation can run.

Mitigation:

  • Shared Middleware: The idempotency-checking logic can be implemented as a reusable middleware or decorator in the NestJS framework. This reduces code duplication and ensures a consistent implementation across all relevant endpoints.
  • Time-Limited Storage: Idempotency keys only need to be stored for a finite period (e.g., 24-48 hours), after which it is safe to assume the original client is no longer retrying the request. This can be managed with a TTL (Time To Live) mechanism in a cache like Redis.
  • Selective Application: This requirement will be strictly enforced for critical, state-changing operations. It is not necessary for read-only (GET) requests or other non-critical operations.
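
The three mitigations can be combined in one reusable wrapper, sketched below under stated assumptions. To stay self-contained it is framework-agnostic and not an actual NestJS middleware or decorator. `TtlKeyStore`, `withIdempotency`, and `ApiRequest` are hypothetical names, and an in-memory `Map` with manual expiry stands in for a cache such as Redis with a TTL.

```typescript
// Hypothetical sketch: framework-agnostic, names are illustrative only.
interface StoreEntry<T> {
  response: T;
  expiresAtMs: number;
}

// Time-limited storage: entries are ignored (and dropped) once the TTL passes.
class TtlKeyStore<T> {
  private readonly entries = new Map<string, StoreEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAtMs) {
      this.entries.delete(key); // expired: treat the key as never seen
      return undefined;
    }
    return entry.response;
  }

  set(key: string, response: T): void {
    this.entries.set(key, { response, expiresAtMs: this.now() + this.ttlMs });
  }
}

interface ApiRequest {
  method: string;
  idempotencyKey?: string;
  body?: unknown;
}
type Handler<T> = (req: ApiRequest) => T;

// Shared wrapper: written once, applied to every state-changing handler.
function withIdempotency<T>(handler: Handler<T>, store: TtlKeyStore<T>): Handler<T> {
  return (req) => {
    // Selective application: read-only requests bypass the check.
    if (req.method === "GET") return handler(req);
    if (req.idempotencyKey === undefined) {
      throw new Error("Idempotency key required for state-changing requests");
    }
    const saved = store.get(req.idempotencyKey);
    if (saved !== undefined) return saved; // replay the saved response
    const response = handler(req);
    store.set(req.idempotencyKey, response);
    return response;
  };
}

// Usage: keep keys for 24 hours, matching the window suggested above.
const orderStore = new TtlKeyStore<string>(24 * 60 * 60 * 1000);
const createOrder = withIdempotency<string>(() => "order accepted", orderStore);
```

The clock is injected through the `now` parameter so that expiry can be exercised in tests without waiting for real time to pass.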