44. Idempotency for Critical Transactions
Status: Accepted
Date: 2025-07-06
Context
In a distributed system like Mercury, network requests can fail or time out. A client (e.g., the core backend logic) might send a request to a service (e.g., an order execution service or an exchange API) and not receive a response. The standard recovery mechanism is to retry the request. However, if the original request was actually processed, a simple retry could lead to dangerous side effects, such as creating duplicate orders, processing the same deposit twice, or incorrectly updating a position. This would lead to incorrect state and potential financial loss.
Decision
All critical, state-changing API endpoints and internal service commands within the Mercury ecosystem must be idempotent. To achieve this, the client must generate a unique idempotency key (e.g., a UUID) for each distinct transaction, send it with the request, and reuse the same key on every retry of that transaction.
The server-side implementation must:
- Store each idempotency key it has successfully processed, together with the response it returned, for a bounded time window (e.g., 24 to 48 hours).
- Before executing a new request, check whether the provided idempotency key has already been processed.
- If the key has been seen, the server must not re-process the request. Instead, it must return the saved response from the original successful request.
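The three server-side steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not Mercury's implementation: `IdempotencyStore`, `handleOnce`, and the 24-hour default are hypothetical names and values chosen for the example, and a production store would more likely be Redis or a database table.

```typescript
// Hypothetical record shape: the saved response plus its expiry time.
interface StoredResponse {
  body: unknown;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory store; stands in for Redis or a database table.
class IdempotencyStore {
  private readonly records = new Map<string, StoredResponse>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 24 * 60 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  // Returns the saved record, or undefined if the key is unseen or expired.
  get(key: string, now: number = Date.now()): StoredResponse | undefined {
    const record = this.records.get(key);
    if (record !== undefined && record.expiresAt <= now) {
      this.records.delete(key);
      return undefined;
    }
    return record;
  }

  save(key: string, body: unknown, now: number = Date.now()): void {
    this.records.set(key, { body, expiresAt: now + this.ttlMs });
  }
}

// Executes `execute` at most once per key within the time window.
function handleOnce<T>(
  store: IdempotencyStore,
  key: string,
  execute: () => T,
  now: number = Date.now(),
): T {
  const saved = store.get(key, now);
  if (saved !== undefined) {
    return saved.body as T; // key already processed: replay the saved response
  }
  const result = execute(); // if this throws, nothing is saved and a retry re-executes
  store.save(key, result, now);
  return result;
}
```

The sketch does not handle two requests with the same key arriving while the first is still executing; a real store would need an atomic check-and-set for that case.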
Consequences
Positive:
- Safety and Correctness: Prevents duplicate transactions and ensures that operations can be safely retried, making the system more resilient to network failures.
- Simplified Client Logic: Client-side error handling is simplified. On a timeout or network error, the client can safely retry the request with the same idempotency key, knowing it won't cause a duplicate operation.
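On the client side, the retry behaviour described above might look like the following sketch. The `Send` type and `sendWithRetry` function are hypothetical and stand in for whatever transport Mercury uses. For brevity the example retries on any error and without backoff, whereas a real client would likely retry only on timeouts and network errors.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical transport: sends a payload with an idempotency key and may
// reject on a timeout or network error.
type Send<Req, Res> = (payload: Req, idempotencyKey: string) => Promise<Res>;

async function sendWithRetry<Req, Res>(
  send: Send<Req, Res>,
  payload: Req,
  maxAttempts: number = 3,
): Promise<Res> {
  // One key per distinct transaction, generated once and reused on every retry.
  const idempotencyKey = randomUUID();
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send(payload, idempotencyKey);
    } catch (error) {
      lastError = error; // remember the failure and try again with the same key
    }
  }
  throw lastError;
}
```

The key is created outside the loop, so the server sees every attempt as the same transaction.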
Negative:
- Increased Server-Side Complexity: The server must implement logic to track and manage idempotency keys, which adds complexity to every critical, state-changing endpoint.
- Storage Overhead: Storing idempotency keys requires additional storage (e.g., in Redis or a database table).
- Performance Overhead: Each request incurs the latency of an idempotency key lookup before execution, plus a write to record the key and response when the request is processed for the first time.
Mitigation:
- Shared Middleware: The idempotency-checking logic can be implemented as a reusable middleware or decorator in the NestJS framework. This reduces code duplication and ensures a consistent implementation across all relevant endpoints.
- Time-Limited Storage: Idempotency keys only need to be stored for a finite period (e.g., 24-48 hours), after which it is safe to assume the original client is no longer retrying the request. This can be managed with a TTL (Time To Live) mechanism in a cache like Redis.
- Selective Application: This requirement will be strictly enforced for critical, state-changing operations. It is not necessary for read-only (GET) requests or other non-critical operations.
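The three mitigations can be combined in one reusable wrapper, sketched below in a framework-agnostic form. This is an assumption-laden illustration, not NestJS code: `Req`, `Res`, `Handler`, and `withIdempotency` are hypothetical, the `idempotency-key` header name is an assumed convention, and the in-memory `Map` stands in for a cache with native TTL support such as Redis.

```typescript
// Hypothetical request/response shapes; these are not NestJS types.
interface Req {
  method: string;
  headers: Record<string, string | undefined>;
  body?: unknown;
}
interface Res {
  status: number;
  body: unknown;
}
type Handler = (req: Req) => Res;

// Read-only methods are exempt (selective application).
const SAFE_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

function withIdempotency(
  handler: Handler,
  ttlMs: number = 24 * 60 * 60 * 1000,
  clock: () => number = Date.now,
): Handler {
  const seen = new Map<string, { res: Res; expiresAt: number }>();
  return (req: Req): Res => {
    if (SAFE_METHODS.has(req.method.toUpperCase())) {
      return handler(req);
    }
    const key = req.headers["idempotency-key"]; // assumed header name
    if (key === undefined || key === "") {
      return { status: 400, body: { error: "idempotency key is required" } };
    }
    const now = clock();
    const hit = seen.get(key);
    if (hit !== undefined) {
      if (hit.expiresAt > now) {
        return hit.res; // replay the saved response
      }
      seen.delete(key); // time-limited storage: drop the expired entry
    }
    const res = handler(req);
    if (res.status < 400) {
      // Only successful responses are recorded, so failed requests can be retried.
      seen.set(key, { res, expiresAt: now + ttlMs });
    }
    return res;
  };
}
```

Wrapping a handler once, for example `const createOrder = withIdempotency(rawCreateOrder)`, keeps the check out of the endpoint's own logic.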