
79. Lock-Based Concurrency Control

Status: Accepted Date: 2025-07-06

Context

The michi system operates by executing a sequence of commands: git pull, modify files, git commit, git push. This sequence is not atomic. If two processes (e.g., two AI agents, or an agent and a human) run a michi command at the same time, they can interfere with each other, leading to a race condition. For example, both could pull the same initial state and commit on top of it; once the first push lands, the remote branch has moved ahead, so the second push is rejected as a non-fast-forward update. We need a simple mechanism to ensure that only one process modifies the task files at any given time.

Decision

We will implement a simple file-based locking mechanism to ensure mutual exclusion for all michi operations.

Before starting its sequence of Git operations, a michi script must first acquire a lock. The lock will be implemented as a simple lockfile (e.g., .michi.lock) in the root of the repository.

  1. Acquire Lock: A script will attempt to create the .michi.lock file atomically. If the file already exists, another process holds the lock, and the script will either wait and retry or exit.
  2. Write PID and Timestamp: Upon acquiring the lock, the script will write its Process ID (PID) and a timestamp into the lockfile. This helps in identifying stale locks.
  3. Time-To-Live (TTL): The lock is considered "stale" if it has existed for longer than a predefined TTL (e.g., 5 minutes). A new process is allowed to forcibly take over a stale lock, which prevents a crashed agent from holding the lock indefinitely.
  4. Release Lock: After the git push command completes successfully, the script will delete the .michi.lock file, releasing the lock for other processes.
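
Steps 1 to 4 can be sketched in Python as follows. This assumes the michi scripts can call into Python; the ADR does not name an implementation language. The function names (`try_acquire`, `acquire`, `release`), the JSON lockfile format, and the default 30-second wait are illustrative assumptions. Atomic creation relies on `os.open` with `O_CREAT | O_EXCL`, which fails if the file already exists, so only one concurrent creator succeeds. The stale-lock takeover is best effort: two processes that see the same stale lock can still race between the remove and the re-create, a window that the single-developer assumption makes unlikely.

```python
import json
import os
import time

LOCK_PATH = ".michi.lock"  # lockfile name from this ADR, relative to the repo root
TTL_SECONDS = 300          # example TTL of 5 minutes


def _lock_age(path):
    """Seconds since the lock was taken; raises FileNotFoundError if it is gone."""
    try:
        with open(path) as f:
            return time.time() - float(json.load(f)["timestamp"])
    except (ValueError, KeyError, TypeError):
        # Lock exists but is empty or half-written: fall back to the file's mtime.
        return time.time() - os.stat(path).st_mtime


def try_acquire(path=LOCK_PATH, ttl=TTL_SECONDS):
    """Make up to three creation attempts; return True if this process now holds the lock."""
    for _ in range(3):
        try:
            # O_CREAT | O_EXCL: creation fails if the file exists, so only one creator wins.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                if _lock_age(path) <= ttl:
                    return False  # lock is held and still fresh
                os.remove(path)  # stale lock: delete it, then retry the exclusive create
            except FileNotFoundError:
                pass  # the holder released the lock meanwhile; retry
            continue
        # Step 2: record PID and timestamp so other processes can judge staleness.
        with os.fdopen(fd, "w") as f:
            json.dump({"pid": os.getpid(), "timestamp": time.time()}, f)
        return True
    return False


def acquire(path=LOCK_PATH, ttl=TTL_SECONDS, timeout=30.0, poll=0.2):
    """Wait for the lock, polling until `timeout` seconds pass; False means give up."""
    deadline = time.monotonic() + timeout
    while not try_acquire(path, ttl):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def release(path=LOCK_PATH):
    """Delete the lockfile, but only if it still records this process's PID."""
    try:
        with open(path) as f:
            owner = json.load(f).get("pid")
    except (FileNotFoundError, ValueError, AttributeError):
        return
    if owner == os.getpid():
        os.remove(path)
```

A caller would run `acquire()`, perform the pull, modify, commit, and push sequence, and then call `release()`. The PID check in `release` keeps a process whose lock was taken over as stale from deleting the new holder's lock. Calling `release()` from a `finally` clause would also free the lock when a step fails instead of leaving it for the TTL to expire; step 4 only describes the success path, so that choice is left to the implementation.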

Consequences

Positive:

  • Prevents Race Conditions: Effectively prevents multiple processes from interfering with each other, ensuring that the Git-based operations remain clean and conflicts are minimized.
  • Simple to Implement: A file-based lock is extremely simple to implement using common command-line tools (mkdir, or flock where available) and requires no external services.
  • Stale Lock Cleanup: The TTL mechanism provides a robust way to automatically clean up stale locks left behind by crashed or hung processes.
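
For comparison, the flock option can be sketched with Python's `fcntl.flock`, which wraps the same flock(2) system call used by the util-linux `flock` command and is available only on Unix. The function names and the reuse of `.michi.lock` as the lock target are assumptions for illustration. `LOCK_NB` makes the call fail immediately with `BlockingIOError` instead of blocking when another holder exists.

```python
import fcntl


def try_flock(path=".michi.lock"):
    """Return an open file holding an exclusive flock, or None if the lock is taken."""
    f = open(path, "a")  # append mode: create if missing, never truncate
    try:
        # Exclusive, non-blocking: raises BlockingIOError if another holder exists.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def release_flock(f):
    """Drop the lock and close the file; the lockfile itself may stay on disk."""
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)
    f.close()
```

Because the kernel drops a flock lock when its holder's file descriptors are closed, including on process exit, a crashed process cannot leave a stale lock behind. A hung process, however, keeps the lock indefinitely, which is the case the TTL design in this ADR still covers.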

Negative:

  • Reduces Concurrency: The lock is global for the entire michi system. While one process is working, all others must wait. This effectively serializes all task operations.
  • Unreliable on Network Filesystems: Simple file-based locks may not be reliable on some network filesystems (such as NFS) unless implemented carefully.

Mitigation:

  • Acceptable for Single-Agent Workflow: The primary workflow is a single developer and their AI agent. True concurrency is rare, and serializing operations is an acceptable trade-off for correctness and simplicity. The operations are also very fast, so the lock is not held for long.
  • Targeted for Local Filesystems: The system is designed to run on a local developer machine where the filesystem is local and file locking is reliable.
  • Robust Implementation: Use an atomic operation to create the lock (e.g., mkdir, which is atomic on POSIX systems, or an exclusive open with O_CREAT | O_EXCL) so that the locking logic itself is free of race conditions.
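
The mkdir approach can be sketched as follows, assuming the lock is a directory named `.michi.lock` (the ADR describes a lockfile, so reusing that name for a directory is an assumption) with a hypothetical `owner` file inside that records the PID and timestamp. `os.mkdir` raises `FileExistsError` when the directory already exists, so exactly one concurrent caller succeeds.

```python
import os
import time

LOCK_DIR = ".michi.lock"  # hypothetical: the ADR's lock name, used here as a directory


def try_mkdir_lock(path=LOCK_DIR):
    """Atomically create the lock directory; True means this process owns the lock."""
    try:
        os.mkdir(path)  # raises FileExistsError if the directory already exists
    except FileExistsError:
        return False
    # Record the owner inside the directory for stale-lock diagnostics.
    with open(os.path.join(path, "owner"), "w") as f:
        f.write(f"{os.getpid()} {time.time()}\n")
    return True


def release_mkdir_lock(path=LOCK_DIR):
    """Remove the owner file, then the now-empty lock directory."""
    try:
        os.remove(os.path.join(path, "owner"))
    except FileNotFoundError:
        pass
    os.rmdir(path)
```

A stale-lock check would read the timestamp from the `owner` file and compare it against the TTL, as in the lockfile variant.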