Mercury Fine-Tuning Framework: From Chaos to Systematic Optimization
After months of architectural development, Mercury reached a critical milestone: all core components were implemented and operational. We transitioned from building the foundation to the fine-tuning phase—optimizing our AI-powered trading system for maximum profitability. However, this phase revealed significant challenges in our experimental framework that ultimately hindered progress rather than accelerating it.
The Context: What Went Wrong
1. Experimental Framework Chaos
Over the past 1-2 months, we conducted numerous experiments to optimize Mercury's performance. The fundamental problem? No systematic tracking of experimental results. Despite all those experiments, answering the simple question "What were the results?" became impossible. The data exists somewhere in our repository history, but reconstructing what we've already tried is painfully difficult and time-consuming.
This lack of experimental rigor meant we were essentially flying blind—potentially repeating failed experiments or missing crucial insights from successful ones.
2. Multi-Variant Monitoring Complexity
Our initial solution to A/B testing was deploying multiple Mercury variants (A, B, H, R) running different configurations simultaneously. While this provided valuable data, it created a new problem: cognitive overload.
Previously, I could analyze performance by monitoring a single Telegram channel with trading artifacts. Now, with four variants running, each generating its own stream of data, comprehensive analysis became exponentially more complex. The monitoring overhead grew from manageable to overwhelming.
3. Dashboard: Beautiful but Broken
To address the multi-variant monitoring challenge, we developed a comprehensive dashboard to visualize performance across all variants. The dashboard was aesthetically pleasing and feature-rich, but it introduced its own set of problems:
- Data inconsistencies between dashboard views and Telegram artifacts
- Internal dashboard inconsistencies where different views showed conflicting information
- New bugs unrelated to core trading logic that created false signals
Ironically, our solution to complexity created more complexity. We built a pretty dashboard that made our decision-making harder instead of easier.
4. The Human Bus Factor Problem
During this critical fine-tuning period, I became unavailable due to legal battles unrelated to development. This exposed a fundamental vulnerability: Mercury's optimization process had a bus factor of one.
When I returned, I was completely out of context. Meanwhile, the urgent need to migrate to our new Nx monorepo setup (recently completed) further demonstrated how fragile our development process had become with a single human bottleneck.
The Solution: Systematic Experimental Framework
Based on these painful lessons, I'm proposing a complete redesign of our fine-tuning framework that addresses each identified problem:
1. Simplified Variant Architecture
Current State: 4 complex variants (A, B, H, R) with overlapping functionality
Proposed State: 3 focused variants with clear purposes
- W (Warrior): Production trading variant
  - Live trading + shadow trading
  - Production configuration
  - liveRatio = 1 (all shadow positions become live trades)
- R (Researcher): Production validation variant
  - Shadow trading only
  - Production configuration
  - liveRatio = 0 (no live trades, pure validation)
- ABH (Experimental): Unified experimental variant
  - Shadow trading only
  - Experimental configurations
  - liveRatio = 0 (safe experimentation)
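To make the split concrete, here is a minimal sketch of how the three variant configurations might be expressed in code. The `VariantConfig` shape and the `shadowTrading`/`configSource` fields are illustrative assumptions, not Mercury's actual interfaces; only the variant names and liveRatio values come from the list above.

```typescript
// Illustrative sketch only: VariantConfig is an assumed shape, not Mercury's real interface.
export interface VariantConfig {
  name: 'W' | 'R' | 'ABH';
  description: string;
  shadowTrading: boolean; // every variant runs shadow trading
  liveRatio: number;      // fraction of shadow positions promoted to live trades
  configSource: 'production' | 'experimental';
}

export const VARIANTS: VariantConfig[] = [
  { name: 'W',   description: 'Production trading',      shadowTrading: true, liveRatio: 1, configSource: 'production' },
  { name: 'R',   description: 'Production validation',   shadowTrading: true, liveRatio: 0, configSource: 'production' },
  { name: 'ABH', description: 'Unified experimentation', shadowTrading: true, liveRatio: 0, configSource: 'experimental' },
];
```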
2. Live Ratio Control System
The liveRatio property enables gradual scaling from paper trading to live trading:
- Production config: Latest optimized settings based on experimental results
- W variant: Full live trading using proven configurations
- R variant: Validates production config without financial risk
- ABH variant: Tests experimental configurations safely
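As a rough illustration of how a liveRatio gate could behave, the sketch below promotes just enough shadow positions to live trades to track the configured ratio. The `ShadowPosition` type and the counter-based promotion strategy are assumptions; the real system may select positions differently.

```typescript
// Sketch of a liveRatio gate; the counter-based promotion strategy is an assumption.
interface ShadowPosition {
  id: string;
  symbol: string;
  size: number;
}

class LiveRatioGate {
  private seen = 0;
  private promoted = 0;

  constructor(private readonly liveRatio: number) {}

  /** Returns true if this shadow position should also be opened as a live trade. */
  shouldGoLive(_position: ShadowPosition): boolean {
    this.seen += 1;
    // Promote only while doing so keeps promoted <= seen * liveRatio.
    if (this.promoted + 1 <= this.seen * this.liveRatio) {
      this.promoted += 1;
      return true;
    }
    return false;
  }
}

// liveRatio = 1 promotes every shadow position (W), liveRatio = 0 promotes none (R, ABH),
// and intermediate values allow gradual scaling from paper trading to live trading.
const gate = new LiveRatioGate(0.25);
console.log(gate.shouldGoLive({ id: 'p1', symbol: 'BTCUSDT', size: 0.1 })); // false (only every 4th position goes live)
```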
3. Unified Experimental Control
Instead of managing three separate experimental instances (A, B, H), the ABH variant will:
- Run as a single instance with better control mechanisms
- Execute the same experimental scope as the previous three variants
- Use extended tournament configurations stored in application code, not environment variables (see the sketch after this list)
- Include scheduling, comparison methods, and other experimental parameters
- Isolate the entire experimental framework to a few configuration files
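Here is a rough sketch of what an extended tournament configuration kept in application code might look like. Every field name (`schedule`, `comparison`, `experiments`, `overrides`) and the example values are hypothetical placeholders, not Mercury's actual parameters.

```typescript
// Hypothetical shape for the ABH tournament configuration; kept in application code
// so the entire experimental framework lives in a few reviewable files.
export interface TournamentConfig {
  schedule: { cron: string; timezone: string };            // when tournaments run
  comparison: { metric: string; baseline: string };        // how configurations are compared
  experiments: Array<{
    id: string;                                            // stable identifier for tracking results
    description: string;
    overrides: Record<string, number | string | boolean>;  // parameters that differ from production
  }>;
}

export const ABH_TOURNAMENTS: TournamentConfig = {
  schedule: { cron: '0 */6 * * *', timezone: 'UTC' },
  comparison: { metric: 'pnl-per-trade', baseline: 'production' },
  experiments: [
    { id: 'exp-001', description: 'Wider stop-loss', overrides: { stopLossPct: 2.5 } },
    { id: 'exp-002', description: 'Faster signal decay', overrides: { signalHalfLifeMin: 30 } },
  ],
};
```

Because such a configuration is ordinary versioned code, every experiment becomes reviewable in a pull request and traceable in git history, which directly addresses the tracking problem described earlier.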
4. Non-Interfering GPU Load Management
All variants will run tournaments without interfering with each other by:
- Ensuring tournaments don't run in parallel (previously achieved through schedule shifts; see the scheduling sketch after this list)
- Implementing explicit combined configuration management
- Safely consolidating A/B/H functionality into the single ABH instance
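One simple way to guarantee tournaments never overlap on the GPU is to push every run through a single queue instead of relying on shifted schedules. The sketch below assumes all variants share one process; if they run as separate services, a cross-process lock would be needed instead. `runTournament` is a placeholder, not an existing Mercury function.

```typescript
// Sketch: serialize tournament runs through one promise chain so they never
// compete for the GPU. Assumes all variants run in a single process.
class TournamentQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(variant: string, run: () => Promise<void>): Promise<void> {
    const next = this.tail.then(async () => {
      console.log(`[tournament] starting ${variant}`);
      await run();
      console.log(`[tournament] finished ${variant}`);
    });
    this.tail = next.catch(() => {}); // keep the chain alive even if a run fails
    return next;
  }
}

// Placeholder for the real tournament runner.
async function runTournament(config: 'production' | 'experimental'): Promise<void> {
  console.log(`running tournament with ${config} configuration`);
}

const queue = new TournamentQueue();
void queue.enqueue('W', () => runTournament('production'));
void queue.enqueue('ABH', () => runTournament('experimental'));
```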
5. Agent-Driven Performance Analysis
Critical shift: Performance analysis moves from my domain to the AI agent's domain.
- Agent responsibility: Conduct thorough performance analysis
- Human responsibility: Review analysis and confirm experimental setup/outcomes
- Benefit: Eliminates human bottleneck in data analysis
6. Pre-MCP Analysis Infrastructure
While we plan to implement the Model Context Protocol (MCP) for comprehensive analysis, we'll start with:
- Extended APIs to provide agents with experimental results
- Wrapper scripts that present data in easily digestible formats for AI analysis (see the sketch after this list)
- Automated reporting that reduces human analysis overhead
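As an interim step before MCP, a wrapper script could pull raw experimental results from an internal endpoint and flatten them into a compact summary an agent can digest in one pass. Everything below is assumed: the `/api/experiments/results` endpoint, the result fields, and the markdown table output are placeholders for whatever Mercury's API actually exposes (Node 18+ is assumed for the global `fetch`).

```typescript
// Hypothetical wrapper: fetch experiment results and emit a markdown summary for
// AI analysis. Endpoint, field names, and Node 18+ global fetch are assumptions.
interface ExperimentResult {
  id: string;
  description: string;
  trades: number;
  winRate: number; // 0..1
  pnl: number;     // combined live + shadow PnL in account currency
}

async function summarizeExperiments(baseUrl: string): Promise<string> {
  const res = await fetch(`${baseUrl}/api/experiments/results`); // assumed endpoint
  const results: ExperimentResult[] = await res.json();

  return [
    '| id | description | trades | win rate | PnL |',
    '| --- | --- | ---: | ---: | ---: |',
    ...results.map(r =>
      `| ${r.id} | ${r.description} | ${r.trades} | ${(r.winRate * 100).toFixed(1)}% | ${r.pnl.toFixed(2)} |`,
    ),
  ].join('\n');
}

// Example: print a report that the agent (or a human reviewer) can read directly.
summarizeExperiments('http://localhost:3000').then(report => console.log(report));
```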
7. Methodology Documentation and Automation
The entire framework will be wrapped in:
- Comprehensive methodology documentation
- Cursor rules for systematic experimentation
- Automated workflows that make optimization a matter of invoking rules rather than manual analysis
For me, this transforms the process from complex manual analysis to simply invoking methodologies and reviewing AI-generated insights.
8. Dashboard as Backup Tool
The dashboard will be repositioned as:
- Secondary analysis tool for specific system components
- Backup option when deeper investigation is needed
- Lower priority until the primary AI-driven flow is established
Expected Outcomes
This redesigned framework addresses our core problems:
- Systematic Experimentation: Clear tracking and analysis of all experiments
- Reduced Cognitive Load: Single experimental variant instead of three
- Reliable Data: Focus on proven data sources with AI validation
- Eliminated Bus Factor: AI-driven analysis removes human bottleneck
- Methodical Optimization: Rules-based approach to systematic improvement
Next Steps
Implementation will proceed incrementally:
- Design extended tournament configuration structure
- Implement unified ABH experimental framework
- Create performance analysis APIs and scripts
- Develop methodology documentation and Cursor rules
- Migrate current variants to new architecture
This framework transforms Mercury's optimization from an ad-hoc experimental process into a systematic, AI-augmented fine-tuning machine. The goal: turning Mercury into a consistently profitable trading system through rigorous, automated optimization.
The trading system that learns and improves itself, with humans providing strategic oversight rather than manual analysis.
