62. Unrestricted Data Capture for Intelligence
Status: Accepted
Date: 2025-07-06
Context
To effectively research and optimize trading strategies, we need a complete dataset of their potential performance. If we record only the trades that are executed after passing our risk management and portfolio constraint filters, our view of the strategy's underlying "alpha" is incomplete and biased by that selection. We have no data on the candidate trades that were filtered out, so we cannot analyze why they were filtered or whether rejecting them was the right call.
Decision
The ABH (Intelligence/Hypothesis) instance will be configured for Unrestricted Data Capture. Its primary purpose is to act as a market intelligence tool, not a trading simulator.
Specifically, the ABH instance will:
- Run the same core strategy logic as the W and R instances.
- Completely disable or bypass all risk management and portfolio constraint filters.
- Record every potential trade signal generated by the strategies (e.g., the "top K" positions from a tournament) to a database for offline analysis.
This instance does not trade or simulate trading; it is a pure data collection engine for generating a "ground truth" dataset of a strategy's raw, unfiltered signals.
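The capture step described above can be sketched as follows. This is a minimal illustration, not the ABH implementation: the `RawSignal` record, the `top_k_signals` and `capture` helpers, the field names, and the in-memory sink are all hypothetical stand-ins, and the real instance would write to the database instead of a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class RawSignal:
    """One unfiltered candidate trade (hypothetical schema)."""
    strategy: str
    symbol: str
    score: float
    rank: int
    captured_at: datetime
    source: str = "ABH"
    label: str = "unrestricted"  # marks the row as hypothetical data


def top_k_signals(
    strategy: str,
    scores: Dict[str, float],
    k: int,
    now: Optional[datetime] = None,
) -> List[RawSignal]:
    """Return the top-K scored symbols as signals, with no risk filtering."""
    now = now or datetime.now(timezone.utc)
    # Highest score first; ties are broken by symbol for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return [
        RawSignal(strategy, symbol, score, rank, now)
        for rank, (symbol, score) in enumerate(ranked, start=1)
    ]


def capture(signals: List[RawSignal], sink: Callable[[RawSignal], None]) -> int:
    """Send every signal to the sink; nothing is dropped or filtered."""
    for signal in signals:
        sink(signal)
    return len(signals)


if __name__ == "__main__":
    store: List[RawSignal] = []  # stand-in for the database writer
    tournament = {"AAA": 0.9, "BBB": 0.4, "CCC": 0.7}
    capture(top_k_signals("momentum", tournament, k=2), store.append)
    for row in store:
        print(row.rank, row.symbol, row.score, row.label)
```

Passing the sink as a callable keeps the sketch independent of any particular database client; the label is set on the record itself so that it travels with every row.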
Consequences
Positive:
- Priceless Analytical Data: Creates a complete dataset of all potential trades, free of the selection bias that filtering introduces. This is invaluable for understanding a strategy's true behavior, fine-tuning its parameters, and improving the risk filters themselves.
- Improved Strategy Backtesting: Allows different risk management models to be replayed against the same raw signal data to compare how each would have performed.
- Clear Separation of Alpha and Risk: Helps to separate the analysis of the core strategy's ability to find good trades (alpha generation) from the analysis of the portfolio's risk management rules.
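As an illustration of replaying risk models over the raw signals, the sketch below applies a set of named filters to each signal and records which filters rejected it. The `Signal` fields, the two example filters, and the thresholds are assumptions made for this example only; they are not the filters used by the W or R instances.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class Signal(NamedTuple):
    """A raw signal read back from storage (hypothetical fields)."""
    symbol: str
    score: float
    notional: float


RiskFilter = Callable[[Signal], bool]  # returns True when the signal passes


def min_score(threshold: float) -> RiskFilter:
    """Example filter: require a minimum strategy score."""
    return lambda s: s.score >= threshold


def max_notional(limit: float) -> RiskFilter:
    """Example filter: cap the position size."""
    return lambda s: s.notional <= limit


def replay(
    signals: Iterable[Signal],
    filters: Dict[str, RiskFilter],
) -> Tuple[List[Signal], List[Tuple[Signal, List[str]]]]:
    """Split signals into accepted ones and rejected ones with reasons."""
    accepted: List[Signal] = []
    rejected: List[Tuple[Signal, List[str]]] = []
    for signal in signals:
        failed = [name for name, check in filters.items() if not check(signal)]
        if failed:
            rejected.append((signal, failed))
        else:
            accepted.append(signal)
    return accepted, rejected


if __name__ == "__main__":
    raw = [
        Signal("AAA", 0.9, 50_000),
        Signal("BBB", 0.2, 10_000),
        Signal("CCC", 0.8, 500_000),
    ]
    model = {"min_score": min_score(0.5), "max_notional": max_notional(100_000)}
    kept, dropped = replay(raw, model)
    print("accepted:", [s.symbol for s in kept])
    for signal, reasons in dropped:
        print("rejected:", signal.symbol, reasons)
```

Because the raw signals are stored unfiltered, the same list can be replayed against any number of filter sets, and the recorded rejection reasons show why a given model would have excluded a trade.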
Negative:
- Increased Data Volume & Storage Costs: Capturing every potential trade will generate significantly more data than just recording executed trades, increasing storage requirements and costs (e.g., in our Cassandra cluster).
- Misinterpretation Risk: The data from the ABH instance represents a hypothetical, "perfect world" scenario. There is a risk that analysts could misinterpret this data and overestimate a strategy's real-world performance if they forget to account for risk and execution constraints.
Mitigation:
- Efficient Time-Series Storage: The data will be stored in our Cassandra cluster, which is designed to handle high-volume time-series data efficiently (adr://cassandra-timeseries). We will also implement data retention and archival policies to manage storage costs over time.
- Clear Data Labeling and Documentation: All data sourced from the ABH instance will be clearly labeled as "unrestricted" or "hypothetical" in our databases and analytics tools. Documentation will repeatedly stress that this data does not represent real-world performance and must be analyzed in conjunction with risk models.
- Purpose-Built Analytics: The analytics pipeline that consumes this data will be purpose-built to apply various risk and cost models to the raw signals, providing a more realistic performance picture during analysis.
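A minimal sketch of the labeling and cost-adjustment mitigations is shown below. It assumes a flat per-trade cost expressed in basis points, which is a deliberate simplification; the `LabeledOutcome` record, the `summarize` helper, and the example figures are hypothetical and do not describe the actual analytics pipeline.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Union


@dataclass(frozen=True)
class LabeledOutcome:
    """Hypothetical outcome of one raw signal (return as a fraction)."""
    symbol: str
    gross_return: float
    label: str = "unrestricted"


def net_of_costs(outcomes: List[LabeledOutcome], cost_bps: float) -> List[float]:
    """Subtract a flat per-trade cost (in basis points) from each return."""
    cost = cost_bps / 10_000
    return [o.gross_return - cost for o in outcomes]


def summarize(
    outcomes: List[LabeledOutcome], cost_bps: float
) -> Dict[str, Union[str, int, float]]:
    """Report gross and cost-adjusted means side by side, keeping the label."""
    if not outcomes:
        raise ValueError("no outcomes to summarize")
    labels = {o.label for o in outcomes}
    if labels != {"unrestricted"}:
        raise ValueError(f"expected only 'unrestricted' rows, got {sorted(labels)}")
    return {
        "label": "unrestricted",
        "n": len(outcomes),
        "gross_mean": mean(o.gross_return for o in outcomes),
        "net_mean": mean(net_of_costs(outcomes, cost_bps)),
    }


if __name__ == "__main__":
    sample = [LabeledOutcome("AAA", 0.02), LabeledOutcome("BBB", -0.01)]
    print(summarize(sample, cost_bps=10))
```

Reporting the gross and net figures together, with the label attached to the summary, makes it harder to quote the unrestricted number on its own.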