Apollo Analysis Implementation Strategy
Overview
Apollo Analysis is a system for multi-step, LLM-driven data processing in which each step can depend on the results of previous steps. Steps are described declaratively, and BullMQ orchestrates their execution.
Key Components
- Declarative description of analysis steps - pure JSON without functions
- Orchestrator - uses BullMQ's FlowProducer to manage dependencies between steps
- Analysis service - executes individual steps and manages their context
- Storage - persists step results and analysis metadata
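For reference, the step shape implied by the examples in this document can be sketched as follows; the field names mirror the Dynamic Loading example below, and the exact types are assumptions:

```typescript
// Sketch of the declarative step definition; exact types are assumptions.
interface DataSourceDefinition {
  serviceName: string;               // DI token of the service to call
  methodName: string;                // method to invoke on that service
  params: Record<string, unknown>;   // may contain {{previousResults...}} placeholders
}

interface AnalysisStepDefinition {
  id: string;                        // unique step identifier
  dependsOn?: string[];              // ids of steps whose results this step consumes
  promptTemplate: string;            // LLM prompt template
  schema: object;                    // JSON Schema the step's LLM output must satisfy
  dataSources?: DataSourceDefinition[]; // optional dynamic data loading (see below)
}
```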
Data Acquisition Strategy
The system supports two complementary approaches to obtaining data for analysis:
1. Preloaded Data (Current Priority)
```typescript
// Initialize analysis with preloaded data
const analysisId = await analysisService.initializeAnalysis(
  portfolioAnalysisConfig,
  {
    userId: 123,
    wallet: walletData,
    marketData: marketData,
    // ... all necessary data
  },
);
```
Advantages:
- Easier to implement and test
- Data is known in advance
- No DI dependencies during analysis
- More predictable behavior
Disadvantages:
- Cannot obtain data dependent on intermediate results
- Potential for loading excessive data
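A minimal sketch of what initializeAnalysis might do in this mode. The storage and orchestrator interfaces are assumptions introduced for illustration, and the deps parameter stands in for whatever DI wiring the real service uses:

```typescript
import { randomUUID } from 'node:crypto';

// Assumed collaborator interfaces, for illustration only.
interface AnalysisStorage {
  saveAnalysisContext(analysisId: string, context: Record<string, unknown>): Promise<void>;
}
interface Orchestrator {
  enqueueFlow(analysisId: string, steps: AnalysisStepDefinition[]): Promise<void>;
}

async function initializeAnalysis(
  config: { steps: AnalysisStepDefinition[] },
  context: Record<string, unknown>,
  deps: { storage: AnalysisStorage; orchestrator: Orchestrator },
): Promise<string> {
  const analysisId = randomUUID();
  // Persist the context up front so every step worker reads the same snapshot.
  await deps.storage.saveAnalysisContext(analysisId, context);
  await deps.orchestrator.enqueueFlow(analysisId, config.steps);
  return analysisId;
}
```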
2. Dynamic Loading via DataSources (Future Development)
```typescript
const marketAnalysisStep: AnalysisStepDefinition = {
  id: 'market-analysis',
  dependsOn: ['wallet-analysis'],
  promptTemplate: `...`,
  schema: marketAnalysisJsonSchema,
  dataSources: [
    {
      serviceName: 'marketDataService',
      methodName: 'getTechnicalData',
      params: {
        timeframe: '1d',
        symbols:
          '{{previousResults.wallet-analysis.extractedData.assetDistribution}}',
      },
    },
  ],
};
```
Advantages:
- Data can be fetched dynamically, based on the results of previous steps
- More flexible and modular approach
- Steps can be more atomic and reusable
Disadvantages:
- Requires DI access at runtime
- More difficult to test
- Harder to debug
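Two small sketches of how this mode could work, reusing the interfaces sketched under Key Components. First, substituting {{previousResults...}} placeholders, assuming whole-string tokens and results keyed by step id:

```typescript
// Minimal placeholder substitution; token format follows the example above.
type StepResults = Record<string, unknown>;

const TOKEN = /^\{\{previousResults\.([\w-]+)\.(.+)\}\}$/;

function resolveTemplates(value: unknown, previousResults: StepResults): unknown {
  if (typeof value === 'string') {
    const match = TOKEN.exec(value);
    if (!match) return value;
    const [, stepId, path] = match;
    // Walk the dot-separated path into the referenced step's result.
    return path.split('.').reduce<any>((acc, key) => acc?.[key], previousResults[stepId]);
  }
  if (Array.isArray(value)) {
    return value.map((item) => resolveTemplates(item, previousResults));
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [
        k,
        resolveTemplates(v, previousResults),
      ]),
    );
  }
  return value;
}
```

Second, a hypothetical runtime lookup of the service named in a dataSource; a plain object stands in for the real DI container:

```typescript
// Hypothetical runtime resolution of a dataSource through a service registry.
type ServiceRegistry = Record<string, Record<string, (params: unknown) => Promise<unknown>>>;

async function executeDataSource(
  dataSource: DataSourceDefinition,
  registry: ServiceRegistry,
  previousResults: StepResults,
): Promise<unknown> {
  const method = registry[dataSource.serviceName]?.[dataSource.methodName];
  if (!method) {
    throw new Error(`Unknown data source: ${dataSource.serviceName}.${dataSource.methodName}`);
  }
  // Substitute {{previousResults...}} placeholders before calling the service.
  const params = resolveTemplates(dataSource.params, previousResults);
  return method(params);
}
```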
Implementation Plan
- Phase 1: Preloaded Data
  - Implement a basic version with all data preloaded up front
  - Test step orchestration and inter-step dependencies (see the FlowProducer sketch after this list)
  - Verify result storage and artifact creation
- Phase 2: Hybrid Approach
  - Keep preloading for simple cases
  - Add dataSources support for steps that need dynamic data
  - Implement the mechanism for substituting data from previous step results (see the sketches in the previous section)
- Phase 3: Full Agent-like Capabilities
  - Expand dataSources to cover more complex scenarios
  - Add result caching for optimization (a possible cache-key scheme is sketched after this list)
  - Implement rerunning of steps when their input data changes
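To make Phase 1 concrete, here is a minimal sketch of mapping dependsOn onto a BullMQ flow. It assumes a single 'analysis-steps' queue, a local Redis connection, and a tree-shaped dependency graph (a step shared by two dependents would be enqueued twice and need deduplication); none of these choices are settled:

```typescript
import { FlowProducer, FlowJob } from 'bullmq';

// In BullMQ a parent job runs only after all of its children complete,
// so each step's dependsOn entries become child jobs of that step.
function buildFlowJob(
  stepId: string,
  stepsById: Map<string, AnalysisStepDefinition>,
  analysisId: string,
): FlowJob {
  const step = stepsById.get(stepId);
  if (!step) throw new Error(`Unknown step: ${stepId}`);
  return {
    name: step.id,
    queueName: 'analysis-steps',
    data: { analysisId, stepId: step.id },
    children: (step.dependsOn ?? []).map((dep) => buildFlowJob(dep, stepsById, analysisId)),
  };
}

// Enqueue from the final step; BullMQ executes the dependency subtree first.
async function enqueueAnalysisFlow(
  finalStepId: string,
  stepsById: Map<string, AnalysisStepDefinition>,
  analysisId: string,
): Promise<void> {
  const producer = new FlowProducer({ connection: { host: 'localhost', port: 6379 } });
  await producer.add(buildFlowJob(finalStepId, stepsById, analysisId));
}
```

For the Phase 3 caching idea, one possible scheme (an assumption, not a decided design) keys a stored result by step id plus a hash of the step's resolved inputs, so a step reruns only when its inputs change:

```typescript
import { createHash } from 'node:crypto';

// Hypothetical cache key: stable only as long as JSON.stringify produces the
// same string for equal inputs (a canonical serializer would be safer).
function stepCacheKey(stepId: string, resolvedInputs: unknown): string {
  const digest = createHash('sha256').update(JSON.stringify(resolvedInputs)).digest('hex');
  return `analysis:step:${stepId}:${digest}`;
}
```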
Implementation Recommendations
- Keep the dataSources types in the step interface now, for forward compatibility
- Start with simple steps without complex dependencies
- Gradually add more complex cases
- Thoroughly test the orchestration of the execution flow
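As a concrete starting point for the testing recommendation above, a framework-free check of the substitution sketch from the Dynamic Loading section:

```typescript
import { strict as assert } from 'node:assert';

// Verifies that a nested {{previousResults...}} path resolves to the value
// produced by the referenced step.
const resolved = resolveTemplates(
  { symbols: '{{previousResults.wallet-analysis.extractedData.assetDistribution}}' },
  { 'wallet-analysis': { extractedData: { assetDistribution: ['BTC', 'ETH'] } } },
);
assert.deepEqual(resolved, { symbols: ['BTC', 'ETH'] });
```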
Limitations and Potential Issues
- Cyclic dependencies are not supported (a fail-fast validation pass is sketched after this list)
- Long-running operations should be designed with BullMQ timeouts in mind
- Steps must be idempotent, since they may be re-executed
- Step results must be serializable for storage
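Since cyclic dependencies are unsupported, a validation pass at initialization can fail fast instead of letting a flow hang. A minimal DFS-based check over the dependsOn edges, using the step shape sketched earlier:

```typescript
// Throws if any step's dependsOn chain loops back on itself.
function assertAcyclic(steps: AnalysisStepDefinition[]): void {
  const byId = new Map(steps.map((s) => [s.id, s]));
  const visiting = new Set<string>(); // steps on the current DFS path
  const done = new Set<string>();     // steps already fully explored

  const visit = (id: string): void => {
    if (done.has(id)) return;
    if (visiting.has(id)) throw new Error(`Cyclic dependency involving step: ${id}`);
    visiting.add(id);
    for (const dep of byId.get(id)?.dependsOn ?? []) visit(dep);
    visiting.delete(id);
    done.add(id);
  };

  for (const step of steps) visit(step.id);
}
```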
Conclusion
A hybrid approach, moving gradually from preloaded to dynamically loaded data, balances quick implementation against future flexibility. Starting from a simple model, the system can be improved iteratively, adding more complex capabilities as they are needed.