Apollo Analysis Implementation Strategy

Overview

Apollo Analysis is a system for multi-step data processing using LLMs, where each step can depend on the results of previous steps. The system uses a declarative description of steps and BullMQ for orchestration.

Key Components

  1. Declarative description of analysis steps - pure JSON without functions (see the interface sketch after this list)
  2. Orchestrator - uses FlowProducer to manage dependencies between steps
  3. Analysis service - executes individual steps and manages context
  4. Storage - saves step results and analysis metadata
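
A minimal sketch of what such a step definition might look like in TypeScript, inferred from the dataSources example later in this document; the DataSourceDefinition name and any fields beyond id, dependsOn, promptTemplate, schema, and dataSources are assumptions, not the actual interface:

// Hypothetical shape of a step definition. Field names beyond those used
// in the examples below are assumptions.
interface DataSourceDefinition {
  serviceName: string; // DI token of the service to call
  methodName: string; // method to invoke on that service
  params: Record<string, unknown>; // values may contain {{...}} placeholders
}

interface AnalysisStepDefinition {
  id: string; // unique step identifier
  dependsOn?: string[]; // ids of steps that must complete first
  promptTemplate: string; // LLM prompt, may reference previous results
  schema: object; // JSON Schema for validating the LLM output
  dataSources?: DataSourceDefinition[]; // optional dynamic data (Phase 2+)
}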

Data Acquisition Strategy

The system supports two complementary approaches to obtaining data for analysis:

1. Preloaded Data (Current Priority)

// Initialize analysis with preloaded data
const analysisId = await analysisService.initializeAnalysis(
  portfolioAnalysisConfig,
  {
    userId: 123,
    wallet: walletData,
    marketData: marketData,
    // ... all necessary data
  },
);

Advantages:

  • Easier to implement and test
  • Data is known in advance
  • No DI dependencies during analysis
  • More predictable behavior

Disadvantages:

  • Cannot obtain data dependent on intermediate results
  • Potential for loading excessive data

2. Dynamic Loading via DataSources (Future Development)

const marketAnalysisStep: AnalysisStepDefinition = {
  id: 'market-analysis',
  dependsOn: ['wallet-analysis'], // runs only after the wallet step finishes
  promptTemplate: `...`,
  schema: marketAnalysisJsonSchema,
  dataSources: [
    {
      serviceName: 'marketDataService',
      methodName: 'getTechnicalData',
      params: {
        timeframe: '1d',
        // placeholder resolved from the previous step's result
        symbols:
          '{{previousResults.wallet-analysis.extractedData.assetDistribution}}',
      },
    },
  ],
};
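
The {{previousResults...}} syntax implies that the analysis service resolves placeholders against accumulated step results before calling the data source. A minimal sketch of such a resolver, assuming dot-separated paths whose first segment is a step id (the function name and path grammar are assumptions):

// Hypothetical placeholder resolver: walks a dot-separated path into the
// accumulated results of previous steps. Name and grammar are assumptions.
function resolvePlaceholders(
  value: string,
  previousResults: Record<string, unknown>,
): unknown {
  const match = /^\{\{previousResults\.(.+)\}\}$/.exec(value);
  if (!match) return value; // plain literal, pass through unchanged

  // e.g. "wallet-analysis.extractedData.assetDistribution"
  return match[1].split('.').reduce<unknown>(
    (acc, key) =>
      acc != null ? (acc as Record<string, unknown>)[key] : undefined,
    previousResults,
  );
}

Before invoking serviceName.methodName, the analysis service would apply this resolver to every value in params.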

Advantages:

  • Data can be fetched dynamically based on the results of previous steps
  • More flexible and modular approach
  • Steps can be more atomic and reusable

Disadvantages:

  • Requires DI access at runtime
  • More difficult to test
  • Harder to debug

Implementation Plan

  1. Phase 1: Preloaded Data

    • Implement a basic version with all data preloaded up front
    • Test step orchestration and inter-step dependencies (a FlowProducer sketch follows this list)
    • Verify result storage and artifact creation
  2. Phase 2: Hybrid Approach

    • Keep preloading for simple cases
    • Add dataSources support for steps requiring dynamic data
    • Implement the placeholder substitution mechanism for previous step results (sketched under the dataSources example above)
  3. Phase 3: Full Agent-like Capabilities

    • Expand dataSources capabilities for more complex scenarios
    • Add result caching for optimization
    • Implement a mechanism for rerunning steps when their input data changes
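
For Phase 1, the dependsOn declarations have to be translated into a BullMQ flow. A sketch of that mapping, using BullMQ's FlowProducer API but with an assumed queue name, connection, and helper name; in BullMQ flows, children complete before their parent starts, so dependencies become children:

import { FlowProducer, FlowJob } from 'bullmq';

// Hypothetical helper: turns dependsOn declarations into a BullMQ flow
// tree. Note that a flow is a tree, so a step shared by several dependents
// would be enqueued once per dependent unless deduplicated; handling that
// is out of scope for this sketch.
function buildFlowNode(
  stepId: string,
  stepsById: Map<string, AnalysisStepDefinition>,
  analysisId: string,
): FlowJob {
  const step = stepsById.get(stepId);
  if (!step) throw new Error(`Unknown step: ${stepId}`);
  return {
    name: step.id,
    queueName: 'analysis-steps', // assumed queue name
    data: { analysisId, stepId: step.id },
    children: (step.dependsOn ?? []).map((dep) =>
      buildFlowNode(dep, stepsById, analysisId),
    ),
  };
}

const flowProducer = new FlowProducer({ connection }); // connection assumed
await flowProducer.add(buildFlowNode('market-analysis', stepsById, analysisId));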

Implementation Recommendations

  • Keep the dataSources types in the interface for forward compatibility
  • Start with simple steps without complex dependencies
  • Gradually add more complex cases
  • Thoroughly test the orchestration of the execution flow

Limitations and Potential Issues

  • Cyclic dependencies are not supported (a validation sketch follows this list)
  • Long-running operations must be handled with BullMQ timeouts in mind
  • Steps must be idempotent, since they may be re-executed on retry
  • Step results must be serializable for storage
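
Since cycles are unsupported, it is worth rejecting them when an analysis is initialized rather than at runtime. A minimal validation sketch (the function name validateNoCycles is an assumption):

// Hypothetical guard: throws if the dependsOn graph contains a cycle.
// Three-state DFS: absent = unvisited, 'visiting' = on the current path,
// 'done' = fully explored.
function validateNoCycles(steps: AnalysisStepDefinition[]): void {
  const byId = new Map(steps.map((s) => [s.id, s] as const));
  const state = new Map<string, 'visiting' | 'done'>();

  const visit = (id: string): void => {
    if (state.get(id) === 'done') return;
    if (state.get(id) === 'visiting') {
      throw new Error(`Cyclic dependency detected at step "${id}"`);
    }
    state.set(id, 'visiting');
    for (const dep of byId.get(id)?.dependsOn ?? []) visit(dep);
    state.set(id, 'done');
  };

  for (const step of steps) visit(step.id);
}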

Conclusion

A hybrid approach with a gradual transition from preloaded to dynamically loaded data will provide a balance between quick implementation and system flexibility in the future. Starting with a simple model, we can iteratively improve the system, adding more complex capabilities as needed.