Apollo Analysis Implementation Strategy

Overview

Apollo Analysis is a system for multi-step data processing using LLMs, where each step can depend on the results of previous steps. The system uses a declarative description of steps and BullMQ for orchestration.

Key Components

  1. Declarative description of analysis steps - pure JSON without functions (see the interface sketch after this list)
  2. Orchestrator - uses FlowProducer to manage dependencies between steps
  3. Analysis service - executes individual steps and manages context
  4. Storage - saves step results and analysis metadata
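
A minimal sketch of what such a step definition might look like in TypeScript, inferred from the dataSources example later in this document; the DataSourceDefinition name and any fields beyond id, dependsOn, promptTemplate, schema, and dataSources are assumptions, not the actual interface:

// Hypothetical shape of a step definition. Field names beyond those used
// in the examples below are assumptions.
interface DataSourceDefinition {
  serviceName: string; // DI token of the service to call
  methodName: string; // method to invoke on that service
  params: Record<string, unknown>; // values may contain {{...}} placeholders
}

interface AnalysisStepDefinition {
  id: string; // unique step identifier
  dependsOn?: string[]; // ids of steps that must complete first
  promptTemplate: string; // LLM prompt, may reference previous results
  schema: object; // JSON Schema for validating the LLM output
  dataSources?: DataSourceDefinition[]; // optional dynamic data (Phase 2+)
}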

Data Acquisition Strategy

The system supports two complementary approaches to obtaining data for analysis:

1. Preloaded Data (Current Priority)

// Initialize analysis with preloaded data
const analysisId = await analysisService.initializeAnalysis(
  portfolioAnalysisConfig,
  {
    userId: 123,
    wallet: walletData,
    marketData: marketData,
    // ... all necessary data
  },
);

Advantages:

  • Easier to implement and test
  • Data is known in advance
  • No DI dependencies during analysis
  • More predictable behavior

Disadvantages:

  • Cannot obtain data dependent on intermediate results
  • Potential for loading excessive data

2. Dynamic Loading via DataSources (Future Development)

const marketAnalysisStep: AnalysisStepDefinition = {
  id: 'market-analysis',
  dependsOn: ['wallet-analysis'], // runs only after the wallet step finishes
  promptTemplate: `...`,
  schema: marketAnalysisJsonSchema,
  dataSources: [
    {
      serviceName: 'marketDataService',
      methodName: 'getTechnicalData',
      params: {
        timeframe: '1d',
        // placeholder resolved from the previous step's result
        symbols:
          '{{previousResults.wallet-analysis.extractedData.assetDistribution}}',
      },
    },
  ],
};
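
The {{previousResults...}} syntax implies that the analysis service resolves placeholders against accumulated step results before calling the data source. A minimal sketch of such a resolver, assuming dot-separated paths whose first segment is a step id (the function name and path grammar are assumptions):

// Hypothetical placeholder resolver: walks a dot-separated path into the
// accumulated results of previous steps. Name and grammar are assumptions.
function resolvePlaceholders(
  value: string,
  previousResults: Record<string, unknown>,
): unknown {
  const match = /^\{\{previousResults\.(.+)\}\}$/.exec(value);
  if (!match) return value; // plain literal, pass through unchanged

  // e.g. "wallet-analysis.extractedData.assetDistribution"
  return match[1].split('.').reduce<unknown>(
    (acc, key) =>
      acc != null ? (acc as Record<string, unknown>)[key] : undefined,
    previousResults,
  );
}

Before invoking serviceName.methodName, the analysis service would apply this resolver to every value in params.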

Advantages:

  • Data can be fetched dynamically based on the results of previous steps
  • More flexible and modular approach
  • Steps can be more atomic and reusable

Disadvantages:

  • Requires DI access at runtime
  • More difficult to test
  • Harder to debug

Implementation Plan

  1. Phase 1: Preloaded Data

    • Implement a basic version with all data preloaded up front
    • Test step orchestration and inter-step dependencies (a FlowProducer sketch follows this list)
    • Verify result storage and artifact creation
  2. Phase 2: Hybrid Approach

    • Keep preloading for simple cases
    • Add dataSources support for steps requiring dynamic data
    • Implement the placeholder substitution mechanism for previous step results (sketched under the dataSources example above)
  3. Phase 3: Full Agent-like Capabilities

    • Expand dataSources capabilities for more complex scenarios
    • Add result caching for optimization
    • Implement a mechanism for rerunning steps when their input data changes
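
For Phase 1, the dependsOn declarations have to be translated into a BullMQ flow. A sketch of that mapping, using BullMQ's FlowProducer API but with an assumed queue name, connection, and helper name; in BullMQ flows, children complete before their parent starts, so dependencies become children:

import { FlowProducer, FlowJob } from 'bullmq';

// Hypothetical helper: turns dependsOn declarations into a BullMQ flow
// tree. Note that a flow is a tree, so a step shared by several dependents
// would be enqueued once per dependent unless deduplicated; handling that
// is out of scope for this sketch.
function buildFlowNode(
  stepId: string,
  stepsById: Map<string, AnalysisStepDefinition>,
  analysisId: string,
): FlowJob {
  const step = stepsById.get(stepId);
  if (!step) throw new Error(`Unknown step: ${stepId}`);
  return {
    name: step.id,
    queueName: 'analysis-steps', // assumed queue name
    data: { analysisId, stepId: step.id },
    children: (step.dependsOn ?? []).map((dep) =>
      buildFlowNode(dep, stepsById, analysisId),
    ),
  };
}

const flowProducer = new FlowProducer({ connection }); // connection assumed
await flowProducer.add(buildFlowNode('market-analysis', stepsById, analysisId));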

Implementation Recommendations

  • Keep the dataSources types in the interface for forward compatibility
  • Start with simple steps without complex dependencies
  • Gradually add more complex cases
  • Thoroughly test the orchestration of the execution flow

Limitations and Potential Issues

  • Cyclic dependencies are not supported (a validation sketch follows this list)
  • Long-running operations must be handled with BullMQ timeouts in mind
  • Steps must be idempotent, since they may be re-executed on retry
  • Step results must be serializable for storage
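
Since cycles are unsupported, it is worth rejecting them when an analysis is initialized rather than at runtime. A minimal validation sketch (the function name validateNoCycles is an assumption):

// Hypothetical guard: throws if the dependsOn graph contains a cycle.
// Three-state DFS: absent = unvisited, 'visiting' = on the current path,
// 'done' = fully explored.
function validateNoCycles(steps: AnalysisStepDefinition[]): void {
  const byId = new Map(steps.map((s) => [s.id, s] as const));
  const state = new Map<string, 'visiting' | 'done'>();

  const visit = (id: string): void => {
    if (state.get(id) === 'done') return;
    if (state.get(id) === 'visiting') {
      throw new Error(`Cyclic dependency detected at step "${id}"`);
    }
    state.set(id, 'visiting');
    for (const dep of byId.get(id)?.dependsOn ?? []) visit(dep);
    state.set(id, 'done');
  };

  for (const step of steps) visit(step.id);
}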

Conclusion

A hybrid approach with a gradual transition from preloaded to dynamically loaded data will provide a balance between quick implementation and system flexibility in the future. Starting with a simple model, we can iteratively improve the system, adding more complex capabilities as needed.