50. Ollama for Local AI Processing
Status: Accepted
Date: 2025-07-06
Context
The Mercury system requires AI capabilities for tasks like market analysis, sentiment analysis, and strategy augmentation. Relying on external, cloud-based AI services (like OpenAI or Google Gemini) raises concerns about data privacy (sending sensitive trading data to third parties), latency (network round-trips), and cost. We need a solution that provides powerful AI models while keeping all processing local.
Decision
We will use Ollama as the engine for all local AI processing. Ollama allows us to run open-source Large Language Models (LLMs) and other AI models directly on our own infrastructure, inside our private network.
The mercury-ai module will be responsible for integrating with an Ollama server instance. This server will host the specific models we need for our tasks. This decision ensures that no sensitive trading data ever leaves our control.
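For illustration only, here is a minimal sketch of how mercury-ai could call a locally hosted Ollama instance over its HTTP API. The endpoint assumes Ollama's default port (11434) on our private network; the model name, prompt, and helper function are placeholders rather than a confirmed interface of the mercury-ai module.
```python
import requests

OLLAMA_URL = "http://localhost:11434"  # assumed: Ollama server on the private network, default port


def analyze_sentiment(text: str, model: str = "llama3") -> str:
    """Ask a locally hosted model to classify market sentiment (hypothetical helper)."""
    payload = {
        "model": model,  # placeholder model name
        "prompt": (
            "Classify the market sentiment of the following text as "
            f"bullish, bearish, or neutral:\n\n{text}"
        ),
        "stream": False,  # request a single JSON response instead of a token stream
    }
    resp = requests.post(f"{OLLAMA_URL}/api/generate", json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["response"].strip()


if __name__ == "__main__":
    print(analyze_sentiment("The index rallied 3% on strong earnings."))
```
Because the request never leaves our network, the same call pattern works for market analysis and strategy augmentation prompts without exposing trading data to a third party.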
Consequences
Positive:
- Data Privacy & Security: All data is processed locally, eliminating the risk of exposing sensitive financial information to external providers. This is the primary driver for this decision.
- Low Latency: By running models locally, we avoid network latency to external APIs, enabling faster analysis, which is critical in trading.
- Cost Control: We avoid the pay-per-use billing models of cloud AI services. The cost is fixed to the hardware we provision to run the Ollama server.
- Model Flexibility: We have the freedom to choose from a wide range of open-source models and fine-tune them for our specific needs without being locked into a specific vendor's offerings.
Negative:
- Infrastructure Overhead: We are responsible for provisioning, managing, and scaling the hardware (potentially including GPUs) required to run the Ollama server and the AI models.
- Model Management: We are responsible for downloading, managing, and updating the AI models themselves.
- Potentially Lower Performance: The open-source models we can run locally may not match the capability of the largest proprietary models available via cloud APIs.
Mitigation:
- Infrastructure as Code (IaC): The provisioning of the Ollama server will be managed via our Ansible playbooks (adr://ansible-server-provisioning), making the setup repeatable and automated.
- Targeted Model Selection: We will carefully select models that provide the best balance of performance and resource requirements for our specific tasks. We don't need a massive model if a smaller, fine-tuned one is sufficient.
- Continuous Evaluation: We will continuously evaluate the performance of our local models against our requirements and explore new models as they become available in the open-source community.
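As a sketch of what continuous evaluation could look like, the script below times non-streaming generations against the local Ollama server for a set of candidate models. The model names, prompts, and thresholds are assumptions for illustration, not an agreed evaluation suite.
```python
import time

import requests

OLLAMA_URL = "http://localhost:11434"     # assumed default Ollama endpoint
CANDIDATE_MODELS = ["llama3", "mistral"]  # placeholder model names
EVAL_PROMPTS = [
    "Summarize the key risks in this headline: 'Central bank signals rate pause.'",
    "Classify sentiment (bullish/bearish/neutral): 'Earnings missed estimates.'",
]


def time_generation(model: str, prompt: str) -> float:
    """Return wall-clock seconds for one non-streaming generation on the local server."""
    start = time.perf_counter()
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start


if __name__ == "__main__":
    for model in CANDIDATE_MODELS:
        latencies = [time_generation(model, p) for p in EVAL_PROMPTS]
        print(f"{model}: avg latency {sum(latencies) / len(latencies):.2f}s")
```
A script like this, run periodically against representative prompts, gives us concrete latency and quality data points for deciding when to swap in newer open-source models.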