Hecate Server Connectivity Postmortem Analysis

Date: June 11, 2025

Incident: Hecate server (192.168.3.25) connectivity issues and cloudflared timeouts

Root Cause: Improperly crimped RJ45 connector causing intermittent network disconnections

A detailed postmortem analysis of network connectivity issues that revealed the importance of proper physical layer infrastructure and led to recommendations for router upgrades and redundancy planning.

Background

The hecate server experienced connectivity issues that initially appeared to be cloudflared-related timeouts. Investigation revealed the actual cause was a poorly crimped RJ45 connector that would disconnect when cables were moved during cleaning/maintenance.

Key Findings

1. Network Infrastructure Issues

  • Initial Problem: The RJ45 connector was over-crimped, making it difficult to insert into standard ports
  • Symptom: The connector required force to insert, gave no proper click, and disconnected easily when the cable was moved
  • Additional Finding: A USB Ethernet adapter was limiting the link to 100 Mbps instead of 1 Gbps
  • Impact: Intermittent connectivity (from the connector) and reduced network performance (from the adapter)
  • Solution: PCIe x1 Gigabit Ethernet card (RTL8111C) to replace the USB adapter
  • Available PCIe Slot: 00:1d.0 (Root Port #9) confirmed free and ready for installation
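
The 100 Mbps cap can be confirmed from the server itself. On Linux, the kernel exposes the negotiated link speed (in Mbps) in `/sys/class/net/<iface>/speed`. The following is a minimal sketch of such a check, not the procedure used during the incident; the helper names and the 1000 Mbps default threshold are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def read_link_speed_mbps(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the negotiated link speed in Mbps, or None if it cannot be read.

    The kernel reports an error or a non-positive value when the link is
    down, so both cases are mapped to None.
    """
    speed_file = Path(sysfs_root) / iface / "speed"
    try:
        value = int(speed_file.read_text().strip())
    except (OSError, ValueError):
        return None
    return value if value > 0 else None


def is_degraded(speed_mbps: Optional[int], expected_mbps: int = 1000) -> bool:
    """True when the link is down/unknown or slower than expected."""
    return speed_mbps is None or speed_mbps < expected_mbps
```

Calling `is_degraded(read_link_speed_mbps("eth0"))` (with `eth0` replaced by the real interface name) before and after installing the PCIe card would show whether the link now negotiates at gigabit speed.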

2. VPN Protocol Strategy for Russian Networks

Current working protocols in Russia:

  • AmneziaWG: Primary choice, excellent bypass capabilities
  • Xray: Secondary option, good performance
  • Standard WireGuard: Blocked, not viable
  • Cloudflare tunnels: Also blocked

3. Router Infrastructure Requirements

Goal: Centralized, professional dual-WAN setup with VPN support

Recommended Options:

  1. OpenWrt on Xiaomi routers (~$50-100)

    • Maximum customization
    • Can compile AmneziaWG support
    • Cost-effective for experimentation
    • Requires careful model selection (avoid non-flashable revisions)
  2. MikroTik hEX S (~$100)

    • Enterprise-grade RouterOS
    • Excellent dual-WAN support
    • SSH + WinBox management
    • Transferable skills
  3. Keenetic Ultra (~$150)

    • Stable, Russian-supported firmware
    • Built-in dual-WAN
    • Good for production use

4. Server Architecture Strategy

Current: Single hecate server running Ollama + Jellyfin + GPU workloads

Target: Distributed setup with backup capabilities

Backup Server Requirements:

  • High-end CPU for large models (Ryzen 9/Intel i9)
  • GPU with VRAM >= current hecate
  • 64GB+ RAM for CPU-based large models
  • Role: Ollama backup + experimental model testing

Action Items

Immediate (Network Stability)

  • Re-crimp all RJ45 connectors properly
  • Test cable integrity with network tester
  • Speed bottleneck identified: USB Ethernet adapter limiting the link to 100 Mbps
  • Solution ordered: PCIe x1 RTL8111C Gigabit Ethernet card (arriving tomorrow)
  • Install PCIe Ethernet card in available slot (00:1d.0)
  • Configure new network interface and test gigabit speeds

Short-term (Infrastructure)

  • Research and purchase OpenWrt-compatible router
  • Implement dual-WAN configuration
  • Set up centralized VPN management

Medium-term (Redundancy)

  • Spec and build backup server
  • Implement Ollama load balancing/failover
  • Update Ansible configurations for new infrastructure
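
As a rough illustration of what client-side Ollama failover could look like, the sketch below probes each backend in priority order and returns the first one that answers. It assumes Ollama's HTTP API on its default port 11434 and uses the `GET /api/tags` endpoint as a health check. The backup address `192.168.3.26` and the function names are hypothetical; no backup server exists yet.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# hecate's address comes from the incident summary; the second entry is a
# HYPOTHETICAL backup server used only for illustration.
BACKENDS = [
    "http://192.168.3.25:11434",
    "http://192.168.3.26:11434",
]


def ollama_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """True if GET <base_url>/api/tags answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def pick_backend(
    backends: Iterable[str],
    probe: Callable[[str], bool] = ollama_is_up,
) -> Optional[str]:
    """Return the first backend that passes the probe, or None if all fail."""
    for url in backends:
        if probe(url):
            return url
    return None
```

`pick_backend(BACKENDS)` prefers hecate and falls back to the backup only when hecate stops responding. This is plain priority failover; actual load balancing would need a reverse proxy or similar in front of both servers.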

Long-term (Monitoring)

  • Deploy comprehensive monitoring stack
  • Set up early warning systems for connectivity issues
  • Document all network topology and configurations
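
One possible early-warning signal for the kind of intermittent disconnection seen in this incident is the Linux carrier change counter, `/sys/class/net/<iface>/carrier_changes`, which increments each time the link goes up or down. The sketch below compares two samples of that counter. The function names and the sampling approach are assumptions, not part of any existing monitoring setup.

```python
from pathlib import Path
from typing import Optional


def read_carrier_changes(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the cumulative count of link up/down transitions, or None if unreadable."""
    counter_file = Path(sysfs_root) / iface / "carrier_changes"
    try:
        return int(counter_file.read_text().strip())
    except (OSError, ValueError):
        return None


def flaps_since(previous: Optional[int], current: Optional[int]) -> int:
    """Number of carrier transitions between two samples.

    Returns 0 when either sample is missing, or when the counter went
    backwards (for example after the interface was re-created).
    """
    if previous is None or current is None:
        return 0
    return max(current - previous, 0)
```

Sampling the counter periodically and alerting whenever `flaps_since` returns a non-zero value would likely have pointed at the physical link much earlier than the cloudflared timeouts did.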

Technical Notes

RJ45 Connector Best Practices

  • A properly crimped connector compresses to standard dimensions
  • It should insert easily and hold firmly with an audible click
  • Test with cable tester after crimping
  • When in doubt, re-crimp rather than troubleshoot intermittent issues

VPN Protocol Considerations

  • AmneziaWG not yet available in standard router firmware
  • May require separate VPS/server for AmneziaWG termination
  • Standard protocols (WireGuard, OpenVPN, IPSec) available on most routers

Lessons Learned

  1. Physical layer issues can masquerade as application problems - Always check cables first
  2. Proper tooling and technique matter - Invest time in learning correct crimping procedures
  3. Redundancy is critical for trading systems - Single points of failure are unacceptable
  4. Network infrastructure should be professional-grade - Consumer equipment has limitations

Next Steps

Focus on OpenWrt router solution for cost-effectiveness and maximum flexibility. This approach allows experimentation with custom VPN protocols while maintaining professional network management capabilities.