
The Great Tiltfile Migration: A Comedy of Errors

· 3 min read

Or: How I Created a Museum for a Single File

The story of how a single Tiltfile ended up with its own dedicated directory structure, complete with documentation and a .gitkeep file. A masterclass in architectural over-engineering for the simplest possible use case.

top3-market-context-aware-improvements

· 2 min read

I scanned the Mercury Kairos module to map the current position job lifecycle and queues, locating the Kairos queue names and job types and reading the position execution consumer. With the lifecycle mapped, here are the top 3 approaches for market timing awareness.

Top 3 market-timing approaches to add context awareness

  1. Session quality scoring and gating
  • What: Compute a rolling “session quality score” (Asian/Europe/US, hour-of-day), volume/volatility-weighted, with exchange uptime and spread/liquidity checks.
  • How to use:
    • Validation gate: block or defer VALIDATE_POSITION_CANDIDATE during poor sessions unless high-confidence signal.
    • Scheduling: slow SYNC cadence and TPSL recalcs in weak sessions; speed up in strong sessions.
  • Integration points:
    • DynamicTpslConsumer.handleValidatePositionCandidate(...): inject sessionScore into decision; require minimum score to proceed.
    • PositionSchedulerService.schedulePositionSyncJobs() and any candle-delay util: vary cadence by session score.
    • tpsl-helpers.validateTPSLDataSanity(...): enforce larger SL buffers when session score is low.
  2. Volatility regime controls (ATR/ADX bands) with risk throttling
  • What: Classify low/normal/high regimes (ATR percent-of-price, ADX, realized vol) per symbol-timeframe and throttle entries/risk.
  • How to use:
    • Validation: disallow entries in extreme regimes unless strategy explicitly supports them.
    • TPSL shaping: widen SL/bring TPs closer in high-vol; cap reward/risk dynamically.
  • Integration points:
    • TaService to compute ATR/ADX; pass regime into DynamicTPSLService and validation output.
    • tpsl-helpers.fixExcessiveRewardRiskRatio(...): make regime-aware caps/floors.
    • PositionExecutionConsumer.handleFinalizePosition(...): guard finalize/open if regime out-of-policy.
  3. Event risk windows and micro-liquidity guardrails
  • What: Maintain a near-term “risk calendar” (economic events, exchange maintenance) plus live micro-liquidity checks (spread, order book depth).
  • How to use:
    • Validation: block/defer near high-impact events; allow only shadow positions; or reduce size.
    • Scheduler: introduce “freeze windows” where only SYNC runs, no new creates/finalizes.
  • Integration points:
    • Extend validation output with eventRiskLevel and liquidityOk.
    • PositionFactory/finalize gate: enforce freeze windows for LIVE opens.
    • PositionSchedulerService: schedule deferrals until window passes.
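
As a concrete illustration of approach 1, here is a minimal sketch of what a session quality gate might look like. The names (`sessionForHour`, `sessionScore`, `shouldProceed`) and the per-session base weights are hypothetical placeholders, not values from the Kairos codebase:

```typescript
// Illustrative sketch: a session quality score by UTC hour, usable as a
// gate inside candidate validation. Weights are placeholders, not calibrated.
export type Session = 'ASIA' | 'EUROPE' | 'US';

export function sessionForHour(utcHour: number): Session {
  if (utcHour >= 0 && utcHour < 8) return 'ASIA';
  if (utcHour >= 8 && utcHour < 14) return 'EUROPE';
  return 'US';
}

// Base score per session, scaled by a liquidity factor in [0, 1].
export function sessionScore(utcHour: number, liquidityFactor = 1): number {
  const base: Record<Session, number> = { ASIA: 0.4, EUROPE: 0.7, US: 0.9 };
  const score = base[sessionForHour(utcHour)] * liquidityFactor;
  return Math.min(1, Math.max(0, score));
}

// Gate sketch for handleValidatePositionCandidate: defer in weak sessions
// unless the signal confidence clears a higher bar.
export function shouldProceed(score: number, confidence: number, minScore = 0.5): boolean {
  return score >= minScore || confidence >= 0.9;
}
```

The same score can then drive scheduler cadence, as described under integration points.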

Current position lifecycle (jobs perspective)

  • Queues

    • KAIROS.dynamic-tpsl
      • VALIDATE_POSITION_CANDIDATE: AI validation (pre-entry)
      • CALCULATE_DYNAMIC_TPSL_INIT: initial TP/SL calc for creation
      • CALCULATE_DYNAMIC_TPSL_PERIODIC: recalcs for open positions
    • KAIROS.position-execution
      • CREATE_POSITION: creates DB entity and opens LIVE position on exchange (stores exchangeOrderLinkId), emits event
      • FINALIZE_POSITION: finalizes with TPSL and opens LIVE if needed, emits event
      • SYNC_POSITION: core sync for TP/SL checks, verification, pricing updates
      • SCHEDULE_POSITIONS: orchestrates scheduling of SYNC_POSITION jobs
  • Primary flow

    1. Candidate → VALIDATE_POSITION_CANDIDATE (gate: market session/regime/event recommended)
    2. If valid → CALCULATE_DYNAMIC_TPSL_INIT
    3. CREATE_POSITION (or FINALIZE_POSITION depending on flow) opens LIVE on exchange and emits event
    4. SCHEDULE_POSITIONS periodically enqueues SYNC_POSITION per open position
    5. Optional: CALCULATE_DYNAMIC_TPSL_PERIODIC to adapt levels
  • Key guard points for timing/context

    • Validation step: accept/deny/defer with reason and next-candle delay
    • Finalize/open: last-mile gating before touching exchange
    • Scheduler: cadence control and freeze windows
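
The scheduler guard point above could be sketched as follows; `FreezeWindow`, `inFreezeWindow`, and the cadence factor are illustrative names and numbers, not existing Kairos APIs:

```typescript
// Hypothetical sketch: freeze windows plus session-scaled sync cadence.
export interface FreezeWindow { startMs: number; endMs: number }

// During a freeze window only SYNC runs; creates/finalizes are deferred.
export function inFreezeWindow(nowMs: number, windows: FreezeWindow[]): boolean {
  return windows.some(w => nowMs >= w.startMs && nowMs < w.endMs);
}

// Weak sessions (score near 0) sync at up to 3x the base interval.
export function syncIntervalMs(sessionScore: number, baseMs = 60_000): number {
  const clamped = Math.min(1, Math.max(0, sessionScore));
  const factor = 1 + 2 * (1 - clamped);
  return Math.round(baseMs * factor);
}
```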

Minimal, high-impact implementation plan

  • Add MarketContextService (session score, regime, event risk, liquidity) consumed by:

    • DynamicTpslConsumer.handleValidatePositionCandidate(...)
    • PositionSchedulerService.schedulePositionSyncJobs(...)
    • PositionExecutionConsumer.handleFinalizePosition(...)
  • Extend validation result/AI decision context with sessionScore, volRegime, eventRiskLevel.

  • Make tpsl-helpers regime-aware: dynamic R/R caps, SL buffers, TP spacing.

  • Adjust scheduler cadence and allow deferrals based on session score and event windows.

  • In short: gate earlier (validation), shape risk (TPSL), and schedule smarter (scheduler).

  • Summary: Proposed 3 timing-aware layers (session scoring, volatility regimes, event/liquidity guardrails) and mapped where to inject them across validation, finalize/open, and scheduling for a robust market context pipeline.

Katana vs Built‑in TPSL – Toward a Fair Objective

· 2 min read
Max Kaido
Architect

Why “better” needs a clear objective

When comparing TPSL methods (built‑in heuristic vs Katana combinatorics), plain weighted R:R can reward far TPs and penalize robust SLs. We need an objective that reflects utility under uncertainty, not just geometry.

Observed asymmetries

  • Built‑in can nudge SL continuously (e.g., to meet R:R≥1.5), while Katana currently samples discrete SL candidates (swing/ATR/heuristic). This can yield higher R:R on built‑in without necessarily overfitting.
  • Allocation grid: 50/50 sometimes wins; Katana limits TP1 to 30–50%. If RR2 uplift is modest, shifting to 30/70 can hurt the weighted result.
  • Precision/rounding: Minor differences move RR.

Example (MYRIAUSDT): built‑in with SL 0.00093309 (50/50) beats Katana using SL 0.00091865 (30/70). The difference stems from SL granularity and allocation grid.

A fair, robust objective

Replace naive weighted R:R with risk‑aware utility variants:

  • Harmonic mean: penalizes imbalance across TPs.
  • Concave utility: utility = Σ alloc_i · sqrt(RR_i).
  • Capped RR: utility = Σ alloc_i · min(RR_i, RR_cap) to avoid chasing distant TP2.

Penalties (configurable):

  • Risk cost: − α · riskPercent (discourage very wide SLs).
  • Distance cost: − β · max(0, tp2DistancePct − k) (penalize far, low‑probability targets).
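
Putting the capped-RR variant and the two penalties together, a sketch of the objective might look like this; the default `rrCap`, `alpha`, `beta`, and `k` values are placeholders to be tuned, not recommendations:

```typescript
// Illustrative utility sketch: alloc_i in [0,1] summing to 1, rr_i the
// reward:risk of TP level i. Parameter names mirror the text above.
export interface TpLevel { alloc: number; rr: number }

export function cappedRrUtility(
  levels: TpLevel[],
  riskPercent: number,
  tp2DistancePct: number,
  opts = { rrCap: 3, alpha: 0.1, beta: 0.05, k: 5 },
): number {
  // utility = Σ alloc_i · min(RR_i, RR_cap) − α·riskPercent − β·max(0, tp2DistancePct − k)
  const base = levels.reduce((s, l) => s + l.alloc * Math.min(l.rr, opts.rrCap), 0);
  const riskCost = opts.alpha * riskPercent;
  const distanceCost = opts.beta * Math.max(0, tp2DistancePct - opts.k);
  return base - riskCost - distanceCost;
}

// Concave variant: utility = Σ alloc_i · sqrt(RR_i)
export function concaveUtility(levels: TpLevel[]): number {
  return levels.reduce((s, l) => s + l.alloc * Math.sqrt(l.rr), 0);
}
```

Note how the cap makes a distant TP2 contribute no more than `rrCap`, removing the incentive to chase far targets.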

Candidate and allocation policy

  • SL candidates: swing lows, ATR 1.5×, heuristic; optional “RR‑tuned within band” (off by default) to keep parity with built‑in continuous nudge.
  • TP candidates: ATR 1.5×, ATR 2.5×, H1 VWAP; add D1 swing if available.
  • Allocations: TP1 ∈ [30%, 60%] step 5; 2‑ or 3‑level with min 10% on TP3.
  • Constraints: no‑trade ±1%; SL below entry and inside ATR band; ordered TP levels.

Selection and stability

  • Stability filter: prefer combos that stay top‑quartile across rolling windows.
  • Tie‑breakers: lower riskPercent, then semantic SL (swing > ATR) for explainability.

Implementation plan (incremental)

  1. Add utility variants to Katana (harmonic, concave, capped).
  2. Optional RR‑tuned SL candidate (disabled by default) and exact rounding parity.
  3. Expand allocation sweep to 30–60% for TP1.
  4. Report columns: builtin_utility, katana_utility_[variant], risk%, flags (rr_tuned_used).
  5. Run across full universe; compare win‑rates per variant and by regime.

Expected outcome

We trade single‑run optimality for robustness: fewer fragile wins, more consistent, explainable decisions that respect risk. If built‑in still “wins,” it will do so under the same objective—then we adopt that behavior into Katana transparently.

min-trading-engine-improvements

· 2 min read

Problem statement (original bug scope)

  • Realized PnL reported zero for closed positions despite observed TP hits.
  • Exit reasons were misclassified (e.g., MANUAL) when TPs or SLs clearly applied.
  • Shadow simulator used inverted trigger rules for LONG/SHORT TP/SL, producing false signals.
  • TP reconciliation re‑created already filled levels, causing repeated “TP filled” cycles and skewed outcomes.
  • Tournament/dashboard PnL inherited these upstream inaccuracies.

Fixes implemented (correctness first)

  • Fixed shadow trigger polarity: LONG TP when high ≥ price; LONG SL when low ≤ price (inverse for SHORT).
  • Reliable final PnL and exitReason for CLOSED: use exitPrice or weighted TP fills; added COMBINED.
  • History/tolerance‑aware reconciliation: skip creating TPs whose price matches a filled level.

Minimal evolutionary improvements (no heavy refactor)

  • Idempotent reconciliation (no migrations): target − executed (history‑aware) to decide orders; price tolerance to avoid duplicates.
  • Guard rails in code: if remainingQty === 0 → no new orders; if a TP is marked filled → do not recreate at same price.
  • Strong invariants in tests: trigger polarity; “no TP recreation if filled”; “CLOSED ⇒ unrealized = 0; exitReason consistent”.
  • Telemetry to prevent regressions: counters for TP re‑creation attempts; CLOSED with unrealized ≠ 0.

Optional next step (still lightweight)

  • Make shadow simulator side‑effect‑free: emit candidate events; a single validator applies them.

Code examples (sketches)

Idempotent reconciliation (pure diff + tolerance)

export interface OrderIntent { kind: 'TP' | 'SL'; price: number; qty: number; idKey: string }

export function computeDesiredOrders(
  target: { tps: Array<{ price: number; percentage: number }>; sl?: number },
  executed: { filledTps: Array<{ price: number }>; slFilled: boolean },
  remainingQty: number,
  tolerance = 0.001,
): OrderIntent[] {
  // Relative price comparison within tolerance (guard against division by 0).
  const near = (a: number, b: number) => Math.abs(a - b) / (b || 1) < tolerance;
  // Skip TPs whose price matches an already filled level (history-aware).
  const openTps = target.tps.filter(tp => !executed.filledTps.some(f => near(f.price, tp.price)));
  const intents: OrderIntent[] = openTps.map(tp => ({
    kind: 'TP',
    price: tp.price,
    qty: (remainingQty * tp.percentage) / 100,
    idKey: `tp:${tp.price.toFixed(6)}`,
  }));
  if (target.sl && !executed.slFilled) {
    intents.push({ kind: 'SL', price: target.sl, qty: remainingQty, idKey: `sl:${target.sl.toFixed(6)}` });
  }
  return intents;
}

Finite state guard (no illegal transitions)

export enum PositionFsmState { OPEN = 'OPEN', PARTIAL = 'PARTIAL', CLOSED = 'CLOSED' }

export function deriveState(position: Position): PositionFsmState {
  if (position.remainingQty === 0) return PositionFsmState.CLOSED;
  if (position.remainingQty < position.baseQty) return PositionFsmState.PARTIAL;
  return PositionFsmState.OPEN;
}

export function assertNoIllegalTransition(before: PositionFsmState, after: PositionFsmState) {
  if (before === PositionFsmState.CLOSED && after !== PositionFsmState.CLOSED) {
    throw new Error('Illegal transition: CLOSED cannot reopen');
  }
}

Reliable final PnL and exitReason (CLOSED)

export function determineExitReason(p: Position): 'take_profit' | 'stop_loss' | 'combined' | 'manual' {
  const anyTp =
    (p.tpsl?.active?.takeProfits || []).some(tp => tp.lastStatus === 'filled') ||
    (p.tpsl?.history?.filledTPs || []).length > 0;
  const sl = p.tpsl?.active?.stopLoss?.lastStatus === 'filled';
  if (sl && anyTp) return 'combined';
  if (sl) return 'stop_loss';
  if (anyTp && p.remainingQty === 0) return 'take_profit';
  return 'manual';
}

export function weightedExitPrice(p: Position): number | undefined {
  const filled = (p.tpsl?.active?.takeProfits || []).filter(
    tp => tp.lastStatus === 'filled' && typeof tp.filledPrice === 'number',
  );
  if (filled.length === 0) return p.exitPrice;
  const totalPct = filled.reduce((s, tp) => s + tp.percentage, 0);
  if (totalPct <= 0) return p.exitPrice;
  return filled.reduce((s, tp) => s + (tp.filledPrice as number) * tp.percentage, 0) / totalPct;
}

Shadow simulator without side effects (optional)

export async function simulateShadow(position: Position, market: MarketData) {
  const events: Array<{ type: 'TakeProfitFilled' | 'StopLossHit'; tpIndex?: number; price: number; at: Date; idKey: string }> = [];
  // Emit only; apply elsewhere
  return events;
}

fdd-methodology-analysis-and-improvement-ideas

· 3 min read

I loaded the FDD-related rules and ran a couple of FDD commands to ground the discussion:

cd /home/dev/kaido-monorepo && pnpm kaido fdd:compile . | cat
cd /home/dev/kaido-monorepo && pnpm kaido fdd:find-missing | cat

Kaido Protocol is a disciplined, rule-driven workflow aligning FDD, Cursor rules, and CLI validation to maintain synchronized documentation, safe refactors, and reliable deployments through mandatory mantras, strict validation gates, and automated, hierarchical context recovery. Agents scan systematically, prioritize ruthlessly, document essentials, and validate continuously.

Mantras:

  • "Module root, why not how, sync code with docs."
  • "Code in TypeScript, deploy to activate, validate to trust."
  • "Scan systematically, prioritize ruthlessly, document everything."
  • "One thousand tokens max, trim fat, keep essence."
  • "Black box thinking, what not how, survive any refactor."
  • "Delegate down, abstract up, never delete."
  • "Evolve systematically, preserve functionality, validate continuously."
  • "Extend wisely, rename clearly, create consciously."
  • "Validate before deploy, zero tolerance for broken rules."
  • "Extract public contract, ignore implementation, honor what exists."

I compiled the FDD hierarchy and scanned for missing FDDs. Sharing zero-shot weaknesses from an agent perspective.

Zero-shot weaknesses that can limit effectiveness

  • Coverage gaps: many modules lack FDD.md, breaking hierarchical context loading and forcing guesswork.
  • Manual enforcement reliance: validation steps are easy to skip locally; without CI gates, drift accumulates between code and FDD.
  • FDD explosion risk: per-submodule FDDs (e.g., many dike config variants) can create high overhead and noisy maintenance; navigation degrades.
  • Abstraction drift: FDDs can slide into implementation detail or become too vague; both reduce refactor-safety and agent utility.
  • Token-limit pressure: the 1K-token cap can force oversimplification unless delegation links are well-curated; otherwise, critical nuances disappear.
  • Rule deployment pitfalls: editing .mdc files or forgetting to deploy/validate Cursor rules leaves rules inactive; the system silently weakens.
  • Waypoint (.todo.md) desync: Michishirube guidance isn’t validated; these files can drift from FDDs and code, confusing mid-stream agents.
  • Fragmented ID/link hygiene: inconsistent fddId values, versioned directory names (e.g., v1, v3), and refactors can orphan fdd:// links if not constantly maintained.
  • Process friction: multiple protocols and chants create cognitive load; without automation, contributors bypass the process under time pressure.
  • Missing intent-to-type alignment: central types (@kaido/types) can diverge from FDD integration contracts; no automated contract check ties them together.
  • Discovery noise: heuristics flag many small or empty dirs; if the signal isn’t prioritized/triaged, teams ignore the tool output.
  • Evolution underuse: refactors may proceed without Evolution FDD scaffolding, losing rollback/contract guarantees and causing integration breakage.

Quick repo signals informing the above

  • Only root FDD detected: kaido-monorepo-root compiled successfully.
  • 43 missing FDD locations, mostly under domains/mercury/backend/src/shared/modules/... and dike/... configs; many marked High Priority.

Mitigations to consider

  • CI gates: run pnpm kaido fdd:validate for changed paths; fail PRs on invalid or missing FDDs touching those modules; forbid .mdc edits.
  • Autostubs: a command to generate minimal FDD.md skeletons for all “High Priority” findings to cut friction.
  • Aggregation strategy: prefer one FDD for a group (e.g., dike/configs) with child links only where needed, to avoid proliferation.
  • Drift checks: a code-to-FDD linter that diff-checks exported APIs vs FDD contracts; warn when contracts change without FDD updates.
  • Link hygiene bot: validate and auto-rewrite fdd:// links on moves/renames; enforce stable fddId naming.
  • Mandatory evolution for risky refactors: template + PR checklist; require rollback and integration contracts.
  • Waypoint validation: require co-located *.todo.md updates when FDD changes; light lint to prevent stale guidance.
  • Fast-path tooling: pnpm kaido fdd:fix-missing --high to stub and link, plus a coverage badge to gamify progress.

Your turn—keen to hear your observations so we can focus on the most painful gaps first.

  • Ran pnpm kaido fdd:compile . to confirm only kaido-monorepo-root exists.
  • Ran pnpm kaido fdd:find-missing and found 43 missing FDDs (33 high priority), mainly in domains/mercury shared modules and dike configs.
  • Provided a concise risk map and targeted mitigations (CI gates, autostubs, aggregation, drift checks).

top3-min-trading-engine-testing-strategy

· 2 min read

Top 3 strategies (realism-first, no sugar)

  • Testnet-in-the-loop E2E (nightly)

    • Use Bybit testnet + real Postgres/Redis, no HTTP mocks.
    • Drive minimal-trading-system through scenarios: TP-only, SL-only, partial fills, cancel/recreate, trailing SL, and close-by-TP vs close-by-SL.
    • Assert via public contracts:
      • GET /positions/:id/pnl realized/unrealized correctness
      • tournament PnL stats equality to position-level averages
      • entity state: remainingQty, status, exitTime, exitPrice (derived when TP-closed)
    • Gate on metrics: realized/unrealized gauges non-zero for filled scenarios; no “stuck OPEN with remainingQty=0”.
  • Recorded-trace replay harness (PR/CI fast lane)

    • Record real Bybit responses (orders, executions, position info, tick prices) from testnet/live read-only.
    • Replay deterministically through the engine with time-freeze; no network, real DB.
    • Add property-based fuzz on price paths and fill ordering; validate invariants:
      • CLOSED → unrealized=0; combined = realized
      • After each filled TP: remainingQty and realized move by percentage-weighted amounts
      • Weighted-exit derivation equals fills when exitPrice absent
      • Tournament/type PnL equals average of member positions
    • Fail on divergence between replay and original trace decisions.
  • Shadow/live parallel verification (canary)

    • Run engine shadowing live market (no order placement), consume real prices; compute TPSL decisions and hypothetical fills.
    • Compare shadow decisions vs actual exchange outcomes (from real positions or historical candles with slippage rules).
    • Export invariants to Prometheus; alert on breaches (e.g., closed-without-exitPrice, zero realized after TP).

Key invariants to assert everywhere

  • Closure: remainingQty=0 ⇒ status=CLOSED, exitTime set, unrealized=0.
  • TP fills: sum(perc of filled TPs) drives remainingQty; realized USD% = weighted by filled sizes.
  • Exit price: if TP-closed and exitPrice missing ⇒ derived weighted average of filled prices.
  • Combined PnL: realized + unrealized equals combined within epsilon.
  • Aggregation: tournament/type realized/unrealized = average of member position values used.
  • No noise: no history entries on mere recalculation; only on real fills.

Practical tips

  • Use real DBs; freeze time; seed RNG; only mock external payments.
  • Tiered pipeline: Replay (CI, fast) → Testnet-in-loop (nightly) → Shadow canary (continuous).
  • Persist failing traces as fixtures to prevent regressions.

Perfect tournament helper guide — consistency blueprint for AI

· 3 min read
Max Kaido
Architect

Problem statement

Tournament helpers across strategies drifted in naming, fields, and gating logic, which complicates maintenance and AI authoring. We need a single, consistent blueprint that AI can follow to produce “ideal” helpers: normalized indicators, uniform scoring in [0–1], clear validation gates with reasons/metadata, telemetry for dashboards, and compatibility with consumer-side leniency.

Inconsistencies observed

  • Indicator proxies: some helpers use ema_21 for EMA20 and ema_55 for EMA50; others use exact. Standardize via a mapping layer.
  • Volume fields: some compute volumeUsd = volume*price, others reference volume.volumeUsd implicitly. Use analysis.volume.volumeUsd everywhere.
  • Validation styles: flat required/optional vs state-machine (primed→confirmed) vs rescue logic; unify gating semantics.
  • Risk guards: SL-distance/liquidity floors exist in some, absent in others; define universal hard no-gos.
  • Scoring: component weights/normalizations vary; ensure 0–1 ramps with weights summing to 1, plus bounded bonuses.
  • Naming drift: isBullishAlignment vs hasEmaAlignment; hasSqueeze vs isSqueeze; pick one convention.
  • Reasons/metadata: different phrasing and missing numeric context; standardize reasons.passed/failed/warnings plus numeric metadata.
  • Timeframe usage: H4/H1/D1 presence inconsistent; ensure helpers declare and use configured timeframes.

Guide for AI: “Ideal tournament helper” blueprint

  • Core contract

    • transform(multiTimeframeAnalysis, symbol) → Analysis
    • validate(Analysis) → { action: 'VALID'|'INVALID', confidence: number, reasons: {passed[], failed[], warnings[]}, metadata: Record<string, number> }
    • generateTPSL(Analysis, entryPrice) → { entryPrice, stopLossPrice, takeProfitLevels[], reasoning, metadata }
    • createValidationPrompt(Analysis, entryPrice) → string
    • createComparisonPrompt(AnalysisA, AnalysisB, entryA, entryB) → string
    • Analysis must include:
      • context: { symbol, currentPrice, analysisTimestamp }
      • volume: { currentVolume, volumeSma, volumeUsd, volumeRatio }
      • trend: { adx, plusDi, minusDi, isBullishBias | isBearishBias }
      • momentum: rsi, macd (values + slopes as needed)
      • ema: { ema20, ema50, isBullishAlignment | isBearishAlignment }
      • vwap/cvd/swing/volatility as needed by the strategy
      • states: optional flags for gating (e.g., isSqueeze, isPrimed, isConfirmed)
      • scoring: components + totalScore (0–1)
  • Indicator normalization

    • Provide a mapping layer to normalize series keys:
      • ema20 = values['ema_20'] || values['ema_21']
      • ema50 = values['ema_50'] || values['ema_55']
    • Expose booleans consistently: isBullishAlignment / isBearishAlignment.
  • Scoring specification

    • totalScore in [0, 1], computed as Σ weight_i * feature_i
    • Features normalized with monotonic ramps:
      • ramp(x, from, to) for linear segments; clamp to [0, 1].
      • example: ADX 20→40, RSI 50→70, MACD histogram thresholds, volume ratio 0.5→2.0.
    • Optional small, bounded bonuses/penalties (e.g., +0.03 per optional confirmation; cap totalScore at 1.0).
    • Include component scores in scoring: trendScore, momentumScore, volumeScore, structure/flow scores, optionalBonus, totalScore.
  • Validation policy

    • Hard no-gos (never relaxed):
      • stablecoin, missing data, absurdly low liquidity (min volumeUsd), invalid SL band, extreme risk (e.g., SL > 10%).
    • Gating structure (consistent across tournaments):
      • core gates: AND/OR of primary conditions (e.g., (trend OR momentum) AND volume AND structure)
      • optional confirmations: contribute bonus and “rescue” logic
      • state machine (if needed): define isPrimed then isConfirmed flags from Analysis.states
    • Output:
      • action based primarily on totalScore thresholds with warnings around borders
      • reasons.passed/failed/warnings in consistent language
      • metadata with numeric component qualities
  • Telemetry support

    • Emit in Analysis a gates block with values and thresholds used, e.g.:
      • gates: { volumeUsd: { value, min }, adx: { value, min }, bbWidthPct: { value, max }, slRiskPct: { value, max }, diGap: { value, min }, … }
    • Keep names stable for dashboards and leniency decisions.
  • Buy/Sell symmetry

    • Mirror logic for SELL: invert EMAs, DI bias, MACD/RSI thresholds and breakout directions; keep same field names (isBearishAlignment, etc.).
  • Prompt helpers

    • Always include numeric context in prompts (RSI, MACD, ADX, volume ratio, SL%, R:R) for reproducibility.
    • Use the same comparison schema across tournaments where possible; tailor only the feature list.
  • Null/data safety

    • Default missing numeric inputs to 0, but record failures in reasons; never throw during transform for non-critical gaps.
    • Validate presence of required timeframes per config.
  • Leniency compatibility

    • Ensure Analysis provides:
      • context.currentPrice
      • volume.volumeUsd (or components so the consumer can compute it)
      • scoring.totalScore
      • optional states (e.g., isPrimed) to make leniency meaningful for stateful strategies.
    • Consumer handles leniency; helpers just expose consistent fields.
  • Naming conventions

    • Booleans start with is/has
    • Scores end with Score; totals as totalScore
    • CamelCase keys; units implicit unless ambiguous (use pct for percentages).
  • Minimal skeleton example

export function transformForX(mta, symbol): XAnalysis {
  // normalize indicators + compute features
  return {
    context: { symbol, currentPrice, analysisTimestamp: new Date() },
    volume: { currentVolume, volumeSma, volumeUsd, volumeRatio },
    trend: { adx, plusDi, minusDi, isBullishBias },
    momentum: { rsi: { value, slope }, macd: { histogram, slope } },
    ema: { ema20, ema50, isBullishAlignment },
    states: { isSqueeze, isPrimed, isConfirmed },
    scoring: { trendScore, momentumScore, volumeScore, optionalBonus, totalScore },
    gates: { volumeUsd: { value: volumeUsd, min: 1_000_000 }, adx: { value: adx, min: 25 } },
  };
}

export function validateForX(analysis: XAnalysis) {
  // hard no-gos → INVALID
  // core gates + scoring thresholds → VALID/INVALID + warnings
  return { action, confidence, reasons, metadata };
}
  • Acceptance checklist per helper
    • Provides analysis.scoring.totalScore in [0, 1]
    • Provides context.currentPrice and volume.volumeUsd
    • Uses consistent indicator normalization
    • Emits gates info for telemetry
    • Returns validation with reasons and numeric metadata
    • Mirrors buy/sell correctly
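
The ramp(x, from, to) normalization referenced in the scoring specification might be sketched as below; the example weights and thresholds are illustrative, not values from any existing helper:

```typescript
// Linear ramp from 0 at `from` to 1 at `to`, clamped to [0, 1].
export function ramp(x: number, from: number, to: number): number {
  if (to === from) return x >= to ? 1 : 0;
  return Math.min(1, Math.max(0, (x - from) / (to - from)));
}

// Example weighted total score; weights sum to 1 as the spec requires.
export function exampleTotalScore(adx: number, rsi: number, volumeRatio: number): number {
  return (
    0.4 * ramp(adx, 20, 40) +        // trend component
    0.3 * ramp(rsi, 50, 70) +        // momentum component
    0.3 * ramp(volumeRatio, 0.5, 2.0) // volume component
  );
}
```

Because each feature is clamped to [0, 1] and the weights sum to 1, the total stays in [0, 1] without an explicit cap.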

This guide yields consistent, leniency-compatible helpers that the consumer can use uniformly, while preserving each tournament’s unique logic through standardized analysis, scoring, and gating.

Quick path to 16 tournaments — leniency and backfill without heavy refactors

· 4 min read
Max Kaido
Architect

Problem statement

We need to launch a full slate of ~16 tournaments daily quickly, even if individual strategies aren’t perfect yet. Candidate scarcity causes portfolio imbalance. This doc proposes a low-complexity approach—two-pass leniency and near‑miss backfill—implemented via small config knobs and generic consumer logic, so we can ship now and refine over time.

Quick path to “good-enough” 16 tournaments

  • Goal: increase candidate supply fast without heavy refactors.
  • Approach: add a two-pass leniency and near‑miss backfill, driven by simple config knobs. Keep “hard no-gos” intact. Pairwise LLM round will clean up remaining noise.

Minimal feature set (1–2 days, low complexity)

  • Two-pass validation
    • Pass 1: current strict rules.
    • Pass 2 (only if below targetCount): re-run with “leniency multipliers” applied to thresholds.
  • Near-miss backfill
    • If still below targetCount, include highest-score near-misses failing by the smallest normalized margin.
  • Score floor
    • Accept on Pass 2/backfill only if totalScore ≥ scoreFloor (e.g., 0.25–0.30).
  • Per-tournament leniency profile
    • profiles: default | loose | aggressive; encoded as multipliers for specific gates.
  • Hard no-gos never relaxed
    • Stablecoins, blacklists, obviously broken data, absurdly low liquidity.
  • Tiny telemetry
    • Count per-gate failures and average “distance-to-threshold”. No dashboards yet; log/CSV is enough.

Example config fields to add

validation: {
  targetCandidatesAfterValidation: 160, // ensures supply for top 56
  scoreFloorOnLeniency: 0.28,
  twoPassEnabled: true,
  leniencyProfile: 'loose',
  relaxations: { // multipliers applied on Pass 2
    volumeRatioMin: 0.85,
    adxMin: 0.85,
    bbWidthPctMax: 1.25,
    diGapTolerance: 1.5, // allows more negative DI gap
    slRiskGuardMax: 1.4, // allows larger SL distance
  },
}

Where to wire it

  • In your validator helper for each tournament: accept an optional “leniency” object and apply multipliers to thresholds.
  • In processScoreMarketsBatch: after Pass 1, if validCount < target, run Pass 2 with leniency. If still low, backfill near-misses meeting scoreFloor.
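
A minimal sketch of the two-pass flow, assuming a simplified candidate shape with only two gates; the types and names (`Thresholds`, `twoPassValidate`) are illustrative, not existing code:

```typescript
// Hypothetical two-pass validation: strict first, lenient only if supply is short.
export interface Thresholds { adxMin: number; volumeRatioMin: number }
export interface Candidate { symbol: string; adx: number; volumeRatio: number; score: number }

export function passes(c: Candidate, t: Thresholds): boolean {
  return c.adx >= t.adxMin && c.volumeRatio >= t.volumeRatioMin;
}

export function twoPassValidate(
  candidates: Candidate[],
  strict: Thresholds,
  targetCount: number,
  leniency = { adxMin: 0.85, volumeRatioMin: 0.85 }, // Pass 2 multipliers
  scoreFloor = 0.28,
): Candidate[] {
  const pass1 = candidates.filter(c => passes(c, strict));
  if (pass1.length >= targetCount) return pass1;
  // Pass 2: relax thresholds by the multipliers, but enforce the score floor.
  const relaxed: Thresholds = {
    adxMin: strict.adxMin * leniency.adxMin,
    volumeRatioMin: strict.volumeRatioMin * leniency.volumeRatioMin,
  };
  const pass2 = candidates.filter(
    c => !pass1.includes(c) && passes(c, relaxed) && c.score >= scoreFloor,
  );
  return [...pass1, ...pass2];
}
```

Hard no-gos (stablecoins, broken data, liquidity floors) would be filtered out before either pass and are never relaxed.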

Tournament-specific quick levers

  • Momentum Strength Buy (MSB)

    • Volume ratio floor: 1.2 → 1.0 on Pass 2.
    • ADX floor: 25 → 20 on Pass 2.
    • Enable rescue path by default; lower volumeTrend rescue from 0.5 → 0.35.
    • Score floor on Pass 2/backfill: 0.28.
    • Keep EMA alignment required (do not drop structure entirely).
  • Volatility Breakout Buy (VBB)

    • BB width percentile: 20% → 30–35% on Pass 2.
    • Volume ratio: 1.2 → 1.0 on Pass 2.
    • DI gap tolerance: allow −10 instead of −3 on Pass 2; still require no strong bearish bias.
    • Risk guard (SL distance): 5% → 7% max on Pass 2.
    • Volume USD floor: 1.0M → 0.5M on Pass 2.
    • Allow “primed” without full “confirmed” for prefilter inclusion if score ≥ 0.35.

Near-miss backfill (simple rule)

  • Compute normalized distance for each failing gate and take min across gates.
  • Build a “nearMissScore” = −minNormalizedDistance; sort descending.
  • Add near-miss markets (meeting scoreFloor) until targetCandidatesAfterValidation is reached.
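
The backfill rule above could be sketched like this, assuming each failing gate already reports a normalized distance to its threshold (names are illustrative):

```typescript
// Hypothetical near-miss backfill: rank by the smallest normalized
// distance-to-threshold across failing gates, respecting a score floor.
export interface GateDistance { gate: string; normalizedDistance: number }
export interface NearMiss { symbol: string; score: number; distances: GateDistance[] }

// nearMissScore = -min(normalized distances); closer misses score higher.
export function nearMissScore(m: NearMiss): number {
  return -Math.min(...m.distances.map(d => d.normalizedDistance));
}

export function backfill(nearMisses: NearMiss[], slots: number, scoreFloor = 0.28): NearMiss[] {
  return nearMisses
    .filter(m => m.score >= scoreFloor)
    .sort((a, b) => nearMissScore(b) - nearMissScore(a))
    .slice(0, Math.max(0, slots));
}
```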

Safe bounds

  • Cap Pass 2 increase to 2–3× Pass 1 supply to avoid flooding.
  • Enforce global min liquidity and data-quality checks.
  • Keep a conservative final score floor (≥0.25) for any leniency-induced candidate.

Why this works now

  • Your LLM bracket (step 4–5) is robust and benefits from more diverse, reasonably good candidates.
  • Two-pass + backfill gives you control to always feed enough to reach top 56 without significant engineering.

Phased rollout

  • Day 1: add leniency knobs + two-pass + backfill + tiny telemetry.
  • Day 2: tune MSB/VBB multipliers, set per-tournament targets, watch logs for gate bottlenecks.
  • Week 1: promote more tournaments to this pattern; set consistent targets to reach 16 daily.

Summary

  • Add two-pass leniency + near-miss backfill with a score floor; no heavy DSL needed.
  • Loosen specific MSB/VBB gates as above to quickly lift supply.
  • Keep hard no-gos; rely on LLM rounds to refine quality.
  • Minimal telemetry (counts + distances) is enough to iterate quickly toward 16 tournaments.

Short answer

  • Stick with custom helpers now. Don’t force MSB/VBB into a fully config-only validator/scorer yet.

What I’m recommending

  • Keep the bespoke helper methods for MSB and VBB as the source of truth.
  • Add light, per-tournament config knobs that the helpers read:
    • leniency multipliers (two-pass)
    • target candidate count after validation
    • score floor for lenient/backfill passes
    • near-miss backfill toggle
  • This gives you quick tuning to reach 16 tournaments without heavy refactors.

Later (when you have time)

  • Gradually extend the helper DSL to express gates/state machines and conditional thresholds.
  • Migrate specific pieces of logic to config once expressible, keep escape hatches for bespoke cases.

Summary

  • Use custom helpers + small config levers now to ship 16 “good-enough” tournaments fast.
  • Evolve toward more config-driven definitions incrementally, not as a prerequisite.

Example of how to implement leniency on MSB v3

  • Implemented a generic leniency section on TournamentConfig, applied it to MSB v3 config, and updated the consumer to use these fields without checking tournament type.

What changed

  • libs/core/kaido-types/src/dike/tournament.types.ts
    • Added TournamentLeniencyConfig and optional leniency?: TournamentLeniencyConfig to TournamentConfig.
  • domains/mercury/backend/src/shared/modules/dike/tournaments/configs/momentum-strength-buy/v3/momentum-strength-buy-v3.ts
    • Added:
      • leniency.enabled: true
      • leniency.scoreFloor: 0.28
      • leniency.minVolumeUsd: 500_000
      • leniency.preferValidOverLenient: true
  • domains/mercury/backend/src/shared/modules/dike/tournament.consumer.ts
    • In scoring batch:
      • If strict validation fails, use tournamentConfig.leniency to accept as LENIENT when score >= scoreFloor and volumeUsd >= minVolumeUsd.
      • Mark entries via breakdown.validity = 'VALID'|'LENIENT'.
    • In preselection:
      • If leniency.preferValidOverLenient is true, pick VALID first then LENIENT by score; else purely by score.
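The acceptance and ordering described above could look roughly like this (entry shape and function names are illustrative, not the actual consumer code):

```typescript
type Validity = 'VALID' | 'LENIENT';

interface Entry {
  symbol: string;
  score: number;
  volumeUsd: number;
  strictValid: boolean;
  validity?: Validity;
}

interface LeniencyConfig {
  enabled: boolean;
  scoreFloor: number;
  minVolumeUsd: number;
  preferValidOverLenient: boolean;
}

// Accept strictly valid entries as VALID; rescue borderline ones as LENIENT.
function classify(e: Entry, cfg: LeniencyConfig): Entry | null {
  if (e.strictValid) return { ...e, validity: 'VALID' };
  if (cfg.enabled && e.score >= cfg.scoreFloor && e.volumeUsd >= cfg.minVolumeUsd)
    return { ...e, validity: 'LENIENT' };
  return null; // rejected
}

// Preselection: VALID first then LENIENT (each by score), or purely by score.
function preselect(entries: Entry[], cfg: LeniencyConfig, top = 56): Entry[] {
  const sorted = [...entries].sort((a, b) => {
    if (cfg.preferValidOverLenient && a.validity !== b.validity)
      return a.validity === 'VALID' ? -1 : 1;
    return b.score - a.score;
  });
  return sorted.slice(0, top);
}
```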

How to extend to other tournaments

  • Add the leniency block to a tournament config with desired thresholds. The consumer will automatically:
    • Accept borderline candidates as LENIENT under your score/liquidity floor.
    • Prefer VALID over LENIENT if configured.
    • Still feed only the top 56 into pairwise LLM.
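Enabling this on another tournament might look like the following config excerpt (tournament name and threshold values are hypothetical):

```typescript
// Hypothetical excerpt of another tournament's config; only the leniency
// block needs adding for the consumer to apply the same behavior.
const exampleTournamentConfig = {
  name: 'volatility-breakout-buy',
  version: 3,
  leniency: {
    enabled: true,
    scoreFloor: 0.3,       // placeholder, slightly stricter than MSB v3's 0.28
    minVolumeUsd: 750_000, // placeholder per-tournament liquidity floor
    preferValidOverLenient: true,
  },
};
```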

This keeps helpers bespoke, uses small config knobs for tuning, and lets you enable the same approach on any tournament by editing its config only.

Tournament helpers — configurability, strictness telemetry, and dynamic thresholds

· 4 min read
Max Kaido
Architect

Problem statement

We were asked to:

  • Check helpers in dike/tournaments/ for crafting validation formulas and scoring so they can be defined per tournament via config.
  • Review the latest versions of Momentum Strength Buy and Volatility Breakout Buy.
  • Answer three questions:
    1. Can their custom validation and scoring be implemented via helpers/configs, or are they too custom for universalization?
    2. For tournaments with too few candidates, which validation parts are overly strict, and how should this be surfaced (dashboard/report) for quick tuning?
    3. Outline a path to a dynamic system that adjusts formulas based on prior scoring to reach a desired candidate count.

Answers at a glance

    1. Short answer: partially yes. Both tournaments can mostly be expressed via helpers/configs if we extend the validation/scoring helpers to support gating logic (anyOf/OR), sequential “state machine” gates, conditional thresholds, and bonuses/penalties. With the current helper primitives (simple required/optional lists) they are too custom to fully replace bespoke methods.
    2. Represent strictness via a “validation funnel” + per-gate telemetry: log which gate fails and by how much, aggregate into a daily funnel, show distributions near thresholds, and provide “what‑if” threshold curves. A simple JSON/DB + Grafana or a markdown/CSV report is enough to quickly tune formulas.
    3. Dynamic system: yes—use quantile-based threshold control (or a simple PID-like controller) to hit a target candidate count subject to score-quality guardrails. Start with offline suggestions, then enable adaptive adjustments with smoothing and bounds.

Evidence of custom logic that exceeds today’s helpers

  • Momentum Strength Buy has conditional “rescue” logic that relaxes volume if enough optional confirmations exist:
```typescript
if (!isValidLocal && hasOptionalConfirmations) {
  const volBreakLoose = volumeAnalysis.volumeTrend > 0.5;
  isValidLocal = hasTrendOrMomentum && volBreakLoose && hasEmaAlignment;
}
```
  • Volatility Breakout Buy uses a two-step state machine (primed→confirmed) and risk/volume guards:
```typescript
const isPrimed = isSqueeze && volumeRatio >= 1.2 && bbWidthPct < 20 && hasMinVolume;
const isConfirmed = isPrimed && priceDistanceFromUpperBb >= 0.005;
const currentPrice = analysis.context.currentPrice;
const slDistance = (currentPrice - slBand.minSl) / currentPrice;
const riskGuard = slDistance <= 0.05;
```

These patterns need expressions, sequential gating, and metric-aware guards—beyond a flat required/optional list.

1) Can helpers replace custom methods?

  • Momentum Strength Buy
    • Feasible with extensions: core gates (trend/momentum AND volume AND EMA) map to required. Rescue logic needs conditional thresholds (“if optional confirmations ≥ k, lower the volume threshold”). Scoring already returns a continuous totalScore consumed from .scoring, so it can be modeled by a config-driven feature+weight DSL.
  • Volatility Breakout Buy
    • Feasible with extensions: requires a 2-step gate (primed then confirmed), plus DI-gap tolerance, volume-USD floor, and SL risk guard. All are expressible if the helper supports sequential gates with named expressions and threshold parameters.

Conclusion: Don’t abandon helpers—extend them. Without extensions, these v3 implementations are too custom. With a small DSL upgrade, both fit.

Recommended helper extensions:

  • Validation DSL
    • Gates pipeline with steps: allOf/anyOf/not, group weights, and “if A then relax B by Δ”.
    • Named metrics from analysis paths; each gate emits pass/fail, value, threshold.
    • Soft validation mode: allow pass-by-score with warnings below a score floor.
  • Scoring DSL
    • Features = path + normalizer (ramp, sigmoid, min-max) + weight.
    • Bonuses/penalties and caps.
    • Total score formula defined in config; still returned as .scoring.
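A minimal sketch of such a gate DSL with conditional relaxation and sequential evaluation (all type and field names are assumptions, not an existing helper API):

```typescript
// A gate reads a named metric, compares against a threshold, and may be
// relaxed when a condition on another metric holds ("if A then relax B by Δ").
type Metrics = Record<string, number>;

interface Gate {
  name: string;
  metric: string;
  op: '>=' | '<=';
  threshold: number;
  relaxWhen?: { metric: string; op: '>=' | '<='; value: number; delta: number };
}

function passes(m: Metrics, g: Gate): boolean {
  let t = g.threshold;
  if (g.relaxWhen) {
    const v = m[g.relaxWhen.metric];
    const hit = g.relaxWhen.op === '>=' ? v >= g.relaxWhen.value : v <= g.relaxWhen.value;
    if (hit) t += g.relaxWhen.delta; // relax the threshold by Δ
  }
  return g.op === '>=' ? m[g.metric] >= t : m[g.metric] <= t;
}

// Sequential "state machine" gating: every step must pass in order,
// and the first failing gate is reported for telemetry.
function runPipeline(m: Metrics, gates: Gate[]): { passed: boolean; failedAt?: string } {
  for (const g of gates) {
    if (!passes(m, g)) return { passed: false, failedAt: g.name };
  }
  return { passed: true };
}
```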

2) How to see what’s too strict (fast tuning)

Implement “validation funnel + telemetry”:

  • Instrument each validator to log a trace per market:
    • market, tournament, gateName, pass/fail, value, threshold, distance-to-threshold, and the final totalScore.
  • Aggregate to a daily report with:
    • Funnel counts: total → after each named gate (e.g., squeeze → volumeUSD → trendFailSafe → riskGuard).
    • Per-gate pass rate and median distance-to-threshold; show P5/P50/P95 near thresholds.
    • Top-k failure reasons and their contribution to drop-off.
    • What‑if curves: simulate moving a threshold ±X% and estimate candidates gained/lost using recorded distributions.

Delivery options:
  • Quick: write JSON/CSV artifacts and generate a markdown report per run.
  • Better: persist to DB table tournament_validation_trace and make a Grafana/Metabase dashboard:
    • Panels: funnel, strictness heatmap by gate, threshold proximity histograms, candidate count over time, quality vs quantity scatter (score vs acceptance).

This lets you instantly see which gate is over-filtering and by how much, so you can edit config instead of code.
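The trace record and funnel aggregation could be sketched as follows (the record shape and function names are assumptions):

```typescript
// One trace row per market per gate, as described above.
interface GateTrace {
  market: string;
  tournament: string;
  gateName: string;
  pass: boolean;
  value: number;
  threshold: number;
}

// Funnel counts: how many markets survive after each gate, in gate order.
function funnel(traces: GateTrace[], gateOrder: string[]): Record<string, number> {
  const failedAt = new Map<string, number>(); // market -> index of first failed gate
  for (const t of traces) {
    if (!t.pass) {
      const idx = gateOrder.indexOf(t.gateName);
      const prev = failedAt.get(t.market);
      if (prev === undefined || idx < prev) failedAt.set(t.market, idx);
    }
  }
  const markets = new Set(traces.map((t) => t.market));
  const counts: Record<string, number> = {};
  gateOrder.forEach((g, i) => {
    counts[g] = [...markets].filter((m) => (failedAt.get(m) ?? Infinity) > i).length;
  });
  return counts;
}
```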

3) Toward a dynamic, self-adjusting system

  • Target: desiredCandidates per run or per day.
  • Controller per tournament:
    • For each tunable gate threshold T with monotonic effect, compute the metric’s empirical CDF from recent traces.
    • Set T to the quantile that yields the target acceptance for that stage (e.g., pass rate target after this gate).
    • Apply smoothing (EMA) and bounds; respect quality constraints (e.g., keep totalScore ≥ minScoreQuantile).
    • Re-evaluate periodically (e.g., daily) and write “suggested thresholds” to preview; once stable, enable auto-apply with guardrails.
  • Optional: add a global score floor that auto-adjusts to maintain final candidate count while preserving average quality.
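The controller above reduces to quantile selection plus EMA smoothing and bounds; a sketch (function and option names are illustrative):

```typescript
// Empirical quantile of a sorted sample (simple floor-index variant).
function quantile(sorted: number[], q: number): number {
  const idx = Math.min(sorted.length - 1, Math.max(0, Math.floor(q * sorted.length)));
  return sorted[idx];
}

// Pick the threshold as the quantile that yields the target pass rate,
// then smooth toward it with an EMA and clamp to safe bounds.
function suggestThreshold(
  recentValues: number[],   // metric samples from recent traces
  targetPassRate: number,   // e.g. 0.4 -> 40% of markets should clear the gate
  current: number,
  opts = { alpha: 0.3, min: 0, max: Infinity }, // EMA smoothing + bounds
): number {
  // For a ">= threshold" gate the top targetPassRate fraction should pass,
  // so the threshold sits at the (1 - targetPassRate) quantile.
  const sorted = [...recentValues].sort((a, b) => a - b);
  const raw = quantile(sorted, 1 - targetPassRate);
  const smoothed = current + opts.alpha * (raw - current);
  return Math.min(opts.max, Math.max(opts.min, smoothed));
}
```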

Minimal initial schema additions:

  • In each tournament config:
    • validation.gates: array of steps with expressions, thresholds, relaxations, and enabled flags.
    • telemetry: enabled, samplingRate.
    • adaptive: enabled, desiredCandidates, smoothing, minScore, bounds per threshold.
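One way those schema additions could be typed (a sketch; all field names are assumptions):

```typescript
// Hypothetical shape of the added config sections.
interface TournamentValidationConfig {
  validation: {
    gates: Array<{
      name: string;
      expression: string;        // e.g. "volumeRatio >= 1.2"
      threshold?: number;
      relaxation?: { when: string; delta: number };
      enabled: boolean;
    }>;
  };
  telemetry: { enabled: boolean; samplingRate: number };
  adaptive: {
    enabled: boolean;
    desiredCandidates: number;
    smoothing: number; // EMA alpha
    minScore: number;  // quality guardrail
    bounds: Record<string, { min: number; max: number }>;
  };
}

const exampleSchema: TournamentValidationConfig = {
  validation: { gates: [{ name: 'squeeze', expression: 'bbWidthPct < 20', enabled: true }] },
  telemetry: { enabled: true, samplingRate: 1 },
  adaptive: {
    enabled: false, // start with offline suggestions only
    desiredCandidates: 56,
    smoothing: 0.3,
    minScore: 0.25,
    bounds: { volumeRatio: { min: 1, max: 2 } },
  },
};
```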

What to do next (low-effort, high-impact)

  • Standardize validation traces in both helpers to emit gate-level telemetry.
  • Add a daily “Validation Funnel Report” job that prints a markdown table and “what‑if” suggestions.
  • Extend helpers to support anyOf and sequential gates; add named thresholds in config.
  • Phase 2: switch MSB/VBB validators to the new DSL; keep equivalence with current logic.
  • Phase 3: add quantile-based adaptive controller producing suggested thresholds; gate behind a flag.

Summary

  • Current helpers can almost cover MSB/VBB if extended with expressions, sequential gates, and conditional thresholds; otherwise bespoke code is justified.
  • Add per-gate telemetry and a funnel report to quickly see which parts over-filter; a simple dashboard or markdown report suffices.
  • With traces in place, introduce a quantile-based controller to adapt thresholds toward a target candidate count, with quality guardrails and smoothing.