Yuzhe's Blog

yuzhes

AI Trading System: Bull vs Bear Before Every Trade

TL;DR: A paper trading system where two AI agents debate every trade before it executes: Bull argues for it, Bear tears it apart, and an Arbitrator decides. Built on Alpaca, driven by research from arXiv quant finance papers.

Why I Built This

Most retail trading systems are single-threaded: one signal fires, one order goes out. That’s fine until it isn’t.

I wanted a system that could challenge its own decisions — something closer to how an investment committee works, where you have to defend your thesis before deploying capital.

The other motivation was academic: a recent arXiv paper (2602.23330) reported that fine-grained multi-agent LLM systems with adversarial sub-tasks significantly outperform coarse single-agent approaches on trading decisions. That seemed worth testing.

Architecture

The system has three layers:

[ Signal Layer ]     Technical indicators, smart money, news

[ Debate Layer ]     Bull ↔ Bear adversarial argument

[ Execution Layer ]  Arbitrator verdict → Alpaca order + stop-loss

Signal Layer

Three sub-systems generate signals independently:

Technical (short-term, every 2h)

Smart Money (pre-market daily)

News (medium-term, post-close)

Debate Layer

Every candidate trade gets put through a three-agent process:

Bull Agent — argues for the trade. Required to give concrete technical and fundamental reasons, not vague optimism.

Bear Agent — reads Bull’s argument and attacks it. Finds the weakest assumption. Points out what could go wrong.

Arbitrator — synthesizes both sides, checks:

Here’s what an actual NO-GO looked like during testing:

⚖️  Verdict: ❌ NO-GO (95% confidence)
Reason: Bear argument decisive — current price shows $0.00,
RSI N/A. Bull's RSI=29 claim is unverifiable. No trade on
broken data.
Risk flags:
  - DATA_INTEGRITY_FAILURE: price $0.00
  - UNVERIFIABLE_THESIS: RSI mismatch with source data
  - MOMENTUM_TRAP_RISK: TSLA is a momentum stock, not mean-reversion

The system caught a data pipeline failure and refused to trade. That’s exactly the behavior you want.
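A rough sketch of the Bull, Bear, Arbitrator flow in Python. This is an illustration under assumptions, not the system's actual code: `Agent`, `Verdict`, and `run_debate` are hypothetical names, and the agents are plain functions standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: an agent takes a prompt string and returns text.
# In a real system this would wrap an LLM call; here it is any callable.
Agent = Callable[[str], str]


@dataclass
class Verdict:
    go: bool
    reason: str
    transcript: List[str] = field(default_factory=list)


def run_debate(candidate: str, bull: Agent, bear: Agent,
               arbitrator: Agent) -> Verdict:
    """Run one Bull -> Bear -> Arbitrator round for a candidate trade."""
    bull_case = bull(f"Argue FOR this trade with concrete reasons: {candidate}")
    # Bear sees Bull's argument so it can attack the weakest assumption.
    bear_case = bear(f"Attack this argument for {candidate}: {bull_case}")
    ruling = arbitrator(
        f"Trade: {candidate}\nBull: {bull_case}\nBear: {bear_case}\n"
        "Answer GO or NO-GO on the first line, then a one-line reason."
    )
    first_line, _, rest = ruling.partition("\n")
    return Verdict(
        # Anything other than an explicit GO is treated as NO-GO.
        go=first_line.strip().upper() == "GO",
        reason=rest.strip(),
        transcript=[bull_case, bear_case, ruling],
    )


# Stub agents for illustration only; they return canned text.
def stub_bull(prompt: str) -> str:
    return "RSI=29, oversold bounce likely."


def stub_bear(prompt: str) -> str:
    return "Price feed shows $0.00, so RSI=29 is unverifiable."


def stub_arbitrator(prompt: str) -> str:
    return "NO-GO\nNo trade on broken data."


verdict = run_debate("BUY TSLA", stub_bull, stub_bear, stub_arbitrator)
print(verdict.go, "-", verdict.reason)
```

Defaulting to NO-GO when the Arbitrator's answer can't be parsed is a design choice in this sketch: an ambiguous ruling should not place an order. The `transcript` field keeps the written record of each side's argument.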

Position Sizing

Based on the paper 2603.01298 on adaptive volatility control, position sizes are ATR-driven:

position_pct = (risk_per_trade_pct) / (atr_pct * atr_multiplier)

In practice:

High volatility = smaller position. Simple, but it works.
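To make the formula concrete, here is a minimal sketch in Python. It assumes all inputs are fractions (0.01 = 1%) and that `atr_multiplier` is the stop distance measured in ATRs. The `max_position_pct` cap is a hypothetical addition, not part of the formula above, and the input values are illustrative only.

```python
def position_pct(risk_per_trade_pct: float, atr_pct: float,
                 atr_multiplier: float, max_position_pct: float = 1.0) -> float:
    """Fraction of the portfolio to allocate to one trade.

    risk_per_trade_pct: fraction of equity risked if the stop is hit (0.01 = 1%)
    atr_pct: ATR divided by price (0.02 = 2%)
    atr_multiplier: assumed to be the stop distance in ATRs
    max_position_pct: hypothetical cap, not part of the formula in the post
    """
    if atr_pct <= 0 or atr_multiplier <= 0:
        raise ValueError("atr_pct and atr_multiplier must be positive")
    raw = risk_per_trade_pct / (atr_pct * atr_multiplier)
    return min(raw, max_position_pct)


# Illustrative inputs only: 1% risk per trade, stop placed 2 ATRs away.
for atr in (0.01, 0.02, 0.05):
    print(f"ATR {atr:.0%} -> position {position_pct(0.01, atr, 2.0):.1%}")
```

With these inputs the position shrinks from 50% of the portfolio at 1% ATR to 25% at 2% ATR and 10% at 5% ATR.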

Backtest Results

Before deploying, I ran 5-year backtests (2020–2025) on SPY to validate strategy selection:

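| Strategy            | Annual Return | Sharpe | vs Buy&Hold |
|---------------------|---------------|--------|-------------|
| ATR Trend Following | 113%          | 0.68   | +14pp       |
| RSI Mean Reversion  | 67%           | 0.41   | -32pp       |
| MACD Momentum       | 71%           | 0.44   | -28pp       |
| Buy & Hold SPY      | 99%           | 0.61   | baseline    |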

Only ATR trend-following beat passive SPY over 5 years. RSI and MACD, the two most popular retail indicators, both did worse than simply holding SPY.
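For reference, return and Sharpe figures like these can be computed from a series of daily returns along the following lines. This is a sketch under stated assumptions: 252 trading days per year, a zero risk-free rate by default, and hypothetical function names. The post does not say how its own numbers were calculated.

```python
import math
from statistics import mean, stdev
from typing import Sequence

TRADING_DAYS = 252  # assumed annualization factor


def total_return(daily_returns: Sequence[float]) -> float:
    """Compound daily returns into one cumulative return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_return(daily_returns: Sequence[float]) -> float:
    """Geometric annualized return over the whole series."""
    years = len(daily_returns) / TRADING_DAYS
    return (1.0 + total_return(daily_returns)) ** (1.0 / years) - 1.0


def sharpe_ratio(daily_returns: Sequence[float],
                 risk_free_daily: float = 0.0) -> float:
    """Annualized Sharpe ratio from daily excess returns."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(TRADING_DAYS)


# Synthetic one-year series, for illustration only.
sample = [0.01, -0.005] * 126
print(f"total: {total_return(sample):.1%}, Sharpe: {sharpe_ratio(sample):.2f}")
```

The distinction between cumulative and annualized return matters when reading a backtest table: if a 99% figure is read as cumulative over five years, it corresponds to roughly 14.8% per year, since `(1 + 0.99) ** (1 / 5) - 1` is about 0.148.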

The recommended allocation based on this:

Stack

Lessons Learned

Data integrity first. The system refused its first simulated trade because price data returned $0. That’s a feature, not a bug. Never let a bad data pipeline move real money.
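A pre-trade guard of this kind might look like the sketch below. The `Quote` dataclass, its field names, and the `integrity_flags` and `safe_to_trade` functions are hypothetical. Only the `DATA_INTEGRITY_FAILURE` flag name comes from the verdict shown earlier.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Quote:
    symbol: str
    price: Optional[float]
    rsi: Optional[float]


def integrity_flags(q: Quote) -> List[str]:
    """Return risk flags; an empty list means the data passed these checks."""
    flags = []
    # A missing, NaN, zero, or negative price is never tradable.
    if q.price is None or math.isnan(q.price) or q.price <= 0:
        flags.append(f"DATA_INTEGRITY_FAILURE: price {q.price}")
    # RSI is bounded to [0, 100]; anything else means the indicator is broken.
    if q.rsi is None or math.isnan(q.rsi) or not 0 <= q.rsi <= 100:
        flags.append(f"DATA_INTEGRITY_FAILURE: RSI {q.rsi}")
    return flags


def safe_to_trade(q: Quote) -> bool:
    return not integrity_flags(q)


bad = Quote("TSLA", price=0.0, rsi=None)
print(integrity_flags(bad))
```

Running a check like this before the debate starts would reject broken data without spending any LLM calls, while the Arbitrator remains a second line of defense.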

RSI is overrated for momentum stocks. TSLA at RSI 29 doesn’t mean it’s about to bounce; it might just be starting a real downtrend. The SPY backtest pointed the same way: RSI mean reversion trailed buy-and-hold by 32pp.

The debate adds latency but catches things. Running two LLM calls before every trade adds ~10 seconds. In exchange, you get a written record of why each decision was made. For a paper trading experiment, that’s valuable.

13F and congressional filings are the cleanest signals. Form 4 is noisy (too many option exercises and RSU grants). Congressional trades are weird but real — members of Congress have historically outperformed the market significantly. Make of that what you will.

What’s Next

Source code is private for now — might open-source the non-trading-logic pieces later.