Backtesting Pitfalls: Why Your AI Bot Will Fail in Live Markets
I spent 3 months building an LLM-based swing trading bot that crushed it in backtests: +47% annual return, 68% win rate.
Then I deployed it live. Down 12% in the first month.
Here's what went wrong—and what I learned.
Mistake #1: Ignoring Slippage
The Trap: Backtests assume you get filled at the close price. In reality, you get filled at market price + slippage.
The Fix: Add 0.1-0.3% slippage to every backtest trade. If your edge disappears, your strategy is too thin.
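The adjustment can be a one-liner applied to every fill. A minimal sketch (the function name and the 0.2% default are my own illustration, sitting in the 0.1-0.3% range above):

```python
def apply_slippage(fill_price: float, side: str, slippage_pct: float = 0.002) -> float:
    """Worsen a fill by a fixed slippage fraction (0.002 = 0.2%).

    Slippage always hurts: buys fill higher, sells fill lower.
    """
    if side == "buy":
        return fill_price * (1 + slippage_pct)
    if side == "sell":
        return fill_price * (1 - slippage_pct)
    raise ValueError(f"unknown side: {side!r}")
```

Run the backtest once with slippage at 0.1% and once at 0.3%; if the equity curve flips sign between the two, the edge is too thin to survive real fills.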
Mistake #2: Lookahead Bias
The Trap: I was feeding my LLM "current day" data that included the close price—data I wouldn't have in real-time.
The Fix: Only use data available before the signal triggers. Shift everything by 1 bar.
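The one-bar shift is easy to get wrong, so it helps to build the feature/target pairs explicitly. A hypothetical sketch (function name mine): the feature for bar t is the close of bar t-1, i.e. only information you'd actually have before bar t's signal fires.

```python
def shift_features(closes: list[float]) -> list[tuple[float, float]]:
    """Pair each bar's outcome with the PRIOR bar's close.

    Returns (feature, target) tuples: feature = close[t-1],
    target = close[t]. The model never sees close[t] when
    deciding at bar t, which is what removes the lookahead.
    """
    return [(closes[t - 1], closes[t]) for t in range(1, len(closes))]
```

The same idea in pandas is `df["feature"] = df["close"].shift(1)`; either way, the signal at bar t must be computable from bars 0..t-1 only.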
Mistake #3: Overfitting GPT-4 Prompts
The Trap: I iterated my GPT-4 prompts on the same backtest data. Each tweak quietly encoded patterns specific to that dataset—overfitting, just through prompt text instead of model weights.
The Fix: Use walk-forward validation. Train prompts on 2022-2023, validate on 2024 data you've never touched.
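The split itself is just a date cut, but making it an explicit function keeps you honest about which slice the prompts ever touch. A minimal sketch (function name and record shape are my own assumptions):

```python
from datetime import date

def walk_forward_split(records: list[tuple[date, dict]], train_end: date):
    """Split (date, row) records into train and validation sets.

    Prompts get tuned only on the train slice (here, through
    2023-12-31); the validation slice stays untouched until the
    single final check.
    """
    train = [(d, r) for d, r in records if d <= train_end]
    valid = [(d, r) for d, r in records if d > train_end]
    return train, valid
```

In a fuller walk-forward setup you'd roll this window forward repeatedly, but even the single 2022-2023 / 2024 cut above removes the worst of the prompt overfitting.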
Mistake #4: No Position Sizing
The Trap: My backtest used fixed position sizes. In live trading, I didn't account for portfolio heat—the total risk across all open positions at once.
The Fix: Kelly Criterion or fractional sizing based on confidence scores. Never risk >2% per trade.
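Both ideas fit in one function: compute the Kelly fraction from your win rate and win/loss ratio, temper it (half-Kelly is a common hedge against estimation error), then clip to the 2% cap. A sketch with hypothetical names:

```python
def position_fraction(win_rate: float, win_loss_ratio: float,
                      kelly_mult: float = 0.5, cap: float = 0.02) -> float:
    """Fractional Kelly sizing with a hard risk cap.

    Full Kelly: f* = p - (1 - p) / b, where p is the win rate and
    b is average win / average loss. We scale by kelly_mult
    (half-Kelly by default) and never exceed the 2% per-trade cap.
    A negative edge sizes to zero—don't take the trade.
    """
    kelly = win_rate - (1 - win_rate) / win_loss_ratio
    return max(0.0, min(kelly * kelly_mult, cap))
```

With the backtest's 68% win rate and, say, a 1.5 win/loss ratio, raw Kelly is far above 2%, so the cap binds—which is exactly the point: the cap, not the optimistic backtest stats, sets your live risk.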
What Actually Works
After fixing these issues, my live results are now tracking the backtest within 3-5%. Key principles:
- Assume the worst. Add slippage, commissions, and latency.
- Validate out-of-sample. Never optimize on the same data you test on.
- Start small. Run live with tiny size for 1-2 months before scaling.
AI trading is real, but the market humbles everyone. Trade small, learn fast, iterate.