LLMs can backtest trading ideas surprisingly well — if you feed them clean data. It’s insanely convenient: the model writes a “just enough” backtester for the exact hypothesis you’re testing.
But here’s the trap: LLMs sometimes make tiny implementation mistakes. And one small bug can turn a solid idea into total garbage — or total garbage into a “holy grail.”
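To make that concrete, here is a minimal sketch of one such bug. The post does not name a specific mistake; a one-bar look-ahead error is assumed here purely for illustration. The prices are made up, and `total_return` is a hypothetical toy function, not part of any real backtester.

```python
# Made-up closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]


def total_return(closes, lag):
    """Toy long-only rule: hold bar t if the signal says 'price rose'.

    lag=0 reads the signal from bar t itself (look-ahead bug: the move
    being traded is already known). lag=1 reads it from bar t-1, which
    is the information actually available at decision time.
    """
    equity = 1.0
    for t in range(2, len(closes)):
        signal_bar = t - lag
        rose = closes[signal_bar] > closes[signal_bar - 1]
        if rose:
            # Earn the return of bar t while holding the position.
            equity *= closes[t] / closes[t - 1]
    return equity - 1.0


buggy = total_return(closes, lag=0)   # uses future information
honest = total_return(closes, lag=1)  # uses only past information
print(f"with look-ahead: {buggy:+.2%}, without: {honest:+.2%}")
```

On this made-up series the version with look-ahead shows a profit and the honest version shows a loss, even though the two differ by a single index offset.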
My current workflow that keeps me honest:
Step 1: Let the LLM prototype + backtest in its own script (I use Claude Code).
Step 2: If the results look real, my workflow forces it to re-implement the same logic in a proper framework (I use NautilusTrader).
Step 3: Compare outputs — equity curve + trade list must match (or be very close).
Step 4: If the two implementations match, odds are the backtest is actually correct, since the same small bug is unlikely to show up in both.
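The comparison in Step 3 could be sketched roughly as follows. This is not the author's actual tooling and does not use the NautilusTrader API. It assumes both backtests have already been exported to plain Python lists, and the `Trade` record, the function names, the tolerances, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class Trade:
    # Hypothetical minimal trade record; real reports carry more fields.
    timestamp: str  # ISO-8601 fill time
    side: str       # "BUY" or "SELL"
    quantity: float
    price: float


def compare_trades(quick, framework, price_tol=1e-6):
    """Return human-readable mismatches between two trade lists."""
    problems = []
    if len(quick) != len(framework):
        problems.append(f"trade count differs: {len(quick)} vs {len(framework)}")
    # zip stops at the shorter list; the count check above flags the rest.
    for i, (a, b) in enumerate(zip(quick, framework)):
        if (a.timestamp, a.side) != (b.timestamp, b.side):
            problems.append(f"trade {i}: {a.timestamp} {a.side} vs {b.timestamp} {b.side}")
        elif not isclose(a.quantity, b.quantity, abs_tol=1e-9):
            problems.append(f"trade {i}: quantity {a.quantity} vs {b.quantity}")
        elif not isclose(a.price, b.price, abs_tol=price_tol):
            problems.append(f"trade {i}: price {a.price} vs {b.price}")
    return problems


def max_equity_gap(quick_equity, framework_equity):
    """Largest relative gap between two curves sampled at the same times."""
    if len(quick_equity) != len(framework_equity):
        raise ValueError("equity curves must have the same length")
    return max(
        (abs(a - b) / max(abs(a), abs(b), 1e-12)
         for a, b in zip(quick_equity, framework_equity)),
        default=0.0,
    )


# Made-up sample data: the second fill price differs between the two runs.
quick = [
    Trade("2024-01-02T10:00:00", "BUY", 1.0, 100.0),
    Trade("2024-01-03T10:00:00", "SELL", 1.0, 105.0),
]
framework = [
    Trade("2024-01-02T10:00:00", "BUY", 1.0, 100.0),
    Trade("2024-01-03T10:00:00", "SELL", 1.0, 104.5),
]
problems = compare_trades(quick, framework)
gap = max_equity_gap([100.0, 105.0], [100.0, 104.5])
print(problems, gap)
```

How large a gap still counts as "very close" is a judgment call. Small differences can come from fill or fee modelling in the framework rather than from a logic bug, so the tolerances above are placeholders rather than recommended values.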
Best part: LLMs can port even complex strategy logic into NautilusTrader without hesitation.
Screenshot context:
Left = LLM “quick backtester” report
Right = same strategy re-coded by the LLM inside NautilusTrader
Do you have a validation step like this — or do you trust the first backtest that looks good?
Author replies in comments:
@xtyche1 Yes, I think it’s essential, because you have to be sure you’re backtesting on quality data.
@TradeWithABear It’s usually much faster.
@DenisVodchyts I don’t let it write the whole strategy. I use it to backtest hypotheses that I provide, or at least approve.
@sceeto I currently use Nautilus for backtesting only.
@wgzhu1 What do you suggest — backtest by hand on paper?
@BK18699178 Definitely, I don’t use Yahoo data. I use only high-quality data that I buy — data is the essential part. If it’s garbage in, it’s garbage out.
@RicePaddyTrader Nothing special. I like that it works very well with an LLM.