Beginner to intermediate | Set up once, then iterate continuously | @SystematicPeter
Result: Codify the engineering execution process into a reusable agent workflow.

For: Developers / AI engineers who want to incorporate coding, testing, deployment, or agent collaboration into a stable engineering workflow.

LLMs can backtest trading ideas surprisingly well — if you feed them clean data. It’s insanely convenient: the model writes a “just enough” backtester for the exact hypothesis you’re testing.

But here’s the trap: LLMs sometimes make tiny implementation mistakes. And one small bug can turn a solid idea into total garbage — or total garbage into a “holy grail.”
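To make the trap concrete, here is a minimal sketch of one well-known kind of small bug: look-ahead, where a signal is applied to the same bar it was computed from. The post does not say which bugs it ran into, so the bug type, the toy price series, and every function name below are assumptions chosen only for illustration.

```python
# Toy example (hypothetical data): a one-line indexing slip flips the result.
PRICES = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0, 105.0, 108.0]


def bar_returns(prices):
    """Simple close-to-close return for each bar after the first."""
    return [prices[t] / prices[t - 1] - 1.0 for t in range(1, len(prices))]


def momentum_signals(prices):
    """1 if the bar closed above the previous close, else 0 (flat)."""
    return [1 if prices[t] > prices[t - 1] else 0 for t in range(1, len(prices))]


def total_return_lookahead(prices):
    """BUGGY: applies each signal to the same bar that produced it."""
    rets = bar_returns(prices)
    sigs = momentum_signals(prices)
    return sum(s * r for s, r in zip(sigs, rets))


def total_return_lagged(prices):
    """Applies each signal to the NEXT bar, which is all a trader could do."""
    rets = bar_returns(prices)
    sigs = momentum_signals(prices)
    return sum(s * r for s, r in zip(sigs[:-1], rets[1:]))


if __name__ == "__main__":
    print("look-ahead version:", total_return_lookahead(PRICES))
    print("lagged version:    ", total_return_lagged(PRICES))
```

On this zig-zag series the look-ahead version collects only the up bars and sums to a profit, while the lagged version buys after every up bar, just in time for the down bar, and sums to a loss. The only difference between the two functions is the slice in the last line.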

My current workflow that keeps me honest:

Step 1: Let the LLM prototype + backtest in its own script (I use Claude Code).

Step 2: If results look real, my workflow forces it to re-implement the same logic in a proper framework (I use NautilusTrader).

Step 3: Compare outputs — equity curve + trade list must match (or be very close).

Step 4: If the NautilusTrader results match the quick script's, odds are the backtest is actually correct.
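The comparison in Step 3 could be sketched roughly as follows. This assumes both backtesters have already exported their equity curve as a list of floats on the same timestamps and their trades as simple records. The Trade fields, the function names, and the tolerances are hypothetical ("very close" is not quantified in the post), and nothing here uses the NautilusTrader API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    """Hypothetical minimal trade record exported by either backtester."""
    timestamp: str   # fill time, e.g. ISO-8601 text
    side: str        # "BUY" or "SELL"
    quantity: float
    price: float


def compare_equity_curves(quick, reference, rel_tol=1e-4):
    """Return (index, quick_value, reference_value) for every diverging point.

    rel_tol is an assumed tolerance; pick one that fits your fee and
    rounding model.
    """
    if len(quick) != len(reference):
        raise ValueError(
            f"equity curves differ in length: {len(quick)} vs {len(reference)}"
        )
    return [
        (i, q, r)
        for i, (q, r) in enumerate(zip(quick, reference))
        if not math.isclose(q, r, rel_tol=rel_tol)
    ]


def compare_trade_lists(quick, reference, rel_tol=1e-6):
    """Return human-readable descriptions of every mismatch found."""
    problems = []
    if len(quick) != len(reference):
        problems.append(f"trade count differs: {len(quick)} vs {len(reference)}")
    # zip stops at the shorter list, so only overlapping trades are compared.
    for i, (a, b) in enumerate(zip(quick, reference)):
        if (a.timestamp, a.side) != (b.timestamp, b.side):
            problems.append(f"trade {i}: timestamp/side differ: {a} vs {b}")
        elif not (
            math.isclose(a.quantity, b.quantity, rel_tol=rel_tol)
            and math.isclose(a.price, b.price, rel_tol=rel_tol)
        ):
            problems.append(f"trade {i}: quantity/price differ: {a} vs {b}")
    return problems
```

Both functions return an empty list when the two runs agree, so a workflow can refuse to continue unless both results are empty. Checking the trade list as well as the equity curve matters because two curves can end at a similar value while the underlying fills differ.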

Best part: LLMs can port even complex strategy logic into NautilusTrader without hesitation.

Screenshot context:
Left = LLM “quick backtester” report
Right = same strategy re-coded by the LLM inside NautilusTrader

Do you have a validation step like this — or do you trust the first backtest that looks good?

Author replies in comments:

@xtyche1 Yes, I think it’s essential, because you have to be sure you’re backtesting on quality data.

@TradeWithABear It’s usually much faster.

@DenisVodchyts I don’t let it write the whole strategy. I use it to backtest hypotheses that I provide and at least approve.

@sceeto I currently use Nautilus for backtesting only.

@wgzhu1 What do you suggest — backtest by hand on paper?

@BK18699178 Definitely, I don’t use Yahoo data. I use only high-quality data that I buy — data is the essential part. If it’s garbage in, it’s garbage out.

@RicePaddyTrader Nothing special. I like that it works very well with an LLM.
