burningtheta
Analysis·January 19, 2026·5 min read

AI Trading Agents Fail the Live Market Test

New research reveals LLMs struggle as autonomous traders. The AI-Trader benchmark exposes weak returns and poor risk management across US stocks, crypto, and A-shares.

Michael Brennan

BurningTheta

Wall Street has been betting big on AI trading. The reality check just arrived.

A new research paper from the University of Hong Kong introduces AI-Trader, the first fully autonomous benchmark for testing large language models as live traders. The results aren't pretty: most AI agents showed poor returns and weak risk management when left to trade on their own.

The finding cuts against the prevailing narrative. With over 35% of new hedge fund launches branding themselves as AI-driven and more than 70% of global funds using machine learning somewhere in their pipeline, the assumption has been that smarter models mean better trades. This research suggests otherwise.

The Benchmark

AI-Trader tests six mainstream LLMs across three markets: US stocks (NASDAQ 100), Chinese A-shares (SSE 50), and cryptocurrencies. Within each market, every agent starts with the same capital: $10,000 for US stocks, 100,000 yuan for A-shares, and 50,000 USDT for crypto.

The key innovation is what the researchers call a "fully autonomous minimal information paradigm." Agents receive only basic context: their current holdings, real-time prices, and available tools. No human intervention. No pre-packaged research. No hand-holding.

This forces the AI to do what a real trader would do: search for information, verify sources, synthesize data, and make decisions under time pressure. The benchmark runs in real-time market conditions with strict controls to prevent the models from accessing future data.
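To make the "no future data" control concrete, here is a minimal sketch in Python. The article does not describe how the benchmark actually enforces this, so the NewsItem record, the visible_to_agent helper, and the sample headlines and dates are all hypothetical. The idea is only that at a given decision time, the agent sees nothing timestamped later.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class NewsItem:
    """Hypothetical record for one piece of information an agent might retrieve."""
    published_at: datetime
    headline: str


def visible_to_agent(items, decision_time):
    """Return only items published at or before the decision time.

    Anything timestamped later is dropped, so the agent cannot
    use information that did not exist yet when it had to decide.
    """
    return [item for item in items if item.published_at <= decision_time]


# Made-up sample data for illustration.
items = [
    NewsItem(datetime(2025, 11, 3, 9, 0), "Earnings preview"),
    NewsItem(datetime(2025, 11, 3, 16, 30), "Earnings released after close"),
]
decision_time = datetime(2025, 11, 3, 10, 0)
visible = visible_to_agent(items, decision_time)
print([item.headline for item in visible])  # ['Earnings preview']
```

A filter like this is the simplest form of look-ahead protection. A live benchmark gets part of it for free, because data from after the decision time does not exist yet when the agent acts.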

What They Found

General intelligence doesn't translate to trading skill.

The models that score highest on standard AI benchmarks didn't necessarily perform best as traders. Most agents exhibited what the researchers diplomatically call "poor returns and weak risk management" when operating without human guidance.

The cross-market analysis revealed a pattern. AI trading strategies achieved excess returns more readily in highly liquid markets like US equities than in policy-driven environments like Chinese A-shares. The regulatory complexity and intervention common in A-share markets appear to confuse models trained primarily on Western financial data.
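For readers unfamiliar with the term, "excess return" is usually an agent's return minus the return of a passive benchmark over the same period. The sketch below assumes that simple definition; the article does not spell out the paper's exact formula or which index it compares against, and the numbers are illustrative, not results from the study.

```python
def cumulative_return(start_value, end_value):
    """Fractional gain or loss over the period."""
    return (end_value - start_value) / start_value


def excess_return(agent_start, agent_end, index_start, index_end):
    """Agent return minus the return of a passive benchmark index."""
    agent = cumulative_return(agent_start, agent_end)
    index = cumulative_return(index_start, index_end)
    return agent - index


# Illustrative numbers only: the agent grows $10,000 to $10,600 (+6%)
# while the index rises from 100 to 104 (+4%).
excess = excess_return(10_000, 10_600, 100, 104)
print(f"{excess:.2%}")  # 2.00%
```

Expressing results as returns rather than currency amounts is also what makes a $10,000 US account comparable with a 100,000 yuan A-share account.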

Risk control emerged as the key differentiator. The agents that managed drawdowns effectively showed more consistent performance across markets. Those that chased returns without managing downside got crushed.
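Drawdown is the standard way to put a number on this. A minimal sketch, using made-up equity curves rather than benchmark data, shows how two agents can end with the same profit while taking very different amounts of pain along the way:

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # deepest drop from that peak
    return worst


# Illustrative account values only, not results from the paper.
steady = [10_000, 10_200, 10_100, 10_400, 10_500]
chaser = [10_000, 11_500, 9_200, 8_000, 10_500]

print(f"{max_drawdown(steady):.1%}")  # 1.0%
print(f"{max_drawdown(chaser):.1%}")  # 30.4%
```

Both hypothetical accounts finish at $10,500, but the second one was down roughly 30% from its peak at one point. That is the difference the benchmark's risk findings are pointing at.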

Why This Matters

The hedge fund industry crossed $5 trillion in global assets last year. According to Bloomberg Intelligence, AI-driven quant strategies accounted for over 40% of trading volume in 2024. That's a lot of capital riding on the assumption that AI trading works.

And sometimes it does. Bridgewater's Pure Alpha fund returned 34% in 2025. AI-first funds have posted average returns of 12-15% year-to-date compared to 8-10% for traditional peers. But these results come from hybrid systems with significant human oversight, not fully autonomous agents.

The gap between supervised AI trading and autonomous AI trading appears to be substantial. When humans stay in the loop—setting parameters, overriding bad decisions, managing risk—AI can add value. When left alone, the models struggle.

This aligns with what we've seen in other AI applications. The technology excels at processing information and identifying patterns. It struggles with the kind of adaptive decision-making that markets demand.

The Liquid Market Advantage

One finding stands out: AI trading strategies work better in liquid markets.

US equities, with their deep order books and transparent price discovery, give AI agents an environment they can model effectively. The rules are clear. Information flows freely. Execution is predictable.

Policy-driven markets are different. When a central bank can move prices with a press release, or when trading halts appear without warning, AI models lose their edge. They're optimized for patterns in data, not for anticipating human political decisions.

This suggests AI-powered trading may find its sweet spot in specific market segments rather than as a universal solution. High-frequency strategies in liquid instruments look more promising than macro calls in emerging markets.

The Risk Management Problem

Poor risk management showed up repeatedly in the benchmark results.

Most AI agents didn't know when to stop. They'd chase momentum into obviously extended moves, fail to cut losing positions, and take concentrated bets that blew up their portfolios. The researchers note that risk control capability determined cross-market robustness.

This echoes real-world experience. Early 2025 saw several AI-driven funds stumble when models over-relied on historical patterns during unexpected events. Supply chain disruptions from trade conflicts caught models off guard because they had never seen conditions like these before.

The models that performed best had some mechanism for limiting downside exposure built into their decision frameworks. Whether through explicit position limits or learned caution from training data, risk awareness separated winners from losers.
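An explicit position limit is the simplest version of such a mechanism. The sketch below is an assumption about what one could look like, not the method used by any agent in the benchmark; the cap_order name and the 20% default cap are hypothetical.

```python
def cap_order(target_value, current_position_value, portfolio_value, max_weight=0.20):
    """Shrink a buy order so one position stays within max_weight of the portfolio.

    All values are in the same currency. Returns the order value that is
    allowed, which is never negative and never more than requested.
    """
    limit = max_weight * portfolio_value
    room = max(0.0, limit - current_position_value)
    return min(target_value, room)


# Hypothetical case: a $10,000 portfolio already holds $1,500 of one stock
# and the agent wants to buy $3,000 more. A 20% cap leaves room for $500.
allowed = cap_order(3_000, 1_500, 10_000)
print(allowed)
```

A rule like this sits outside the model's reasoning, so it holds even when the agent is convinced that a concentrated bet is a good idea.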

What Comes Next

The AI-Trader benchmark is open-sourced on GitHub with a live leaderboard at ai4trade.ai. The researchers clearly intend this as a foundation for improvement rather than a final verdict.

The path forward likely involves hybrid approaches. AI handles information processing—scanning thousands of data sources, identifying anomalies, generating trade ideas. Humans handle oversight—setting risk parameters, vetoing bad trades, adapting to novel situations.

This mirrors what's already happening at successful quant funds. The AI doesn't replace the portfolio manager; it augments them. Companies building AI infrastructure understand this distinction. The goal is human-AI collaboration, not replacement.

For retail traders considering AI trading bots, the research offers a clear warning. Fully autonomous systems promising to trade your account without oversight should be approached with skepticism. The technology isn't there yet.

The Bottom Line

AI trading agents perform worse than expected when operating autonomously in live markets. General model intelligence doesn't predict trading success. Risk management matters more than return-chasing. Liquid markets are easier than policy-driven ones.

None of this means AI won't eventually transform trading. But the transformation looks more like better tools for human traders than robot replacements. AI data and analytics plays may offer more reliable exposure to this trend than betting on fully autonomous trading systems.

The benchmark's creators put it plainly: these findings "expose critical limitations in current autonomous agents." That's useful information for anyone allocating capital to AI trading strategies. Know what you're buying.


The AI-Trader research paper and benchmark data are available at arxiv.org/abs/2512.10971 and github.com/HKUDS/AI-Trader.