Anthropic finds stronger AI models secure better deals in trading test
TL;DR
Anthropic tested 69 AI agents that traded on behalf of employees in an internal market. Stronger models secured better deals, and most users did not notice the difference, so vibe builders and SMBs should pick capable models for transactions.
Anthropic researchers recently ran a study in which 69 AI agents took part in an internal market, negotiating trades on behalf of employees. The results show that more capable models consistently secured better financial outcomes for their users than smaller or less advanced models did. Most participants remained unaware that their agents were negotiating suboptimal deals, which points to a hidden performance gap in automated tasks. The finding suggests that the capability of your model can directly affect your bottom line when you automate commercial interactions. Prioritize testing the highest-performing models for any task involving money, procurement, or vendor negotiations, and do not assume that all agents perform equally well when financial margins are at stake. Start by benchmarking your current automation workflows against top-tier models to see if you are leaving money on the table.
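One way to run that benchmark is to give two models the same negotiation scenarios and compare the prices each one agrees to. The sketch below is a minimal illustration in Python, not the method used in the Anthropic study. The function name average_saving, the scenario labels, and every price are hypothetical placeholders, and it assumes the agent is buying, so a lower price is better.

```python
from statistics import mean


def average_saving(baseline: dict[str, float], candidate: dict[str, float]) -> float:
    """Mean amount the candidate model saves per scenario versus the baseline.

    Both arguments map a scenario id to the price the agent agreed to pay.
    Only scenarios present in both runs are compared.
    """
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("no shared scenarios to compare")
    return mean(baseline[s] - candidate[s] for s in shared)


# Hypothetical prices for illustration only; these are NOT figures from the study.
budget_model = {"laptop": 1000.0, "chairs": 400.0, "license": 250.0}
stronger_model = {"laptop": 950.0, "chairs": 370.0, "license": 250.0}

saving = average_saving(budget_model, stronger_model)
print(f"Average saving per deal: {saving:.2f}")
```

A positive result means the candidate model paid less on average. Replace the placeholder prices with outcomes logged from your own workflows before drawing any conclusion.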
Who this matters for
- Vibe Builders: Stop using budget models for agentic workflows that involve money or vendor contracts.
What to watch next
Most people treat AI agents like simple chatbots, but this study suggests they behave as economic actors. If you are building an app that handles payments or vendor interactions, a cheaper, smaller model can act as a hidden tax on your profitability: you take a discount on the model and pay more for it in worse deals over the long run. Stop optimizing only for latency or model cost when the agent is responsible for revenue generation. If your agent is negotiating, it should be the most capable model you can justify, or you risk losing money while feeling productive. This is not about fancy features; it is about basic arithmetic and competitive advantage in your automated processes.
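The arithmetic can be made concrete by netting the extra model cost against the improvement in each deal. The following is a rough sketch under stated assumptions: the helper net_gain_per_deal and all of the figures are hypothetical, chosen only to show the calculation, and none of them come from the study.

```python
def net_gain_per_deal(deal_improvement: float, extra_model_cost: float) -> float:
    """Value kept per deal after paying the extra cost of the stronger model."""
    return deal_improvement - extra_model_cost


# Hypothetical inputs for illustration only.
deal_improvement = 20.0   # better outcome per deal from the stronger model
extra_model_cost = 0.5    # additional model spend per negotiation
deals_per_month = 1000

per_deal = net_gain_per_deal(deal_improvement, extra_model_cost)
monthly = per_deal * deals_per_month
print(f"Net gain per deal: {per_deal:.2f}")
print(f"Net gain per month: {monthly:.2f}")
```

If the net gain per deal is negative with your own numbers, the cheaper model is the better choice for that workflow.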
by Harsh Desai