Artificial Intelligence
Anthropic’s Project Deal Lets Claude Agents Trade Real Goods

Anthropic on April 24 published the results of “Project Deal,” a one-week internal experiment in which Claude agents bought and sold real items on behalf of 69 employees in the company’s San Francisco office.
The agents struck 186 deals worth just over $4,000, and the study found that participants represented by stronger models walked away with measurably better outcomes, a gap the humans they represented never noticed.
The findings, written up by Anthropic researchers Kevin K. Troy, Dylan Shields, Keir Bradwell, and Peter McCrory, give the clearest picture yet of how an AI-mediated marketplace might actually behave once agents are negotiating on both sides of a transaction.
They also surface an “uncomfortable implication” the company says industry, regulators, and users will need to confront before agentic commerce goes mainstream.
How Project Deal Worked
The experiment ran for one week in December 2025.
Anthropic recruited 69 employees, gave them each a $100 “budget” (paid out after the experiment in the form of a gift card, plus or minus the value of whatever they bought or sold), and had Claude conduct a short interview with each volunteer to figure out what they wanted to sell, at what price, what they wanted to buy, and what kind of negotiating style their agent should use. Anthropic then turned those answers into a custom system prompt for each agent.
Anthropic then ran four parallel marketplaces inside Slack channels.
“In Run A and Run D, everyone’s agent was based on Claude Opus 4.5, our then-frontier model,” the team said. “In the other two runs (Runs B and C), participants had a fifty-fifty chance of being assigned Claude Haiku 4.5, a less powerful model, instead.”
Only Run A was the “real” run where goods actually changed hands afterwards; the other three were study conditions, and participants were not told which run was real until after a post-experiment survey.
There was no human in the loop once the agents were deployed. The project's Slack channel cycled through the agents in random order, letting each one post an item for sale, make an offer on someone else's goods, or close a deal.
Across more than 500 listed items, agents identified matches, proposed prices, and closed deals autonomously. Humans re-entered the picture only at the end, to physically swap the goods their agents had agreed to trade.
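The turn-taking mechanic described above can be sketched as a simple round-robin loop. This is a toy simulation, not Anthropic's harness: the agents here are hard-coded stubs standing in for model calls, and the list/offer/accept actions are assumptions about the interface.

```python
import random

# Toy sketch of the marketplace loop: each round, agents act in random
# order, and on a turn may close a pending deal, post a listing, or make
# an offer. Agent behavior is a fixed stub, not a model call.
def run_marketplace(agents, rounds=3, seed=0):
    rng = random.Random(seed)
    listings = {}   # item -> (seller, asking price)
    offers = {}     # item -> (buyer, bid)
    deals = []      # (seller, buyer, item, price)
    for _ in range(rounds):
        order = list(agents)
        rng.shuffle(order)                       # random turn order
        for name, role, item, ask in order:
            sold = any(d[2] == item for d in deals)
            if role == "sell" and item in offers:
                buyer, bid = offers.pop(item)    # seal the deal
                seller, _ = listings.pop(item)
                deals.append((seller, buyer, item, bid))
            elif role == "sell" and item not in listings and not sold:
                listings[item] = (name, ask)     # post the item
            elif role == "buy" and listings:
                target, (seller, price) = next(iter(listings.items()))
                offers.setdefault(target, (name, price - 5))  # open below ask

    return deals

deals = run_marketplace([
    ("alice", "sell", "desk lamp", 20),
    ("bob",   "buy",  None,        0),
])
```

Even this stub captures the key property of the real experiment: once the loop starts, listing, bidding, and closing all happen without a human decision anywhere in the path.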
Stronger Models Quietly Negotiate Better Deals
The headline finding is straightforward: agent quality matters, and it matters in dollars.
Across 161 items sold in at least two of the four runs, an Opus seller pulled in $2.68 more on average, while an Opus buyer paid $2.45 less. When an Opus seller faced off against a Haiku buyer, the average price hit $24.18, compared to $18.63 for Opus-on-Opus deals. With a median price of $12 and an average of $20.05 across all runs, Anthropic says these gaps aren’t trivial.
Individual cases were sharper.
The same broken folding bike, with the same buyer and seller, went for $65 when an Opus agent sold it but only $38 when a Haiku agent did. A lab-grown ruby that Opus sold for $65 fetched only $35 when Haiku handled the listing.
The catch is what participants did not perceive.
Despite the clear price gap, participants with Haiku agents rated the fairness of their deals almost the same as Opus users: 4.06 versus 4.05 on the fairness scale.
“Twenty-eight of our participants had Haiku in one Haiku-and-Opus run and Opus in the other. And although 17 of these ranked their Opus run above their Haiku run, 11 did the opposite,” the company wrote.
A second, more counterintuitive result: the negotiation styles participants asked for in their intake interviews barely affected outcomes.
Aggressive sellers did get higher prices, but only because they set higher opening prices to begin with, Anthropic says.
Aggressive instructions produced no statistically significant lift in sale likelihood, sale price, or purchase price once the higher asking prices those users set were controlled for. Model choice mattered far more than prompting.
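That control is easy to illustrate: if "aggressive" agents simply open higher and every deal closes at a similar fraction of the ask, the raw price premium disappears once sale prices are compared at matched asking prices. The numbers below are made up for illustration, not Anthropic's data.

```python
# Illustrative only: synthetic deals where aggressive agents set higher
# asking prices, but every deal closes at 80% of the ask. The raw premium
# vanishes once price is normalized by the asking price.
deals = [
    {"style": "aggressive", "ask": 25, "price": 20.0},
    {"style": "aggressive", "ask": 15, "price": 12.0},
    {"style": "neutral",    "ask": 25, "price": 20.0},
    {"style": "neutral",    "ask": 15, "price": 12.0},
    {"style": "neutral",    "ask": 10, "price": 8.0},
]

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison: aggressive agents appear to earn more per sale.
raw_gap = (mean([d["price"] for d in deals if d["style"] == "aggressive"])
           - mean([d["price"] for d in deals if d["style"] == "neutral"]))

# Controlled comparison: price as a fraction of ask shows no advantage.
ratio_gap = (mean([d["price"] / d["ask"] for d in deals if d["style"] == "aggressive"])
             - mean([d["price"] / d["ask"] for d in deals if d["style"] == "neutral"]))
```

The naive gap is positive purely because of the higher opening asks, while the controlled gap is zero, which is the shape of the result Anthropic reports.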
What It Means for Agentic Commerce
Project Deal is a pilot, not a product, and Anthropic is careful to flag the limits — a self-selected employee pool, low stakes, and no adversarial actors. Even so, 46 percent of participants said they’d pay for a service like this, which Anthropic frames as evidence that agent-mediated peer-to-peer commerce isn’t far away.
That timing matters because Anthropic has been visibly steering Claude toward consumer transactions. The company recently published a blog post committing to keep Claude conversations ad-free while explicitly endorsing agentic commerce, and it has been building out enterprise infrastructure such as Managed Agents to let Claude act on users’ behalf across third-party services. Project Deal lands as a research artifact that quietly maps the failure modes of that future.
Anthropic flags three concerns that grow from the experiment. First, in a world with companies instead of volunteers, the incentives would look very different, and would not necessarily work in people's favor.
Second, optimizing systems for AI agent attention rather than human attention could introduce new manipulation surfaces, including jailbreaking and prompt injection.
Third, “the policy and legal frameworks around AI models that transact on our behalf simply don’t exist yet,” the company writes.
The unanswered question is whether disclosure can close the perception gap. Project Deal participants did not know which model represented them, which is roughly the situation users will face in any consumer rollout. If a fairness gap between Opus and Haiku is invisible inside a self-selected Anthropic workforce running a one-week experiment with $100 stakes, it will likely be invisible at scale — unless marketplaces are required to disclose what agent is acting for whom and at what capability tier. That is the kind of regulatory question Anthropic is now publicly inviting, and it is the one most likely to land first when agent-mediated commerce moves beyond a Slack channel in San Francisco.