
Multi-agent fictitious play lets language models converge on equilibrium strategies for pricing, negotiation, and auctions without retraining.
In the cases the paper studies, convergence happens within tens of rounds for small action spaces. In practice, set a hard cap (say 30), measure the entropy of the last 5 rounds, and stop when it drops below a threshold you pick during shadow mode. If you never converge, that is a signal the payoff structure is wrong or the action space is too large.
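The stopping rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: the function names, the entropy unit (bits), and the default threshold of 0.5 are assumptions, and the threshold in particular is a placeholder for whatever value you settle on during shadow mode.

```python
import math
from collections import Counter


def window_entropy(actions, window=5):
    """Shannon entropy (in bits) of the actions played in the last `window` rounds."""
    recent = actions[-window:]
    counts = Counter(recent)
    return sum(-(n / len(recent)) * math.log2(n / len(recent)) for n in counts.values())


def should_stop(actions, round_cap=30, window=5, threshold=0.5):
    """Return (stop, reason) for one agent's action history.

    `threshold` is an assumed placeholder; pick the real value in shadow mode.
    """
    if len(actions) >= window and window_entropy(actions, window) < threshold:
        return True, "converged"
    if len(actions) >= round_cap:
        return True, "cap reached"
    return False, "continue"
```

An entropy of zero means the agent played the same action for the whole window. Hitting the cap without ever falling below the threshold is the "never converged" signal to investigate.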
You can run the loop on a mid-tier model rather than a frontier one, but test it first. The reasoning per round is bounded: read a short history, predict the opponent's next move, pick your best response. Mid-tier models often do this well. The risk is that a weaker model picks a dominated strategy and the loop converges on a bad equilibrium. Always compare against a frontier-model baseline on a held-out set before committing.
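Two checks make that comparison concrete. The sketch below assumes you can write down a rough payoff table for the decision; the function names and the payoff layout (a dict keyed by `(own_action, opponent_action)`) are illustrative assumptions, not something the paper prescribes.

```python
def strictly_dominated(own_actions, opp_actions, payoff):
    """Return own actions that some other own action beats against every opponent move."""
    dominated = set()
    for a in own_actions:
        for b in own_actions:
            if a != b and all(payoff[(b, o)] > payoff[(a, o)] for o in opp_actions):
                dominated.add(a)
                break
    return dominated


def agreement_rate(candidate_choices, baseline_choices):
    """Share of held-out cases where the candidate model matches the baseline model."""
    pairs = list(zip(candidate_choices, baseline_choices))
    return sum(c == b for c, b in pairs) / len(pairs)
```

If the cheaper model's converged action ever lands in the dominated set, or its agreement with the frontier baseline is low on the held-out cases, that is a reason to keep the frontier model in the loop.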

If the counterparty is a human rather than another agent, you are not running fictitious play between two agents. You are running it between your agent and a model of the human, where the model is updated from the human's observed moves. This works for repeated negotiations with the same counterparty, where you have history. For one-shot deals with a new counterparty, the loop is mostly producing a prior, not an equilibrium.
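A rough sketch of that one-sided setup, assuming a uniform prior over the human's moves and a payoff table keyed by `(own_action, opponent_action)`. The class name, the prior weight, and the payoff layout are all hypothetical choices made for illustration.

```python
from collections import Counter


class CounterpartyModel:
    """Empirical model of a human counterparty, seeded with a uniform prior."""

    def __init__(self, actions, prior_weight=1.0):
        # Pseudo-counts: with no observed history the model is just the prior.
        self.counts = Counter({a: prior_weight for a in actions})

    def observe(self, action):
        """Update the model with one move the human actually made."""
        self.counts[action] += 1

    def distribution(self):
        total = sum(self.counts.values())
        return {a: n / total for a, n in self.counts.items()}


def best_response(own_actions, model, payoff):
    """Choose the own action with the highest expected payoff under the model."""
    dist = model.distribution()
    return max(
        own_actions,
        key=lambda own: sum(p * payoff[(own, opp)] for opp, p in dist.items()),
    )
```

With no observations the recommendation reflects only the prior, which is the one-shot case. As observed moves accumulate, they outweigh the prior and the recommendation starts to track the actual counterparty.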
This does not replace your pricing team. It gives them a structured way to stress-test proposed offers against a counterparty model before the offer goes out. The people who own the decision still make it. The change is that they decide with a logged simulation in front of them instead of a gut call.
Fictitious play and RLHF solve different problems. RLHF (reinforcement learning from human feedback) shapes a model's general behavior at training time. Fictitious play wraps a trained model in a runtime loop for a specific class of decisions. You can use both: an RLHF-trained model inside an FP loop. The loop does not need retraining, which is most of why it is attractive to operators.
A recent paper, "Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play," points at a gap most operators have already felt. You can hand a language model a problem and get a fluent answer. You can split that problem across a team of agents and get a faster answer. Neither approach is good at the kind of decision where someone on the other side of the table is also thinking.
This post is about why that gap exists, what fictitious play actually does, and how to decide whether the extra inference cost is worth it for your business.
The dominant pattern for multi-agent systems today is execution-heavy. One agent plans, others write code, search the web, draft documents, and a critic agent reviews. This works well when the task decomposes cleanly: research a market, build a slide, file the report.
It works poorly when the task is strategic. Strategic tasks have three properties:
Pricing against a competitor, negotiating a renewal, bidding in a procurement auction, allocating a shared budget across business units: all of these have that recursive structure. Splitting subtasks across cooperative agents does not help, because the hard part is not workload, it is anticipating a counterparty.

A language model asked to "decide the price" in a one-shot prompt will give you a plausible number. But it has no mechanism to reason about the fact that the competitor is doing the same exercise about you. Chain-of-thought helps with arithmetic and logic. It does not, on its own, produce equilibrium reasoning.
Fictitious play (FP) was introduced by George Brown in 1951. The idea is small and old, which is part of why it travels well.
Each player keeps a tally of every move the opponent has made so far. On each round, the player assumes the opponent will play according to the average of that history, then picks the move that does best against that average. The opponent does the same. Repeat.
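In several well-studied classes of games, two-player zero-sum games among them, the empirical frequencies of play under this process converge to a Nash equilibrium: a stable point where no player can do better by unilaterally changing strategy. Convergence is not guaranteed for every game, but where it holds you do not need to solve the game analytically. You just need to play it, remember, and respond.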
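To see the tally-and-respond mechanic without any language model involved, here is a small sketch of classical fictitious play on matching pennies, a two-player zero-sum game whose only Nash equilibrium is for both players to mix 50/50. The game, the opening moves, and the tie-breaking rule are choices made for this illustration.

```python
from collections import Counter

ACTIONS = ["H", "T"]
# Row player's payoff, keyed by (row_move, col_move); the column player gets the negative.
ROW_PAYOFF = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}


def best_response(opp_counts, payoff):
    """Pick the action with the highest total payoff against the opponent's tally."""
    def score(own):
        return sum(n * payoff(own, opp) for opp, n in opp_counts.items())
    return max(ACTIONS, key=score)  # ties go to the first action listed


def fictitious_play(rounds=1000):
    """Play `rounds` rounds and return each player's empirical action frequencies."""
    row_counts, col_counts = Counter(), Counter()
    row_move, col_move = "H", "H"  # arbitrary opening moves
    for _ in range(rounds):
        row_counts[row_move] += 1
        col_counts[col_move] += 1
        # Each player best-responds to the other's full history so far.
        next_row = best_response(col_counts, lambda own, opp: ROW_PAYOFF[(own, opp)])
        next_col = best_response(row_counts, lambda own, opp: -ROW_PAYOFF[(opp, own)])
        row_move, col_move = next_row, next_col
    row_freq = {a: row_counts[a] / rounds for a in ACTIONS}
    col_freq = {a: col_counts[a] / rounds for a in ACTIONS}
    return row_freq, col_freq
```

The individual moves keep cycling, but the frequencies returned by `fictitious_play` drift toward one half for each action as the number of rounds grows. It is the running average that settles, not the round-by-round play.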
The contribution of the paper is to put a language model in the role of each player. The model is not asked "what is the right answer?" It is asked, round by round, "given this history of what your opponent did, what should you do next?" The history does the heavy lifting that the model alone cannot.
You already have access to capable models. You probably do not have access to a clean game-theoretic formulation of your pricing problem. What you need is a structured loop that turns the model's general reasoning into something that converges on a defensible decision. Fictitious play is that loop.
```text
Round 1: Agent A plays a1. Agent B plays b1.
Round 2: A sees {b1}, plays a2. B sees {a1}, plays b2.
Round 3: A sees {b1,b2}, plays a3. B sees {a1,a2}, plays b3.
...
Round N: A's strategy stabilizes. B's strategy stabilizes.
Output: the converged action profile.
```

The model is doing two things on each round: estimating what the opponent's average behavior implies about their next move, and choosing its own best response. Both are tasks language models are reasonable at when given explicit context.
Suppose you are deciding what price to offer for a renewal with a vendor who is also deciding what counter to send. You have four price tiers, they have four counter tiers, and you both know the rough payoff structure (margin, churn risk, switching cost).
Here is a minimal scaffold. The point is not the prompt wording; it is the loop.
```python
# Runs a fictitious-play loop between two LLM agents
# and returns the converged action for each side.
from collections import Counter

from llm import chat  # your model client

ACTIONS_A = ["hold", "+3%", "+7%", "+12%"]
ACTIONS_B = ["accept", "counter -2%", "counter -5%", "walk"]


def best_response(side, own_history, opp_history, context):
    prompt = f"""
    You are agent {side} in a renewal negotiation.
    Context: {context}
    Opponent's past moves: {opp_history}
    Your past moves: {own_history}
    """
    # The listing breaks off here: the model call, the round loop,
    # and the final convergence check are not shown.
```
Two things to notice. First, you are paying for roughly 2N model calls, where N is the number of rounds. Twenty rounds is forty calls per decision. At current frontier prices that is cents, not dollars, but at scale you should track it. Second, the convergence check at the end of the loop is crude: it looks at the last five rounds and takes the mode. In production you would track the entropy of the action distribution and stop when it falls below a threshold.
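For reference, here is one way the full loop could look end to end. It is a sketch under stated assumptions rather than the author's scaffold: `chat_fn` is assumed to be any callable that takes a prompt string and returns the reply text, the "Reply with exactly one of" instruction and the fallback to the first action are illustrative, and `stub_chat` is a stand-in so the loop can be run offline.

```python
from collections import Counter

ACTIONS = {
    "A": ["hold", "+3%", "+7%", "+12%"],
    "B": ["accept", "counter -2%", "counter -5%", "walk"],
}


def ask(chat_fn, side, own_history, opp_history, context):
    """One model call: request a best response and validate the reply."""
    prompt = (
        f"You are agent {side} in a renewal negotiation.\n"
        f"Context: {context}\n"
        f"Opponent's past moves: {opp_history}\n"
        f"Your past moves: {own_history}\n"
        f"Reply with exactly one of: {ACTIONS[side]}"
    )
    reply = chat_fn(prompt).strip()
    # Assumed policy: fall back to the first listed action on an unusable reply.
    return reply if reply in ACTIONS[side] else ACTIONS[side][0]


def run_fp(chat_fn, context, rounds=20, window=5):
    """Run the loop and return (converged actions, model calls made, full history)."""
    history = {"A": [], "B": []}
    calls = 0
    for _ in range(rounds):
        # Both agents move in the same round, each seeing only earlier rounds.
        a = ask(chat_fn, "A", history["A"], history["B"], context)
        b = ask(chat_fn, "B", history["B"], history["A"], context)
        calls += 2
        history["A"].append(a)
        history["B"].append(b)
    # Crude convergence check: the most common action in the last `window` rounds.
    result = {s: Counter(h[-window:]).most_common(1)[0][0] for s, h in history.items()}
    return result, calls, history


def stub_chat(prompt):
    """Stand-in for a real model client; always returns a fixed move per side."""
    return "+3%" if "agent A" in prompt else "accept"
```

Calling `run_fp(stub_chat, "renewal with vendor X")` makes forty calls for the default twenty rounds, which matches the 2N estimate. Swapping `stub_chat` for a real client is the only change needed to try it against a model.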

Most teams considering this kind of decision have three other options on the table. Here is how they line up.
| Approach | Where it fits | Inference cost per decision | Handles adversarial counterparty | Operator effort to set up |
|---|---|---|---|---|
| Single-shot LLM prompt | Quick judgment calls, low stakes | 1 call | No | Hours |
| Cooperative multi-agent (planner + workers) | Research, drafting, execution | 5 to 20 calls | No | Days |
| Reinforcement learning policy | High-volume repeated games, fixed environment | Training is expensive, inference is cheap | Yes, if trained well | Months, plus data |
| LLM fictitious play | Strategic decisions, low to medium volume | 20 to 100 calls | Yes | Days |
The honest read: if you are pricing on a marketplace ten thousand times a day, train a policy. If you are renewing twenty enterprise contracts a quarter, fictitious play with a frontier model is the better operator choice. You skip the data collection, you skip the training cycle, and the marginal cost of a decision is a coffee.
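The volume gap behind that recommendation is easy to put in numbers. The snippet below assumes twenty rounds and two agents per decision, as in the earlier forty-calls estimate; the function name is illustrative.

```python
def model_calls(decisions, rounds=20, agents=2):
    """Model calls needed to run the loop once for each of `decisions` decisions."""
    return decisions * rounds * agents


marketplace_per_day = model_calls(10_000)  # 400,000 calls per day
renewals_per_quarter = model_calls(20)     # 800 calls per quarter
```

Four hundred thousand calls a day is a standing cost that justifies training a policy. Eight hundred calls a quarter is not.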
Three cases where this approach is the wrong tool:
If you are building toward an agent operating model, decisions like pricing, vendor selection, and budget allocation are exactly the places where you do not want a single model output and you do not want a human bottleneck on every call. A fictitious-play loop sits between those.
```mermaid
flowchart LR
    A[Decision request] --> B{Strategic?}
    B -- No --> C[Single agent response]
    B -- Yes --> D[FP loop: N rounds]
    D --> E[Convergence check]
    E -- Converged --> F[Proposed action + history log]
    E -- Not converged --> G[Escalate to human]
    F --> H[Human approves or overrides]
    H --> I[Execute]
```

The history log is the part operators tend to underrate. Every round of the loop produces a record of what each agent assumed and chose. That is your audit trail. When a deal closes badly, you can replay the loop, change the context, and see what would have moved the outcome. That is hard to do with a single-shot prompt and impossible with a black-box policy.
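One possible shape for that log is a JSON Lines file with one record per agent per round. The field names below are hypothetical; the point is only that each record captures what the agent saw, what it expected, and what it chose, in a form you can reload and replay.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RoundRecord:
    round_no: int
    side: str                     # "A" or "B"
    opponent_history: list        # what the agent saw before moving
    predicted_opponent_move: str  # what it assumed the opponent would do
    action: str                   # what it chose
    model_version: str            # which model produced the move


def dump_log(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


def load_log(text):
    """Rebuild the records from a JSON Lines string for replay or audit."""
    return [RoundRecord(**json.loads(line)) for line in text.splitlines() if line]
```

Because every record carries the history the agent saw, replaying a bad deal amounts to loading the log, editing the context, and rerunning the loop from any round.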
For a team trying this in the next quarter:
This is not a research project. It is a four-to-six-week engagement for a small team.
Two governance points worth flagging early, because they tend to surface late.
The first is collusion risk. If you run an LLM fictitious-play loop against a counterparty that is also running one, and both models share a base provider, you may be drifting toward coordinated outcomes that look like price fixing to a regulator. The legal exposure is unsettled. Keep humans in the approval path for any decision that touches a competitor's pricing.
The second is model drift. The converged action depends on the model. If your provider ships a new version, your equilibria can shift without warning. Pin model versions for any loop that drives a financial decision, and re-run your shadow evaluation when you upgrade.
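A small guard can enforce both habits before a loop runs. The naming convention checked here (an identifier ending in a date or an explicit version number) is a hypothetical one, as are the function names; adapt the pattern to however your provider actually labels fixed versions.

```python
import re

# Hypothetical convention: a pinned identifier ends in a date or explicit version,
# e.g. "provider-model-2025-01-15"; a bare alias such as "provider-model-latest" does not.
PINNED = re.compile(r".+-(\d{4}-\d{2}-\d{2}|v\d+(\.\d+)*)$")


def require_pinned(model_id):
    """Return the model id unchanged, or raise if it is a floating alias."""
    if not PINNED.match(model_id):
        raise ValueError(f"{model_id!r} is not a pinned model version")
    return model_id


def needs_shadow_rerun(pinned_model_id, last_evaluated_model_id):
    """True when the pinned version differs from the one the shadow evaluation covered."""
    return pinned_model_id != last_evaluated_model_id
```

Running `require_pinned` at startup turns a silent provider upgrade into a loud failure, and `needs_shadow_rerun` flags when the evaluation on record no longer matches the model in use.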