Frequently asked questions

How many rounds does the loop actually need?

In the cases the paper studies, convergence happens within tens of rounds for small action spaces. In practice, set a hard cap (say 30), measure the entropy of the action distribution over the last 5 rounds, and stop when it drops below a threshold you pick during shadow mode. If the loop never converges, that is a signal that the payoff structure is wrong or the action space is too large.

Can we use a cheaper model for this?

You can, and you should test it. The reasoning per round is bounded: read a short history, predict the opponent's next move, pick your best response. Mid-tier models often do this well. The risk is that a weaker model picks a dominated strategy and the loop converges on a bad equilibrium. Always compare against a frontier-model baseline on a held-out set before committing.

What if the opponent is a human, not another agent?

Where divide-and-conquer agents stop working

The dominant pattern for multi-agent systems today is execution-heavy. One agent plans, others write code, search the web, draft documents, and a critic agent reviews. This works well when the task decomposes cleanly: research a market, build a slide, file the report.

It works poorly when the task is strategic. Strategic tasks have three properties:

Another party is also choosing.
Your best move depends on what you think they will choose.
Their best move depends on what they think you will choose.

Pricing against a competitor, negotiating a renewal, bidding in a procurement auction, allocating a shared budget across business units: all of these have that recursive structure. Splitting subtasks across cooperative agents does not help, because the hard part is not the workload. It is anticipating a counterparty.

[Figure: Cooperative versus strategic agent settings]

A language model asked to "decide the price" in a one-shot prompt will give you a plausible number. But it has no mechanism to reason about the fact that the competitor is doing the same exercise about you. Chain-of-thought helps with arithmetic and logic. It does not, on its own, produce equilibrium reasoning.

What fictitious play is, in plain terms

Fictitious play (FP) was introduced by George Brown in 1951. The idea is small and old, which is part of why it travels well.

Each player keeps a tally of every move the opponent has made so far. On each round, the player assumes the opponent will play according to the average of that history, then picks the move that does best against that average. The opponent does the same. Repeat.

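In several well-studied classes of games, including two-player zero-sum games and potential games, the empirical frequencies of play converge to a Nash equilibrium: a stable point where no player can do better by unilaterally changing strategy. Convergence is not guaranteed in every game, but where it holds you do not need to solve the game analytically. You just need to play it, remember, and respond.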
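The update rule is compact enough to sketch without any model in the loop. What follows is a minimal illustration of classical fictitious play on a two-player matrix game, not the paper's method. The function name `fictitious_play`, the tie-breaking rule (lowest index wins), the opening moves, and the choice of matching pennies as the test game are all assumptions made for the example.

```python
# Classical fictitious play on a two-player matrix game (illustrative sketch).
# payoff_a[i][j] / payoff_b[i][j]: payoff to A / B when A plays i and B plays j.

def fictitious_play(payoff_a, payoff_b, rounds):
    n_a, n_b = len(payoff_a), len(payoff_a[0])
    counts_a = [0] * n_a  # tally of A's past moves
    counts_b = [0] * n_b  # tally of B's past moves
    a, b = 0, 0           # arbitrary opening moves (assumption)

    for _ in range(rounds):
        counts_a[a] += 1
        counts_b[b] += 1
        # Best response to the opponent's tally. Dividing the tally by the
        # number of rounds would give the average, but the argmax is the same.
        next_a = max(range(n_a),
                     key=lambda i: sum(payoff_a[i][j] * counts_b[j] for j in range(n_b)))
        next_b = max(range(n_b),
                     key=lambda j: sum(payoff_b[i][j] * counts_a[i] for i in range(n_a)))
        a, b = next_a, next_b  # both sides move simultaneously

    return ([c / rounds for c in counts_a], [c / rounds for c in counts_b])


# Matching pennies: A wins on a match, B wins on a mismatch (zero-sum).
PENNIES_A = [[1, -1], [-1, 1]]
PENNIES_B = [[-1, 1], [1, -1]]

freq_a, freq_b = fictitious_play(PENNIES_A, PENNIES_B, rounds=2000)
print(freq_a, freq_b)  # both drift toward the 50/50 mixed equilibrium
```

Note that neither player ever settles on a single move in this game. What converges is the frequency of play, which is why the loop's output is better read as a distribution than as a final move.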

The contribution of the paper is to put a language model in the role of each player. The model is not asked "what is the right answer." It is asked, round by round, "given this history of what your opponent did, what should you do next." The history does the heavy lifting that the model alone cannot.

Why this matters for operators

You already have access to capable models. You probably do not have access to a clean game-theoretic formulation of your pricing problem. What you need is a structured loop that turns the model's general reasoning into something that converges on a defensible decision. Fictitious play is that loop.

Round 1: Agent A plays a1. Agent B plays b1.
Round 2: A sees {b1}, plays a2. B sees {a1}, plays b2.
Round 3: A sees {b1,b2}, plays a3. B sees {a1,a2}, plays b3.
...
Round N: A's strategy stabilizes. B's strategy stabilizes.
Output: the converged action profile.

The model is doing two things on each round: estimating what the opponent's average behavior implies about their next move, and choosing its own best response. Both are tasks language models are reasonably good at when given explicit context.
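The first of those two tasks gets easier if the prompt carries the opponent's empirical frequencies rather than a raw list of moves. Here is a minimal sketch, assuming the history is a plain list of action labels. `summarize_history` is a hypothetical helper, not something from the paper or from any library.

```python
from collections import Counter

def summarize_history(opp_history, opp_actions):
    """Return the opponent's empirical action frequencies as prompt-ready text."""
    if not opp_history:
        return "No moves observed yet."
    counts = Counter(opp_history)
    total = len(opp_history)
    lines = []
    for action in opp_actions:  # fixed order keeps prompts comparable across rounds
        share = counts[action] / total
        lines.append(f"{action}: {counts[action]}/{total} ({share:.0%})")
    return "\n".join(lines)

print(summarize_history(
    ["accept", "walk", "accept", "accept"],
    ["accept", "counter -2%", "counter -5%", "walk"],
))
```

Listing every action, including the ones never played, keeps the zero counts visible to the model instead of leaving it to infer them.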

A worked example: contract renewal pricing

Suppose you are deciding what price to offer for a renewal with a vendor who is also deciding what counter to send. You have four price tiers, they have four counter tiers, and you both know the rough payoff structure (margin, churn risk, switching cost).

Here is a minimal scaffold. The point is not the prompt wording; it is the loop.

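# Runs a fictitious-play loop between two LLM agents
# and returns the converged action for each side.
from collections import Counter
from llm import chat  # your model client
ACTIONS_A = ["hold", "+3%", "+7%", "+12%"]
ACTIONS_B = ["accept", "counter -2%", "counter -5%", "walk"]

def best_response(side, own_history, opp_history, context):
    actions = ACTIONS_A if side == "A" else ACTIONS_B
    prompt = f"""
    You are agent {side} in a renewal negotiation.
    Context: {context}
    Opponent's past moves: {opp_history}
    Your past moves: {own_history}
    Reply with exactly one of: {actions}
    """
    reply = chat(prompt).strip()  # assumes chat() returns the reply text
    return reply if reply in actions else actions[0]  # assumed fallback
# The rest is a sketch of the loop described in the text, not the paper's code.
def run_loop(context, rounds=20):
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = best_response("A", hist_a, hist_b, context)
        b = best_response("B", hist_b, hist_a, context)
        hist_a.append(a)
        hist_b.append(b)
    # Crude convergence check: mode of the last five rounds.
    return tuple(Counter(h[-5:]).most_common(1)[0][0] for h in (hist_a, hist_b))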

Two things to notice. First, you are paying for roughly 2N model calls, where N is the number of rounds. Twenty rounds is forty calls per decision. At current frontier prices that is cents, not dollars, but at scale you should track it. Second, the convergence check at the end is crude: it looks at the last five rounds and takes the mode. In production you would track the entropy of the action distribution and stop when it falls below a threshold.
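The entropy-based stop can be sketched in a few lines. This is one possible shape, assuming Shannon entropy in bits over a sliding window. The names `window_entropy` and `has_converged` and the default threshold of 0.5 are placeholders; the real threshold is the one you pick during shadow mode.

```python
import math
from collections import Counter

def window_entropy(history, window=5):
    """Shannon entropy (bits) of the actions in the last `window` rounds."""
    recent = history[-window:]
    if not recent:
        return float("inf")  # nothing observed yet, so treat as not converged
    n = len(recent)
    counts = Counter(recent)
    # sum of p * log2(1/p) over the actions that actually appeared
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def has_converged(hist_a, hist_b, window=5, threshold=0.5):
    """True once both sides have a full window with entropy at or below threshold."""
    if min(len(hist_a), len(hist_b)) < window:
        return False
    return (window_entropy(hist_a, window) <= threshold
            and window_entropy(hist_b, window) <= threshold)
```

With a window of five, a 4-to-1 split has an entropy of about 0.72 bits, so a 0.5 threshold effectively demands five identical moves from each side. A wider window lets you set a threshold that tolerates the occasional stray move.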

[Figure: Convergence of action frequencies over rounds]

How this compares to the alternatives

Most teams considering this kind of decision have three other options on the table. Here is how they line up.

Approach	Where it fits	Inference cost per decision	Handles adversarial counterparty	Operator effort to set up
Single-shot LLM prompt	Quick judgment calls, low stakes	1 call	No	Hours
Cooperative multi-agent (planner + workers)	Research, drafting, execution	5 to 20 calls	No	Days
Reinforcement learning policy	High-volume repeated games, fixed environment	Training is expensive, inference is cheap	Yes, if trained well	Months, plus data
LLM fictitious play	Strategic decisions, low to medium volume	20 to 100 calls	Yes	Days

The honest read: if you are pricing on a marketplace ten thousand times a day, train a policy. If you are renewing twenty enterprise contracts a quarter, fictitious play with a frontier model is the better operator choice. You skip the data collection, you skip the training cycle, and the marginal cost of a decision is a coffee.
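The volume argument is easy to check with arithmetic. This small sketch uses only figures already quoted in this piece (two calls per round, twenty rounds, ten thousand decisions a day, twenty renewals a quarter). The function names are made up for the example, and no per-call price is assumed.

```python
def calls_per_decision(rounds):
    # Two agents each make one model call per round.
    return 2 * rounds

def calls_per_period(decisions, rounds):
    return decisions * calls_per_decision(rounds)

ROUNDS = 20  # the twenty-round figure used earlier

marketplace_daily = calls_per_period(decisions=10_000, rounds=ROUNDS)
renewals_quarterly = calls_per_period(decisions=20, rounds=ROUNDS)

print(marketplace_daily)   # 400000 calls per day
print(renewals_quarterly)  # 800 calls per quarter
```

Multiply either number by your own per-call cost to see where the break-even against a trained policy sits for your volume.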

When not to use it

Three cases where this approach is the wrong tool:

The other side is not strategic. If you are deciding internal budget splits with no opposing agent, you do not need equilibrium reasoning. You need forecasting.
The payoff structure is unknown. Fictitious play assumes both agents can evaluate outcomes. If you cannot describe the payoff matrix in the prompt, the model is guessing in a vacuum.
The decision must be auditable to a regulator. Convergence behavior of a model-in-the-loop is harder to explain than a fixed rule. Document carefully or stay with rules.

Fitting it into an agent operating model

If you are building toward an agent operating model, decisions like pricing, vendor selection, and budget allocation are exactly the places where you do not want a single model output and you do not want a human bottleneck on every call. A fictitious-play loop sits between those.

flowchart LR
 A[Decision request] --> B{Strategic?}
 B -- No --> C[Single agent response]
 B -- Yes --> D[FP loop: N rounds]
 D --> E[Convergence check]
 E -- Converged --> F[Proposed action + history log]
 E -- Not converged --> G[Escalate to human]
 F --> H[Human approves or overrides]
 H --> I[Execute]

The history log is the part operators tend to underrate. Every round of the loop produces a record of what each agent assumed and chose. That is your audit trail. When a deal closes badly, you can replay the loop, change the context, and see what would have moved the outcome. That is hard to do with a single-shot prompt and impossible with a black-box policy.
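One way to keep that record is an append-only log with one JSON object per round. The sketch below is an assumption about what such a log could look like. The `RoundRecord` fields and the helper names are hypothetical, and a production version would probably also store the context and the model version used.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RoundRecord:
    round_no: int
    a_saw: list    # B's moves visible to A when it chose
    b_saw: list    # A's moves visible to B when it chose
    a_played: str
    b_played: str

def build_log(moves_a, moves_b):
    """Pair two move lists into per-round records."""
    return [
        RoundRecord(i + 1, list(moves_b[:i]), list(moves_a[:i]), a, b)
        for i, (a, b) in enumerate(zip(moves_a, moves_b))
    ]

def to_jsonl(records):
    """Serialize one JSON object per line, suitable for an append-only log."""
    return "\n".join(json.dumps(asdict(r)) for r in records)

def from_jsonl(text):
    """Rebuild the records from a log, skipping empty lines."""
    return [RoundRecord(**json.loads(line)) for line in text.splitlines() if line]

log = build_log(["hold", "+3%", "+3%"], ["accept", "counter -2%", "accept"])
print(to_jsonl(log))
```

Because each record stores what the agent saw as well as what it chose, a reviewer can check any single round without reconstructing the ones before it.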

A short rollout plan

For a team trying this in the next quarter:

Pick one decision type. Renewal pricing or vendor negotiation are good starting points.
Write down the payoff structure with someone who actually owns the P&L. If you cannot, stop here.
Build the loop. Cap it at 30 rounds. Log every call.
Run it in shadow mode against the last 50 decisions your team made. Compare.
If the loop's choices are within tolerance on 70% of cases and beat the human on the remainder, move to assisted mode where it proposes and a human approves.
Review monthly. Track inference cost, decision latency, and outcome quality.
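The gate between shadow mode and assisted mode can be written down as a check. The sketch below encodes one reading of that step: at least 70% of cases within tolerance, and the loop ahead of the human on every remaining case. `ShadowCase`, its fields, and `ready_for_assisted_mode` are hypothetical names, and how you estimate the outcome of a choice that was never executed is left to the team.

```python
from dataclasses import dataclass

@dataclass
class ShadowCase:
    within_tolerance: bool  # loop's choice close enough to the human's
    loop_outcome: float     # estimated outcome of the loop's choice
    human_outcome: float    # outcome of the human's choice

def ready_for_assisted_mode(cases, min_share=0.70):
    """Return True if enough cases are within tolerance and the loop
    beats the human on every case that is not."""
    if not cases:
        return False
    outside = [c for c in cases if not c.within_tolerance]
    share_within = (len(cases) - len(outside)) / len(cases)
    beats_human = all(c.loop_outcome > c.human_outcome for c in outside)
    return share_within >= min_share and beats_human
```

If "beat the human on the remainder" is meant on average rather than case by case, swap the `all(...)` for a comparison of mean outcomes.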

This is not a research project. It is a four-to-six-week engagement for a small team.

Governance notes

Two governance points worth flagging early, because they tend to surface late.

The first is collusion risk. If you run an LLM fictitious-play loop against a counterparty that is also running one, and both models share a base provider, you may be drifting toward coordinated outcomes that look like price fixing to a regulator. The legal exposure is unsettled. Keep humans in the approval path for any decision that touches a competitor's pricing.

The second is model drift. The converged action depends on the model. If your provider ships a new version, your equilibria can shift without warning. Pin model versions for any loop that drives a financial decision, and re-run your shadow evaluation when you upgrade.

Better Decisions with LLMs via Multi-Agent Fictitious Play