Frequently asked questions

Is CHORUS production-ready today?

No. It is a research result. The relevant question for an operator is not whether to deploy CHORUS but whether to start budgeting for the operating model it points at: shared policies, local decisions, and eval-driven release gates. That budgeting work takes 6 to 12 months, which is why it is worth starting now.

How is this different from a multi-agent framework I already know?

Most multi-agent frameworks are central orchestrators in disguise: one planner agent assigns work to worker agents. CHORUS removes the planner. Every worker runs the same model and decides locally. The trade-off is less control and harder debugging in exchange for lower integration cost and better resilience when one worker fails.

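What headcount does this avoid, and what headcount does it require?

In the sample 10-robot budget worksheet later in this piece, the shared-policy model replaces about 1.5 FTE of custom integration engineering and 0.5 FTE of orchestrator maintenance. In exchange, it requires roughly 20 hours per week of human review for incidents and edge cases.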

What CHORUS actually is, in operator terms

A VLA policy is a Vision-Language-Action policy: a model that takes camera input and a natural-language instruction, then outputs motor actions. CHORUS trains one such policy and runs a copy of it on each robot in the team. Each robot sees its own view, reads the shared task description, and acts. There is no central planner assigning subtasks. Coordination emerges because every robot is reading from the same playbook and can see what the others are doing.

The "multi-embodiment" part means the same policy controls robots with different bodies: a wheeled base with one arm, a humanoid with two arms, a mobile manipulator with a gripper. Historically, each robot type needed its own controller and its own integration work. CHORUS argues you can collapse that into one model.

For an operator, the headline is not the model architecture. It is this: the per-robot integration cost and the central orchestration layer are both candidates for removal. Whether they should be removed depends on your tolerance for emergent behavior versus deterministic schedules.
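
To make "one policy, many robots, no planner" concrete, here is a minimal Python sketch. It is not the CHORUS policy, which is a learned VLA model; `shared_policy` is a hand-written toy rule, and `Observation`, `step_fleet`, and the box names are hypothetical names chosen for illustration. What it shows is only the control flow: every robot calls the same function on its own observation, and nothing assigns work centrally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    robot_id: str
    embodiment: str               # e.g. "forklift", "humanoid", "mobile_arm"
    visible_boxes: tuple          # box ids in this robot's own camera view
    handled_by_others: frozenset  # box ids it can see a teammate handling


def shared_policy(task: str, obs: Observation) -> str:
    """Toy stand-in for the learned policy; every robot calls this same function.

    A real VLA policy conditions on `task`; this toy rule ignores it.
    """
    for box in obs.visible_boxes:
        if box not in obs.handled_by_others:
            return f"pick:{box}"
    return "wait"


def step_fleet(task: str, observations: list) -> dict:
    # No planner assigns work: each robot maps its own observation to an action.
    return {obs.robot_id: shared_policy(task, obs) for obs in observations}


fleet = [
    Observation("r1", "forklift", ("box_a", "box_b"), frozenset()),
    Observation("r2", "humanoid", ("box_a", "box_b"), frozenset({"box_a"})),
    Observation("r3", "mobile_arm", (), frozenset()),
]
actions = step_fleet("clear the loading dock", fleet)
```

In this toy run the humanoid skips `box_a` only because it can see a teammate handling it. If that observation were missing, both robots would pick the same box, which is the kind of emergent failure a deterministic schedule rules out.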

The three operating models, side by side

There are three ways to coordinate a fleet of robots or software agents. Most operators have only seen the first one. The third is what CHORUS demonstrates.

Model	Who decides	Failure mode	Integration cost	Best for
Central orchestrator	One planner assigns work	Planner is a single point of failure	High per robot type	Predictable, repetitive workflows
Hand-coded coordination	Each robot follows fixed rules	Brittle when conditions change	Very high, grows with fleet	Small fixed fleets
Shared policy (CHORUS)	Each robot decides locally from a shared model	Drift, deadlocks, harder to audit	Low per new robot, high model training cost	Mixed fleets, varied tasks

The economic argument for the shared policy model is the marginal cost of adding a new robot or a new task. With a central orchestrator, every new robot type means a new adapter, new constraints, new scheduling rules. With a shared policy, you add cameras and motors and let the model handle it, assuming the model has seen something similar in training.

The cost you pay is observability. A central orchestrator has a log of every decision it made. A shared policy has thousands of local decisions per second across the fleet. You need different tooling to know what happened and why.
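
A quick back-of-envelope calculation shows why the tooling has to change. The sketch below uses figures that appear later in this piece (a 10-robot fleet, a 16-hour shift, and 5 to 30 actions per second per robot) and assumes one logged decision per action. The function name is hypothetical.

```python
def decisions_per_shift(robots: int, actions_per_second: int, shift_hours: int) -> int:
    """Count local decisions across the fleet in one shift, one per action."""
    return robots * actions_per_second * shift_hours * 3600


low = decisions_per_shift(robots=10, actions_per_second=5, shift_hours=16)
high = decisions_per_shift(robots=10, actions_per_second=30, shift_hours=16)
print(f"{low:,} to {high:,} decisions per shift")  # 2,880,000 to 17,280,000
```

Even a small fleet produces millions of decision records per shift. Nobody reads them line by line, which is why sampling, replay, and aggregate metrics take over the role of the orchestrator's decision log.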

Why this matters outside robotics

The same architecture is showing up in software agent deployments. Replace "robot" with "agent" and "camera view" with "tool output," and the picture is identical:

1. A central orchestrator agent that assigns subtasks to worker agents. Familiar, debuggable, slow, and a bottleneck.
2. Hand-coded routing between agents using if-then logic. Fast, brittle, expensive to maintain.
3. A shared policy where every agent runs the same model, reads the shared context, and picks its next action locally. Cheap to scale, hard to audit.

If you are evaluating agent platforms right now, this is the axis to ask about. Most vendors are selling option one with extra steps. The interesting research, including CHORUS, is on option three.

graph TD
 A[Shared task: clear the loading dock] --> B[Robot 1: forklift]
 A --> C[Robot 2: humanoid]
 A --> D[Robot 3: mobile arm]
 B -.observes.-> E[Shared scene]
 C -.observes.-> E
 D -.observes.-> E
 E -.informs next action.-> B
 E -.informs next action.-> C
 E -.informs next action.-> D

The diagram above shows the loop: a shared instruction goes to every worker, each worker observes the shared scene from its own viewpoint, and each picks its own next move. There is no arrow pointing to a central box because there isn't one.

What it costs to run, honestly

A shared VLA policy is not free. The training cost is high and is borne by whoever ships the model. The inference cost is borne by you, and it shows up in three places:

Compute per robot. Each robot needs enough onboard compute to run the policy at a useful frequency, typically 5 to 30 actions per second. That is a real bill of materials line item.
Network for shared context. Robots need to share enough state to avoid colliding with each other or duplicating work. CHORUS minimizes this, but it is not zero.
Eval and monitoring infrastructure. You cannot ship a policy you cannot measure. This is the line operators most often underestimate.

Here is a rough operator budget worksheet for a 10-robot mixed fleet, used the way a COO would size it before approving a pilot.

fleet:
 robots: 10
 embodiments: 3 # forklift, humanoid, mobile arm
 shift_hours: 16
 
per_robot_costs:
 onboard_compute_amortized_usd_per_hour: 0.80
 policy_inference_usd_per_hour: 0.00 # runs on onboard compute
 telemetry_egress_usd_per_hour: 0.05
 
shared_costs:
 eval_harness_usd_per_month: 4000 # simulated task replays
 human_review_hours_per_week: 20 # incidents and edge cases
 human_review_usd_per_hour: 75
 
estimated_monthly_total_usd: 14580 # sum of the items above, assuming 30 days and 52/12 weeks per month
replaces:
 custom_integration_engineer_fte: 1.5
 orchestrator_maintenance_fte: 0.5

The point of the worksheet is not the exact number. It is that the cost shifts from integration engineering, which is lumpy and hard to staff, to compute and human review, which are smooth and easier to forecast. That is a real change in how you plan a robotics or agent program.
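
To sanity-check a worksheet like this, it helps to sum the listed line items explicitly. The sketch below does that in Python under two assumptions that the worksheet does not state: 30 operating days per month and 52/12 weeks per month. The function and constant names are hypothetical.

```python
DAYS_PER_MONTH = 30        # assumption: the fleet runs every day
WEEKS_PER_MONTH = 52 / 12  # assumption: average weeks in a month


def monthly_costs(robots, shift_hours, per_robot_usd_per_hour,
                  eval_harness_usd_per_month,
                  review_hours_per_week, review_usd_per_hour):
    """Break the monthly total into its three listed components, in USD."""
    robot_hours = robots * shift_hours * DAYS_PER_MONTH
    fleet = robot_hours * per_robot_usd_per_hour
    review = review_hours_per_week * review_usd_per_hour * WEEKS_PER_MONTH
    costs = {
        "fleet_compute_and_telemetry": round(fleet),
        "eval_harness": round(eval_harness_usd_per_month),
        "human_review": round(review),
    }
    costs["total"] = sum(costs.values())
    return costs


costs = monthly_costs(
    robots=10,
    shift_hours=16,
    per_robot_usd_per_hour=0.80 + 0.00 + 0.05,  # compute + inference + egress
    eval_harness_usd_per_month=4000,
    review_hours_per_week=20,
    review_usd_per_hour=75,
)
print(costs)
```

Under those assumptions the listed items come to roughly 14,580 USD per month, with human review as the largest single line. A headline total should therefore be read alongside the day and week counts, and any unlisted items, that it assumes.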

Governance: how do you know it is working

This is the section operators ask about last and should ask about first. A decentralized policy can fail in ways that a central orchestrator cannot. Two robots can both decide they are the one to pick up the box. Three robots can all wait for someone else to open the door. The model can do something subtly wrong that no single robot's logs reveal.
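
Two of those failure modes can be detected mechanically once each robot's intended action is visible in one place. The sketch below is a minimal illustration, not part of CHORUS. It assumes each robot reports an intent string such as `pick:box7` or `wait:door1`; that format and the function name are hypothetical.

```python
from collections import Counter


def coordination_faults(intents: dict) -> dict:
    """Flag duplicate claims and fleet-wide waiting.

    `intents` maps robot id -> intent string, e.g. "pick:box7" or "wait:door1".
    """
    picks = Counter(
        intent.split(":", 1)[1]
        for intent in intents.values()
        if intent.startswith("pick:")
    )
    duplicate_claims = sorted(obj for obj, n in picks.items() if n > 1)
    # If every robot is waiting, nobody is positioned to unblock the others.
    all_waiting = bool(intents) and all(
        intent.startswith("wait") for intent in intents.values()
    )
    return {"duplicate_claims": duplicate_claims, "all_waiting": all_waiting}


both_grab = coordination_faults(
    {"r1": "pick:box7", "r2": "pick:box7", "r3": "wait:door1"}
)
door_standoff = coordination_faults(
    {"r1": "wait:door1", "r2": "wait:door1", "r3": "wait:door1"}
)
```

The third failure mode, a subtly wrong action that no single log reveals, has no simple check like this. It is the main reason the replay-based evaluation described next is needed.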

Eval-driven operations is the practice of running your fleet against a fixed set of recorded scenarios on a regular cadence and treating regressions as incidents. For a shared-policy fleet, the eval harness is not optional infrastructure. It is the only way you know the next model update did not break a coordination pattern that worked yesterday.

A minimal eval loop for a robot or agent fleet looks like this:

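# Run the policy against recorded scenarios and flag regressions.
# This is what tells an operator whether a model update is safe to ship.

import json
import sys

from chorus_eval import load_scenarios, run_policy

scenarios = load_scenarios("warehouse_v3")  # 240 recorded tasks
results = []

for s in scenarios:
    rollout = run_policy(
        policy="chorus-2026-03",
        scenario=s,
        seed=s.seed,  # replay with the recorded seed so runs are comparable
    )
    results.append({
        "task": s.name,
        "success": rollout.completed,
        "time_to_complete_s": rollout.duration,
        "collisions": rollout.collision_count,
        "deadlocks": rollout.deadlock_count,
    })

# Compare against the previous release. Assumption: its results were saved as
# JSON rows in the same shape as `results`; the file name is hypothetical.
with open("baseline_results.json") as f:
    baseline = {row["task"]: row["success"] for row in json.load(f)}

# A regression is any task that passed in the baseline and fails now.
regressed = [r["task"] for r in results
             if baseline.get(r["task"]) and not r["success"]]
if regressed:
    sys.exit(f"Release blocked: {len(regressed)} regressed task(s): {regressed}")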

The script above replays 240 recorded warehouse scenarios against the new policy and blocks the release if any task that passed last time now fails. That is the kind of gate you want between a model update and your floor.

This is the operator's equivalent of unit tests for a code change. It does not catch everything, but without it you are deploying blind.
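
A pass/fail gate can miss a policy that still finishes every task but collides or stalls more often. One way to widen it is to also compare the count metrics the eval loop already records. This is a sketch under the assumption that both runs are lists of rows with the `task`, `success`, `collisions`, and `deadlocks` keys used above. The function name is hypothetical.

```python
def find_regressions(baseline, candidate, count_metrics=("collisions", "deadlocks")):
    """Return (task, metric) pairs where the candidate is worse than baseline."""
    base_by_task = {row["task"]: row for row in baseline}
    regressions = []
    for row in candidate:
        base = base_by_task.get(row["task"])
        if base is None:
            continue  # new scenario: nothing to compare against yet
        if base["success"] and not row["success"]:
            regressions.append((row["task"], "success"))
        for metric in count_metrics:
            if row[metric] > base[metric]:
                regressions.append((row["task"], metric))
    return regressions


baseline_rows = [
    {"task": "unload_truck", "success": True, "collisions": 0, "deadlocks": 0},
    {"task": "open_door", "success": True, "collisions": 0, "deadlocks": 1},
]
candidate_rows = [
    {"task": "unload_truck", "success": True, "collisions": 2, "deadlocks": 0},
    {"task": "open_door", "success": False, "collisions": 0, "deadlocks": 1},
    {"task": "stack_pallets", "success": False, "collisions": 0, "deadlocks": 0},
]
regressions = find_regressions(baseline_rows, candidate_rows)
```

Here `unload_truck` still succeeds but is flagged for new collisions, `open_door` is flagged as a success regression, and the brand-new `stack_pallets` scenario is ignored until it has a baseline. Whether a count increase should block a release or only open an incident is a policy choice for the operator.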

Figure: Eval dashboard for a shared-policy fleet.

Accountability when something goes wrong

If a robot drops a pallet, who is responsible? With a central orchestrator, you can point to the planner's decision log. With a shared policy, you need three things:

Per-robot rollout logs: what the robot saw, what action it took, what the policy's confidence was.
Shared-context snapshots: what the fleet knew collectively at the time of the incident.
Policy version pinning: which exact model weights were on each robot at the time.

If your vendor cannot produce all three on request, you do not have an operable system. You have a demo.
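
The three requirements translate naturally into a completeness check on an incident record. The sketch below is one hypothetical shape for that record, not a vendor or CHORUS format. The class, field, and function names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IncidentEvidence:
    # robot id -> list of {"observation_ref", "action", "confidence"} entries
    rollout_logs: dict = field(default_factory=dict)
    # what the fleet knew collectively at incident time
    context_snapshot: Optional[dict] = None
    # robot id -> identifier of the exact model weights deployed
    policy_versions: dict = field(default_factory=dict)


def missing_evidence(evidence: IncidentEvidence, robot_ids: list) -> list:
    """List what is still missing before the incident can be reviewed."""
    gaps = []
    for robot_id in robot_ids:
        if not evidence.rollout_logs.get(robot_id):
            gaps.append(f"rollout log for {robot_id}")
        if robot_id not in evidence.policy_versions:
            gaps.append(f"policy version for {robot_id}")
    if evidence.context_snapshot is None:
        gaps.append("shared-context snapshot")
    return gaps


evidence = IncidentEvidence(
    rollout_logs={"r1": [{"observation_ref": "frame_0412",
                          "action": "release_gripper",
                          "confidence": 0.61}]},
    policy_versions={"r1": "chorus-2026-03"},
)
gaps = missing_evidence(evidence, ["r1", "r2"])
```

Running a check like this as part of a vendor evaluation, against a staged incident, turns "on request" into something you can test before signing.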

When to pilot this, and when to wait

Not every operator should be running a CHORUS pilot in 2026. Here is a rough decision frame:

Pilot now if you have a mixed fleet (more than one robot type), tasks that change weekly, and a tolerance for non-deterministic behavior under human supervision.
Wait if your tasks are highly repetitive, your fleet is uniform, and your existing orchestrator is working. The savings will not justify the eval investment.
Skip entirely if you cannot staff a human review function. A shared policy without human-in-the-loop review is a liability, not an asset.
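
The frame above can be written down as a small function, which is a useful exercise because it forces the order of the checks. This is a sketch. The parameter names are hypothetical, and treating every case that falls short of the pilot criteria as "wait" is an assumption, since the frame does not address mixed cases.

```python
def pilot_decision(robot_types: int,
                   tasks_change_weekly: bool,
                   tolerates_nondeterminism: bool,
                   can_staff_human_review: bool) -> str:
    """Return "skip", "pilot", or "wait" following the decision frame."""
    # Review staffing is checked first: without it nothing else matters.
    if not can_staff_human_review:
        return "skip"
    mixed_fleet = robot_types > 1
    if mixed_fleet and tasks_change_weekly and tolerates_nondeterminism:
        return "pilot"
    # Assumption: anything short of the full pilot criteria waits.
    return "wait"


mixed_warehouse = pilot_decision(3, True, True, True)
uniform_line = pilot_decision(1, False, False, True)
no_reviewers = pilot_decision(3, True, True, False)
```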

The same frame applies to software agent deployments. The question is never "is this technology ready." It is "is my organization ready to operate it." The eval harness, the review staffing, and the incident process are what you are really buying.

CHORUS: One VLA Policy for Multi-Robot Collaboration