Engineering · 23 posts
How we build Agent Hive: runtime, infrastructure, scaling, and deployment patterns.
A Sovereign Execution Broker sits between agents and production systems, refusing every mutating action that lacks a signed, scoped, evidence-backed…
ARD is Google's proposed open standard for helping autonomous agents find what a website offers without scraping HTML or manual API setup.
Multi-agent fictitious play lets language models converge on equilibrium strategies for pricing, negotiation, and auctions without retraining.
DRFLOW tests whether AI agents can predict the right action sequence for a specific user, not just write a good report. Here is what that means for buyers.
RubricsTree decomposes open-ended health agent answers into yes/no rubric trees, replacing costly physician grading with reproducible, compute-scale…
How combining neural networks with formal logic lets teams prove multi-agent workflows will hit goals before shipping, without the cost of pure model…
A cheap directional speaker can blur a camera sensor enough to fool AI models. Here is what operators of camera-driven systems need to know and do.
How coordinated preference learning trains agent teams to agree on tradeoffs across competing objectives, without a central arbiter at runtime.
A new label-free signal checks whether subtask answers compose into the whole answer, outperforming self-consistency and semantic entropy on multi-step…
SpatialClaw argues spatial reasoning failures in VLMs stem from poor tool interfaces, not weak models. Tighter schemas and structured outputs cut errors…
FablePool lets backers fund a written prompt and an AI agent pipeline ships the product publicly. Here is what that means for how software gets built and…
How service robots estimate heart rate from a standard RGB camera, and what it takes to keep the signal stable under real-world lighting conditions.
CHORUS runs a single vision-language-action policy across a mixed robot fleet, removing the central orchestrator. Here is what that means for operators.
How a small team outperformed frontier models on CCL25-Eval Task 5 by fine-tuning Qwen2.5 with LoRA on a custom poetry dataset.
A study of 25 million Hacker News and Reddit comments shows how AI-writing accusations spread, what triggers them, and what operators risk when deploying…
A new Riemannian solver for low-rank optimal transport cuts tuning and stabilizes convergence. Here is what changes for matching, logistics, and retrieval…
How the MSUE soccer VQA system shows operators a repeatable pipeline for fine-tuning vision-language models on proprietary data without large annotation…
A fully compliant agent is a liability. Here is how to engineer structured refusal into autonomous agents, with a working taxonomy and operator guidance.
AdaCodec reduces visual token count in video language models by encoding only frame-to-frame changes, borrowing the residual logic of H.264 and HEVC codec…
If Anthropic, OpenAI, and SpaceX list at current valuations, the effects on compute pricing, talent costs, and AI governance disclosures will matter for o…
Terminal-first → Aider. VS Code agent → Cline. IDE autocomplete + chat → Continue.dev.
The infrastructure choices behind Agent Hive: Fly machines per colony, Supabase Postgres for control-plane state, Vercel for the dashboard, and why this stack.
We could have shipped a closed-source agent runtime. We did not. Here is what the Hive runtime does, what it does not do, and how Agent Hive wraps it.