ENGINEERING
Triage failed Replicate predictions into bench cases via webhook
Listens for Replicate prediction-completed webhooks, flags low-confidence or errored outputs, and files each one as a GitHub regression-bench issue with reproduction inputs.
How it runs
The automated pipeline, trigger to output.
- Trigger: Replicate prediction-completed webhook (Replicate)
- Logic: Flag errored or low-confidence outputs
- Action: File regression-bench issue with repro inputs (GitHub)
- Action: Record failure for trend analysis (Postgres)
- Output: Alert model owner in Slack (Slack)
What it does
Turns live production failures into a growing regression suite. It receives Replicate's prediction-completed webhook in real time, identifies outputs that errored or fell below a confidence floor, and captures each as a reproducible bench case in GitHub so the next version is tested against real-world failure modes.
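The flagging rule described above could look roughly like the following. This is a minimal sketch, not the workflow's actual logic: it assumes the webhook body is a Replicate prediction object with a `status` field, and it assumes the model's output is a mapping with a `confidence` key, which is a per-model convention rather than something Replicate provides. The `CONFIDENCE_FLOOR` value and the function names are hypothetical.

```python
from typing import Any, Mapping, Optional

# Assumed threshold; tune it per model.
CONFIDENCE_FLOOR = 0.6


def extract_confidence(output: Any) -> Optional[float]:
    """Return a confidence score if the model output carries one.

    Assumption: the model emits a mapping with a numeric "confidence" key.
    Outputs in any other shape (lists of URLs, strings) yield None.
    """
    if isinstance(output, Mapping):
        score = output.get("confidence")
        if isinstance(score, (int, float)) and not isinstance(score, bool):
            return float(score)
    return None


def flag_reason(
    prediction: Mapping[str, Any], floor: float = CONFIDENCE_FLOOR
) -> Optional[str]:
    """Return why a prediction should become a bench case, or None to drop it."""
    status = prediction.get("status")
    if status == "failed":
        return "errored"
    if status != "succeeded":
        # Canceled or still-running predictions are not failures of the model.
        return None
    score = extract_confidence(prediction.get("output"))
    if score is not None and score < floor:
        return "low-confidence"
    return None
```

Predictions whose output carries no score are dropped rather than flagged, so models without a confidence signal only produce bench cases when they error.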
When to use it
Use it when you want your eval set to learn from production rather than stay static. Ideal for teams who keep finding the same regressions in the wild because their bench never includes the inputs that actually break the model.
How it works
1. A Replicate prediction-completed webhook triggers the flow with the prediction payload.
2. A logic step checks for an error status or a confidence score below the floor and drops everything else.
3. For a flagged prediction, the flow files a GitHub issue tagged `regression-bench` with the input, output, and version that produced it.
4. It records the failure in a Postgres table for trend analysis.
5. It alerts the model owner in Slack with a link to the new bench case.
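Steps 3 to 5 each build a payload from the same flagged prediction. The sketch below shows one way those payloads might be shaped, under stated assumptions: the issue dictionary follows the `title`, `body`, and `labels` fields accepted by GitHub's create-issue endpoint, while the `prediction_failures` table, its columns, and all function names are hypothetical. Nothing here makes a network call; sending the payloads is left to whichever client you use.

```python
import json
from typing import Any, Dict, Mapping, Tuple

# Hypothetical table and columns; placeholders use the %s style of psycopg.
INSERT_FAILURE_SQL = (
    "INSERT INTO prediction_failures "
    "(prediction_id, model_version, reason, issue_url) "
    "VALUES (%s, %s, %s, %s)"
)


def build_issue(prediction: Mapping[str, Any], reason: str) -> Dict[str, Any]:
    """Build a GitHub issue payload carrying the reproduction inputs."""
    pid = prediction.get("id", "unknown")
    body = "\n".join(
        [
            f"Reason: {reason}",
            f"Prediction: {pid}",
            f"Version: {prediction.get('version', 'unknown')}",
            "",
            "Input:",
            json.dumps(prediction.get("input"), indent=2, sort_keys=True),
            "",
            "Output:",
            json.dumps(prediction.get("output"), indent=2, sort_keys=True),
            "",
            f"Error: {prediction.get('error')}",
        ]
    )
    return {
        "title": f"[regression-bench] {reason}: prediction {pid}",
        "body": body,
        "labels": ["regression-bench"],
    }


def build_failure_row(
    prediction: Mapping[str, Any], reason: str, issue_url: str
) -> Tuple[Any, Any, str, str]:
    """Return the parameters for INSERT_FAILURE_SQL, in column order."""
    return (prediction.get("id"), prediction.get("version"), reason, issue_url)


def build_slack_alert(channel: str, reason: str, issue_url: str) -> Dict[str, str]:
    """Build a plain-text Slack message linking to the new bench case."""
    return {
        "channel": channel,
        "text": f"New regression-bench case ({reason}): {issue_url}",
    }
```

Keeping the builders free of I/O means the sample run in the setup steps can check the exact issue text and row values before the trigger is enabled.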
Set it up
What you configure once, before turning it on.
1. Connect Replicate: image, video, and model inference.
2. Connect GitHub: repos, issues, pull requests, actions.
3. Connect Postgres: any Postgres URL for querying, writing, and migrating.
4. Connect Slack: channels, DMs, threads, mentions.
5. Set each agent's model: models are left unset so you pick the tier, either fast and cheap or top-quality.
6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
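If you mirror this setup in your own code, the one-time settings can be gathered into a single validated object before the first sample run. The following is only an illustrative sketch: the environment variable names, the `TriageConfig` class, and the default floor of 0.6 are all assumptions, not settings the workflow defines.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names for the settings this sketch requires.
REQUIRED_KEYS = ("BENCH_GITHUB_REPO", "BENCH_SLACK_CHANNEL", "BENCH_POSTGRES_URL")


@dataclass(frozen=True)
class TriageConfig:
    github_repo: str       # "owner/name" of the repo that holds bench issues
    slack_channel: str     # where the model owner is alerted
    postgres_url: str      # connection URL for the failure table
    confidence_floor: float


def load_config(env: Mapping[str, str] = os.environ) -> TriageConfig:
    """Read and validate settings, failing fast on anything missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    floor = float(env.get("BENCH_CONFIDENCE_FLOOR", "0.6"))
    if not 0.0 <= floor <= 1.0:
        raise ValueError("confidence floor must be between 0 and 1")
    return TriageConfig(
        github_repo=env["BENCH_GITHUB_REPO"],
        slack_channel=env["BENCH_SLACK_CHANNEL"],
        postgres_url=env["BENCH_POSTGRES_URL"],
        confidence_floor=floor,
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to exercise the loader during the test step without touching real credentials.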
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

