
Triage failed Replicate predictions into bench cases via webhook

Listens for Replicate prediction-completed webhooks, flags low-confidence or errored outputs, and files each one as a GitHub regression-bench issue with reproduction inputs.

Category: Engineering
Engine: sim
Difficulty: intermediate
Trigger: webhook
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Replicate prediction-completed webhook (Replicate)
  • Logic: Flag errored or low-confidence outputs
  • Action: File regression-bench issue with repro inputs (GitHub)
  • Action: Record failure for trend analysis (PostgreSQL)
  • Output: Alert model owner in Slack (Slack)

What it does

Turns live production failures into a growing regression suite. It receives Replicate's prediction-completed webhook in real time, identifies outputs that errored or fell below a confidence floor, and captures each as a reproducible bench case in GitHub so the next version is tested against real-world failure modes.

When to use it

Use it when you want your eval set to learn from production rather than stay static. Ideal for teams who keep finding the same regressions in the wild because their bench never includes the inputs that actually break the model.

How it works

  1. A Replicate prediction-completed webhook triggers the flow with the prediction payload.
  2. A logic step checks for an error status or a confidence score below the floor and drops everything else.
  3. For a flagged prediction, the flow files a GitHub issue tagged `regression-bench` with the input, output, and version that produced it.
  4. It records the failure in a Postgres table for trend analysis.
  5. It alerts the model owner in Slack with a link to the new bench case.
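
Steps 2 and 3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the workflow's actual implementation. It assumes the webhook body is a Replicate prediction object with `id`, `status`, `error`, `input`, `output`, and `version` fields. Replicate does not define a confidence field, so the sketch assumes the model returns a dict with a hypothetical `confidence` key. `CONFIDENCE_FLOOR`, `flag_reason`, and `build_issue` are names invented for this example. The returned dict has the `title`, `body`, and `labels` fields that GitHub's create-issue endpoint accepts; no network call is made.

```python
import json
from typing import Any, Optional

# Hypothetical threshold; tune it to your model's score distribution.
CONFIDENCE_FLOOR = 0.6


def extract_confidence(prediction: dict[str, Any]) -> Optional[float]:
    """Read a confidence score from the model output, if one is present.

    Assumption: the model returns a dict with a numeric "confidence" key.
    Many models do not, in which case this returns None.
    """
    output = prediction.get("output")
    if isinstance(output, dict):
        value = output.get("confidence")
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    return None


def flag_reason(
    prediction: dict[str, Any], floor: float = CONFIDENCE_FLOOR
) -> Optional[str]:
    """Return why a prediction should become a bench case, or None to drop it."""
    if prediction.get("status") == "failed" or prediction.get("error"):
        return "errored"
    confidence = extract_confidence(prediction)
    if confidence is not None and confidence < floor:
        return "low-confidence"
    return None


def build_issue(prediction: dict[str, Any], reason: str) -> dict[str, Any]:
    """Build a GitHub issue payload carrying the reproduction inputs."""
    body = "\n".join(
        [
            f"Reason: {reason}",
            f"Prediction ID: {prediction.get('id')}",
            f"Model version: {prediction.get('version')}",
            f"Error: {prediction.get('error')}",
            "",
            "Input:",
            json.dumps(prediction.get("input"), indent=2, default=str),
            "",
            "Output:",
            json.dumps(prediction.get("output"), indent=2, default=str),
        ]
    )
    return {
        "title": f"[regression-bench] {reason}: prediction {prediction.get('id')}",
        "body": body,
        "labels": ["regression-bench"],
    }


if __name__ == "__main__":
    # Made-up sample payload, for illustration only.
    sample = {
        "id": "example-id",
        "version": "example-version",
        "status": "succeeded",
        "error": None,
        "input": {"prompt": "a red bicycle"},
        "output": {"label": "bicycle", "confidence": 0.41},
    }
    reason = flag_reason(sample)
    if reason is not None:
        print(json.dumps(build_issue(sample, reason), indent=2))
```

Predictions that return no confidence score are flagged only when they error, so a model without that field needs its own scoring rule in `extract_confidence`.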

Set it up

What you configure once, before turning it on.

  1. Connect Replicate: image, video, and model inference.
  2. Connect GitHub: repos, issues, pull requests, actions.
  3. Connect Postgres: any Postgres URL (query, write, migrate).
  4. Connect Slack: channels, DMs, threads, mentions.
  5. Set each agent's model: models are left unset so you pick the tier, either fast and cheap or top-quality.
  6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
