DEVOPS

AI visual judge that scores previews and pages on-call for breakage

Sends preview screenshots to a vision model that judges layout, broken images, and overflow.

CategoryDevOps

Enginesim

Difficultyadvanced

Triggerwebhook

Steps6

Setup~25 min

How it runs

The automated pipeline, trigger to output.

TriggerVercel preview deploy readyVercel
ActionCapture route screenshotsBrowserbase
ActionVision model scores layout and defectsOpenAI
LogicBranch: score below pass threshold?
ActionOpen PagerDuty incident with findingsPagerDuty
OutputSet failing GitHub commit statusGitHub

What it does

Instead of relying only on pixel diffs, this workflow asks a vision model to act as a QA reviewer. It captures each critical route on the Vercel preview and prompts the model to flag broken layouts, missing or 404 images, text overflow, and obviously broken components, returning a structured score and an issue list. Builds below the quality bar are treated as production incidents.

When to use it

Use it when pixel diffing misses semantic breakage, like an image that loads as a gray box or a button that overflows its container on a fresh page that has no baseline. It suits teams who want a judgment call on net-new pages where a baseline comparison is impossible.

How it works

1A Vercel preview-ready webhook starts the run.
2A headless browser captures full-page screenshots of the configured routes.
3Each screenshot is sent to a vision model with a rubric to score quality and list defects.
4A branch checks whether the aggregate score is below the pass threshold.
5If it fails, a PagerDuty incident is opened with the offending pages and findings.
6A failing GitHub commit status is set so the deploy cannot be promoted.

Set it up

What you configure once, before turning it on.

1
Connect VercelDeploys, runtime logs, analytics.
2
Connect BrowserbaseHeadless browsers, sessions, replays.
3
Connect OpenAIModels, embeddings, files.
4
Connect PagerDutyIncidents, on-call, escalations.
5
Connect GitHubRepos, issues, pull requests, actions.
6
Set each agent's modelWe leave models unset so you pick the tier — fast + cheap, or top-quality.
7
Tune it to your dataEdit the prompts, filters, and field mappings so it matches how your team works.
8
Test, then turn it onRun once against a sample, confirm the output, then enable the trigger.

More DevOps workflows

Hugging Face Spaces idle-runtime sweep with auto-pause

On a schedule, scans all Hugging Face Spaces for ones running idle past a threshold, pauses them to stop billing, and posts a Slack summary with the estimated monthly savings.

Slack-approved pause for idle Hugging Face Spaces

On a daily scan it finds idle paid Spaces and posts an interactive Slack approval; on approve it pauses the Space and logs the decision to a GitHub issue audit trail.

Generate a weekly de-flake report and assign Linear cleanup tickets

On a weekly schedule, aggregates the current quarantine manifest and recent flake history, builds a prioritized report.

Block costly Hugging Face Space hardware upgrades in PR review

When a pull request changes a Space's hardware config, it estimates the new monthly cost and posts a GitHub PR comment that flags upgrades crossing a budget ceiling.

Auto-release tests from quarantine once they prove stable

Triggered by a webhook from a nightly stability runner, checks whether quarantined tests have passed enough consecutive runs, removes the stable ones from quarantine in GitHub.

Quarantine a test on demand from a PR comment command

Triggered when an engineer comments a quarantine command on a pull request, validates the test name, commits the quarantine change to that PR branch, opens a tracking issue.

Browse all DevOps →

Run it inside a business

This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

Software

Agent Hive runs Agent Hive

The team that built Agent Hive, exactly as it runs today.

Software

SaaS Operator (Pre-PMF)

Talk to users, ship features, kill what doesn't land.

Software

AI Tools Startup

Ship an AI tool, distribute on every channel, watch the unit economics.

Browse all business templates →Solutions by industry →

Run this workflow in your colony.

14-day trial. No DevOps. No Sales call. Provisioned in under a minute.

Join the Waitlist Browse all workflows →