DEVOPS
Triage Flaky PR Failures with an Agent and Comment the Verdict
When a PR check fails, an agent inspects the failing test's logs against its history to decide whether the failure is a real regression or a flake, then comments its reasoning on the PR.
How it runs
The automated pipeline, trigger to output.
- Trigger: GitHub PR check_run failed (GitHub)
- Action: Pull failing logs and main-branch history (GitHub)
- Logic: Agent verdict, regression vs. flake (OpenAI)
- Action: Comment verdict on the PR (GitHub)
- Action: Open Linear ticket for confirmed flake (Linear)
- Output: Return verdict and ticket link
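The trigger step has to ignore most check_run deliveries, since GitHub sends them for created, rerequested, and successful runs too. A minimal sketch in Python, assuming the standard GitHub check_run webhook payload (the top-level action field plus check_run.conclusion and check_run.pull_requests); the function name failed_pr_check is hypothetical and not part of the workflow:

```python
from typing import Any


def failed_pr_check(payload: dict[str, Any]) -> list[int]:
    """Return the PR numbers this check_run event should triage, or [] to skip it.

    Assumes the GitHub check_run webhook shape: payload["action"],
    payload["check_run"]["conclusion"], payload["check_run"]["pull_requests"].
    """
    if payload.get("action") != "completed":
        return []  # ignore created / rerequested / requested_action deliveries
    check_run = payload.get("check_run", {})
    if check_run.get("conclusion") != "failure":
        return []  # only failed checks are triaged (assumption: timed_out is skipped)
    # pull_requests can be empty, e.g. for runs on a branch with no open PR
    return [pr["number"] for pr in check_run.get("pull_requests", [])]


event = {
    "action": "completed",
    "check_run": {"conclusion": "failure", "pull_requests": [{"number": 42}]},
}
print(failed_pr_check(event))  # [42]
```

Treating only a conclusion of "failure" as a trigger is a design choice; you may also want to include timed_out if your CI reports hung tests that way.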
What it does
This workflow stops developers from guessing whether a red check is their fault or a known flaky test. On every failing PR check, an agent reads the failure logs, compares them to the test's recent history, and posts a clear verdict (real regression or flake) directly on the pull request.
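Comparing a failure against the test's recent history can start from two simple signals: how often the test fails on main, and how often it alternates between passing and failing. The sketch below is a hypothetical heuristic, not the agent's actual logic (the workflow delegates that judgment to an OpenAI model); the History class, looks_flaky, and both thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class History:
    """Recent main-branch outcomes for one test, oldest first (True = passed)."""
    outcomes: list[bool]

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def flips(self) -> int:
        """Count pass<->fail transitions between consecutive runs."""
        return sum(1 for a, b in zip(self.outcomes, self.outcomes[1:]) if a != b)


def looks_flaky(history: History, min_rate: float = 0.02, min_flips: int = 2) -> bool:
    # Hypothetical thresholds: the test has failed on main before AND keeps
    # alternating, instead of failing consistently (which suggests a real break).
    return history.failure_rate() >= min_rate and history.flips() >= min_flips


print(looks_flaky(History([True, True, False, True, True, False, True])))  # True
```

A test that starts failing and stays red on main produces few flips, so this heuristic would not call it flaky.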
When to use it
Use this on busy repos where contributors waste time re-running checks and arguing over whether a failure is real. It gives an immediate, reasoned triage comment and keeps a tracked record of confirmed flakes without paging anyone.
How it works
1. A GitHub check_run failure event triggers the flow on an open PR.
2. The agent pulls the failing test's logs and its pass/fail history on the main branch.
3. It reasons about whether the failure correlates with the PR's diff (regression) or matches a known intermittent pattern (flake).
4. A logic branch routes on the verdict.
5. For a regression, it posts a blocking comment asking the author to investigate; for a flake, it posts a reassuring comment and opens a Linear ticket tagged "flaky".
6. The verdict and any ticket link are returned as the output.
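The diff-correlation and routing steps can be sketched as a deterministic pre-check plus the routing contract. This assumes a crude correlation rule (the PR touches the failing test file, or a source file named after the test's subject, such as tests/test_api.py and src/api.py); the real workflow relies on the agent's reasoning instead, and every name here (correlates_with_diff, decide, route, Action) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional


class Verdict(Enum):
    REGRESSION = "regression"
    FLAKE = "flake"


@dataclass
class Action:
    comment: str
    blocking: bool
    ticket_labels: Optional[list[str]]  # None means no Linear ticket is opened


def correlates_with_diff(test_path: str, changed_files: list[str]) -> bool:
    """Crude rule: the PR edits the failing test itself, or a file named after its subject."""
    test = PurePosixPath(test_path)
    subject = test.stem.removeprefix("test_").removesuffix("_test")  # test_api -> api
    return any(
        PurePosixPath(f) == test or PurePosixPath(f).stem == subject
        for f in changed_files
    )


def decide(correlated: bool, flaky_history: bool) -> Verdict:
    # Only call it a flake when the diff looks unrelated AND the test has a flaky record.
    if flaky_history and not correlated:
        return Verdict.FLAKE
    return Verdict.REGRESSION


def route(verdict: Verdict, test_name: str) -> Action:
    if verdict is Verdict.REGRESSION:
        return Action(
            comment=f"{test_name} failed in a way that looks tied to this PR. Please investigate before merging.",
            blocking=True,
            ticket_labels=None,
        )
    return Action(
        comment=f"{test_name} matches a known intermittent pattern; this failure is likely not caused by your changes.",
        blocking=False,
        ticket_labels=["flaky"],
    )


verdict = decide(correlates_with_diff("tests/test_api.py", ["docs/readme.md"]), flaky_history=True)
print(verdict.value, route(verdict, "test_api").ticket_labels)  # flake ['flaky']
```

When the diff correlates, or the test has no flaky record, the sketch errs toward regression, so a real break is never waved through as a flake.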
Set it up
What you configure once, before turning it on.
1. Connect GitHub: repos, issues, pull requests, actions.
2. Connect Linear: issues, projects, cycles, triage.
3. Connect OpenAI: models, embeddings, files.
4. Set each agent's model: we leave models unset so you can pick the tier, either fast and cheap or top quality.
5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
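Before enabling the trigger, a preflight check can confirm that the one-time setup is complete. A minimal sketch, assuming a hypothetical WorkflowConfig shape; the connection names mirror the setup steps, and the model field defaults to None because the workflow ships with models unset.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical identifiers for the three connections listed in the setup steps.
REQUIRED_CONNECTIONS = ("github", "linear", "openai")


@dataclass
class WorkflowConfig:
    connections: set[str] = field(default_factory=set)
    model: Optional[str] = None  # deliberately unset: you choose the tier
    dry_run_passed: bool = False  # set after one manual run against a sample


def preflight(cfg: WorkflowConfig) -> list[str]:
    """Return outstanding setup tasks; an empty list means the trigger can be enabled."""
    problems = [f"connect {name}" for name in REQUIRED_CONNECTIONS if name not in cfg.connections]
    if not cfg.model:
        problems.append("choose a model for the triage agent")
    if not cfg.dry_run_passed:
        problems.append("run once against a sample and confirm the output")
    return problems


print(preflight(WorkflowConfig(connections={"github", "linear", "openai"}, model="your-model")))
```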
Run it inside a business
This workflow drops into a full company template: import the org, and this becomes one of the playbooks its agents run.

Run this workflow in your colony.
14-day trial. No DevOps. No sales call. Provisioned in under a minute.
