
Flaky-Test Quarantine Agent: CI Failure to Tracked Ticket + Skip PR

Watches GitHub Actions failures and uses an LLM to decide whether each failing test is genuinely flaky or a real regression.

Category: Engineering

Engine: sim

Difficulty: intermediate

Trigger: event

Steps: 6

Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

Trigger (GitHub): GitHub Actions run fails (`workflow_run` with `conclusion=failure`)
Action (GitHub): Fetch failing job logs and test report
Action (OpenAI): Classify each failure: flaky vs. real regression
Logic: Keep only flaky failures; drop regressions
Action (Linear): Open tracked flake ticket in Linear
Output (GitHub): Open draft skip/quarantine pull request on GitHub
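
As a rough illustration of the trigger condition, the sketch below checks a GitHub `workflow_run` webhook payload for a completed run that concluded in failure. The function name `is_failed_workflow_run` and the sample payload are assumptions made for this example; how the agent itself evaluates the trigger is not described on this page.

```python
from typing import Any, Mapping


def is_failed_workflow_run(payload: Mapping[str, Any]) -> bool:
    """Return True only for a completed workflow run whose conclusion is 'failure'."""
    run = payload.get("workflow_run") or {}
    return payload.get("action") == "completed" and run.get("conclusion") == "failure"


# Hypothetical, heavily trimmed payload; real events carry many more fields.
sample_event = {
    "action": "completed",
    "workflow_run": {"id": 1, "name": "CI", "conclusion": "failure"},
}

if is_failed_workflow_run(sample_event):
    print("trigger fires")
```

Checking `action` as well as `conclusion` keeps runs that are still requested or in progress from firing the pipeline.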

What it does

When a CI run fails, this agent fetches the failing test logs, classifies each failure as flaky (intermittent, environment-sensitive) or a real regression, and quarantines only the flaky ones. For each confirmed flake it files a tracked Linear ticket and opens a draft GitHub pull request that skips the test, so builds go green again without burying real bugs.
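
The "quarantine only the flaky ones" rule can be sketched as a simple filter over classifier results. This is a minimal sketch, not the agent's actual code: the `Classification` type, its field names, and the label strings are all assumptions for illustration, and the test IDs are made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Classification:
    test_id: str   # identifier of the failing test
    label: str     # assumed labels: "flaky" or "regression"
    evidence: str  # log excerpt cited by the classifier


def keep_flaky(results: Iterable[Classification]) -> List[Classification]:
    """Keep only results labeled 'flaky'; anything else is left un-quarantined."""
    return [r for r in results if r.label == "flaky"]


# Hypothetical classifier output for two failing tests.
results = [
    Classification("test_upload_retry", "flaky", "ConnectionResetError on attempt 1"),
    Classification("test_price_total", "regression", "AssertionError: 10 != 12"),
]

for flake in keep_flaky(results):
    print(flake.test_id)
```

Treating any label other than "flaky" as a non-flake is a deliberately conservative choice: an unexpected or malformed label then leaves the test failing visibly instead of skipping it.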

When to use it

Use it when intermittent failures are eroding trust in your CI signal and engineers are blindly re-running jobs. It separates genuine flakiness from regressions automatically, so you stop hand-triaging every red build.

How it works

1. A GitHub Actions `workflow_run` completion with `conclusion=failure` fires the trigger.
2. The agent pulls the failing job logs and the test report from the run via the GitHub API.
3. An OpenAI classification step labels each failing test as flaky or regression, citing the log evidence.
4. A logic branch drops anything classified as a regression and keeps only flaky tests.
5. For each flake, it creates a Linear issue with the failure history and reproduction notes.
6. It opens a draft GitHub pull request adding a skip/quarantine annotation, linked to the ticket.
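
The last two steps can be sketched as follows, assuming the quarantined test is a pytest test. The ticket fields are generic placeholders, not Linear's schema, and `Flake`, `build_ticket`, `skip_marker`, the test ID, and the ticket ID `ENG-123` are all hypothetical names chosen for this example. Other test frameworks would need a different annotation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Flake:
    test_id: str   # hypothetical, e.g. "tests/test_checkout.py::test_retry"
    evidence: str  # log excerpt the classifier cited


def build_ticket(flake: Flake) -> Dict[str, str]:
    """Build a generic ticket body; the keys are illustrative only."""
    return {
        "title": f"Flaky test: {flake.test_id}",
        "description": f"Quarantined after a CI failure.\n\nLog evidence:\n{flake.evidence}",
    }


def skip_marker(ticket_id: str) -> str:
    """Return a pytest decorator line that skips a test and names the ticket."""
    reason = f"Quarantined as flaky; tracked in {ticket_id}"
    return f"@pytest.mark.skip(reason={reason!r})"


flake = Flake("tests/test_checkout.py::test_retry", "TimeoutError after 30s")
print(build_ticket(flake)["title"])
print(skip_marker("ENG-123"))
```

Putting the ticket ID in the skip reason keeps the link between the quarantined test and its ticket visible in the test output, which makes it easier to remove the skip once the flake is fixed.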

Set it up

What you configure once, before turning it on.

1. Connect GitHub: repos, issues, pull requests, actions.
2. Connect OpenAI: models, embeddings, files.
3. Connect Linear: issues, projects, cycles, triage.
4. Set each agent's model: models are left unset so you can pick the tier, either fast and cheap or top quality.
5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.


