ENGINEERING

Flaky Test Detector with Auto-Quarantine PR

Watches CI failures on GitHub, identifies tests that pass on rerun (a flaky signature).

CategoryEngineering
Enginesim
Difficultyintermediate
Triggerevent
Steps5
Setup~15 min

How it runs

The automated pipeline, trigger to output.

  • TriggerGitHub workflow run failsGitHubGitHub
  • ActionFetch run attempts and job logsGitHubGitHub
  • LogicDetect flaky signature (fail then pass, same SHA)
  • ActionCommit quarantine annotation to new branchGitHubGitHub
  • OutputOpen quarantine pull requestGitHubGitHub

What it does

This workflow turns intermittent CI red into a self-healing loop. When a workflow run fails, it inspects whether the same job passed on a rerun. If a test failed then passed without any code change, it marks the test as flaky and proposes quarantining it via an automated pull request, keeping the main pipeline green while the flake is tracked for a real fix.

When to use it

Use this when flaky tests routinely block merges and engineers waste time hitting "re-run jobs." It is ideal for teams with a large suite where a handful of nondeterministic tests cause most of the noise.

How it works

  1. 1A GitHub Actions workflow_run failure event fires the workflow.
  2. 2The flow fetches the run's job history and prior attempts from the GitHub API.
  3. 3A logic step compares attempts: same commit SHA, failed then passed = flaky.
  4. 4For confirmed flakes, an action edits the test file to add a quarantine annotation and commits to a new branch.
  5. 5The output step opens a pull request describing the flake evidence and reruns observed.

Set it up

What you configure once, before turning it on.

  1. 1
    Connect GitHubRepos, issues, pull requests, actions.
  2. 2
    Set each agent's modelWe leave models unset so you pick the tier — fast + cheap, or top-quality.
  3. 3
    Tune it to your dataEdit the prompts, filters, and field mappings so it matches how your team works.
  4. 4
    Test, then turn it onRun once against a sample, confirm the output, then enable the trigger.

Run this workflow in your colony.

14-day trial. No DevOps. No Sales call. Provisioned in under a minute.