
Auto-quarantine flaky tests on intermittent CI failures

When a GitHub Actions test run fails, this workflow checks whether the failing test passed on a retry. If so, it marks the test as flaky and applies a quarantine annotation.

Category: DevOps
Engine: sim
Difficulty: intermediate
Trigger: event
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: GitHub check_run completed for a test job (GitHub)
  • Logic: Failed once but passed on retry?
  • Action: Parse test report artifact for failing test name (GitHub)
  • Action: Append test to quarantine manifest in repo (GitHub)
  • Output: Open GitHub issue labeled flaky-test with evidence (GitHub)
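
The "failed once but passed on retry" check can be sketched as a small pure function. This is a minimal illustration, not the workflow's actual filter: the `Attempt` record and the `find_flaky_tests` name are hypothetical, and it assumes the per-attempt results have already been collected from the workflow run.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Attempt:
    """One execution of a test (hypothetical record shape)."""
    test_id: str
    attempt: int   # 1 for the first run, 2 and up for retries
    passed: bool


def find_flaky_tests(attempts: List[Attempt]) -> List[str]:
    """Return ids of tests that failed at least once and passed on a later attempt."""
    results_by_test: Dict[str, List[bool]] = {}
    for a in sorted(attempts, key=lambda a: a.attempt):
        results_by_test.setdefault(a.test_id, []).append(a.passed)

    flaky = []
    for test_id, results in results_by_test.items():
        if False not in results:
            continue  # never failed: not a candidate
        first_failure = results.index(False)
        if any(results[first_failure + 1:]):
            flaky.append(test_id)  # a later attempt passed: intermittent
    return sorted(flaky)
```

A test that fails on every attempt is treated as a genuine failure and is left out, so it keeps blocking the pipeline.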

What it does

Watches GitHub Actions test runs and distinguishes genuine failures from intermittent flakes. When a test fails but passes on an automatic retry within the same workflow, it tags the test as flaky, adds it to a quarantine list, and files a GitHub issue so the team can track and fix it later without blocking the pipeline.
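
As a rough sketch of the issue-filing part, the function below only builds the request body for an issue; it does not call GitHub. The `title`, `body`, and `labels` fields are those accepted by GitHub's REST endpoint for creating an issue. The function name, its parameters, and the wording of the issue body are assumptions made for illustration.

```python
from typing import Dict, List, Union


def build_flaky_issue(test_id: str, run_url: str,
                      failed_attempt: int, passed_attempt: int) -> Dict[str, Union[str, List[str]]]:
    """Build a JSON-serializable issue body for a quarantined test (hypothetical helper)."""
    body_lines = [
        f"Test `{test_id}` was quarantined as flaky.",
        "",
        f"- Failed on attempt {failed_attempt}",
        f"- Passed on attempt {passed_attempt}",
        f"- Workflow run: {run_url}",
    ]
    return {
        "title": f"Flaky test: {test_id}",
        "body": "\n".join(body_lines),
        "labels": ["flaky-test"],  # label named in this workflow's description
    }
```

Keeping the payload builder separate from the HTTP call makes it easy to check the issue text against a sample run before enabling the trigger.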

When to use it

Use this when intermittent test failures are forcing engineers to re-run CI by hand and eroding trust in the build. It keeps the main branch deployable while preserving a paper trail of every quarantined test.

How it works

  1. A GitHub `check_run` completed event fires when a workflow test job finishes.
  2. A filter checks the conclusion and proceeds only if the job failed at least once but a retry of the same test passed.
  3. The flow reads the JUnit/test report artifact to extract the exact failing test name and file.
  4. It appends the test identifier to a quarantine manifest committed to the repo.
  5. It opens a GitHub issue labeled `flaky-test` with the failure logs, retry evidence, and owning team.
  6. The final step posts the quarantine summary so reviewers know the build was unblocked, not silently passed.
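
Steps 3 and 4 above can be sketched as follows, under stated assumptions: the report is JUnit-style XML in which a failing `<testcase>` has a `<failure>` or `<error>` child, and the quarantine manifest is a plain text file with one test identifier per line. The manifest format, the `classname::name` identifier scheme, and both function names are hypothetical, since the source does not specify them.

```python
import xml.etree.ElementTree as ET
from typing import List


def failing_tests(junit_xml: str) -> List[str]:
    """Extract identifiers of failed or errored test cases from JUnit-style XML."""
    root = ET.fromstring(junit_xml)
    failed = []
    # iter() finds <testcase> elements under either <testsuite> or <testsuites>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(f"{case.get('classname', '')}::{case.get('name', '')}")
    return failed


def append_to_manifest(manifest_text: str, test_ids: List[str]) -> str:
    """Return the manifest text with new test ids appended, skipping duplicates."""
    entries = [line.strip() for line in manifest_text.splitlines() if line.strip()]
    seen = set(entries)
    for test_id in test_ids:
        if test_id not in seen:
            entries.append(test_id)
            seen.add(test_id)
    return "\n".join(entries) + "\n" if entries else ""
```

Making the append idempotent means a test that flakes repeatedly produces one manifest entry rather than one per run.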

Set it up

What you configure once, before turning it on.

  1. Connect GitHub: repos, issues, pull requests, actions.
  2. Set each agent's model: we leave models unset so you pick the tier (fast and cheap, or top-quality).
  3. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  4. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
