GitLab Flaky-Test Quarantine Bot

Watches GitLab CI pipeline failures, counts how often each spec has failed across recent runs, and opens a labeled quarantine issue when a test reaches the repeat-failure threshold.

Category: Engineering
Engine: sim
Difficulty: intermediate
Trigger: event
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: GitLab pipeline failed on default branch (GitLab)
  • Action: Fetch JUnit artifact and parse failed tests (GitLab)
  • Action: Increment per-test failure counter, 14-day window (PostgreSQL)
  • Logic: Filter tests at or over the repeat-failure threshold
  • Action: Open GitLab issue labeled flaky::quarantine (GitLab)
  • Output: Post quarantine candidates to on-call Slack channel (Slack)
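The trigger step can be pictured as a small filter over the incoming webhook body. This is a minimal sketch, not the bot's actual code: the function name `is_default_branch_failure` is hypothetical, and it assumes the pipeline event carries `object_kind`, `object_attributes.status`, `object_attributes.ref`, and `project.default_branch`. Verify those field names against your GitLab version before relying on them.

```python
from typing import Any, Mapping


def is_default_branch_failure(event: Mapping[str, Any]) -> bool:
    """Return True only for a failed pipeline on the project's default branch.

    Field names are assumptions about the GitLab pipeline webhook payload.
    """
    if event.get("object_kind") != "pipeline":
        return False
    attrs = event.get("object_attributes") or {}
    project = event.get("project") or {}
    default_branch = project.get("default_branch")
    return (
        attrs.get("status") == "failed"
        and default_branch is not None
        and attrs.get("ref") == default_branch
    )


# Hypothetical, trimmed-down payload used only for illustration.
sample_event = {
    "object_kind": "pipeline",
    "object_attributes": {"id": 101, "ref": "main", "status": "failed"},
    "project": {"default_branch": "main"},
}

print(is_default_branch_failure(sample_event))
```

Failures on feature branches are ignored by this check, which keeps in-progress work from inflating the flaky-test counts.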

What it does

Keeps a running tally of which individual test cases fail on your default-branch pipelines. When a single spec fails three or more times within the rolling 14-day window (three is the default threshold), the bot flags it as flaky, files a GitLab issue labeled `flaky::quarantine`, and alerts the on-call Slack channel so no one re-runs the pipeline blindly.
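The tally-and-threshold rule can be sketched in a few lines. The workflow stores its counter in Postgres; the in-memory `FailureCounter` class below is a hypothetical stand-in that only illustrates the rule, using the 14-day window and default threshold of 3 described here. The test identifiers are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=14)  # rolling window from the description
THRESHOLD = 3                # default repeat-failure threshold


class FailureCounter:
    """In-memory stand-in for the Postgres-backed per-test counter."""

    def __init__(self, window=WINDOW, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self._failures = defaultdict(list)  # test id -> failure timestamps

    def record(self, test_id, failed_at):
        self._failures[test_id].append(failed_at)

    def count(self, test_id, now):
        # Only failures inside the rolling window are counted.
        cutoff = now - self.window
        return sum(1 for ts in self._failures.get(test_id, []) if cutoff <= ts <= now)

    def candidates(self, now):
        # A test qualifies when it is at or over the threshold.
        return sorted(t for t in self._failures if self.count(t, now) >= self.threshold)


now = datetime(2024, 5, 20, tzinfo=timezone.utc)
counter = FailureCounter()
for days_ago in (1, 4, 9):
    counter.record("spec/models/user_spec.rb::saves", now - timedelta(days=days_ago))
for days_ago in (2, 20, 30):
    counter.record("spec/api/login_spec.rb::logs in", now - timedelta(days=days_ago))

print(counter.candidates(now))  # ['spec/models/user_spec.rb::saves']
```

The second test also failed three times, but two of those failures fall outside the 14-day window, so it is not flagged.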

When to use it

Use it when intermittent test failures are eroding trust in CI and people have started reflexively hitting retry. It turns "is this flaky or real?" into a tracked, labeled decision instead of tribal knowledge.

How it works

  1. A GitLab pipeline-failed webhook fires for the default branch.
  2. The flow fetches the JUnit report artifact and extracts each failed test name plus its file path.
  3. It increments a per-test failure counter stored in Postgres, scoped to a rolling 14-day window.
  4. A logic step checks each test against the repeat-failure threshold (default 3).
  5. For tests at or over the threshold, it creates a GitLab issue labeled `flaky::quarantine` with the failure history and links to the offending pipelines.
  6. A Slack message summarizes the new quarantine candidates with jump links to each issue.
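Steps 2 and 5 can be illustrated with Python's standard library. This is a sketch under assumptions, not the workflow's implementation: `failed_tests` and `build_issue_payload` are hypothetical helpers, the sample report is invented, and the `file` attribute on `<testcase>` is optional in JUnit output, so the code falls back to `classname`. The payload keys (`title`, `description`, `labels`) mirror the fields the GitLab issues API accepts, but check them against your instance.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Invented report for illustration; real artifacts come from the CI job.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="rspec">
    <testcase name="saves a user" classname="spec.models.user_spec" file="spec/models/user_spec.rb">
      <failure message="expected true, got false">trace</failure>
    </testcase>
    <testcase name="logs in" classname="spec.api.login_spec" file="spec/api/login_spec.rb"/>
    <testcase name="times out" classname="spec.jobs.sync_spec">
      <error message="Timeout">trace</error>
    </testcase>
  </testsuite>
</testsuites>"""


def failed_tests(junit_xml: str) -> List[Dict[str, str]]:
    """Return name and file (or classname) for each failed or errored test case."""
    root = ET.fromstring(junit_xml)
    failed = []
    for case in root.iter("testcase"):
        if case.find("failure") is None and case.find("error") is None:
            continue  # passed or skipped
        failed.append({
            "name": case.get("name", ""),
            "file": case.get("file") or case.get("classname", ""),
        })
    return failed


def build_issue_payload(test: Dict[str, str], pipeline_urls: List[str]) -> Dict[str, str]:
    """Assemble an issue body listing the pipelines where the test failed."""
    lines = [
        f"Test `{test['name']}` in `{test['file']}` failed repeatedly.",
        "",
        "Failing pipelines:",
    ]
    lines += [f"- {url}" for url in pipeline_urls]
    return {
        "title": f"Flaky test: {test['name']}",
        "description": "\n".join(lines),
        "labels": "flaky::quarantine",
    }


failures = failed_tests(SAMPLE_REPORT)
payload = build_issue_payload(failures[0], ["https://gitlab.example.com/p/-/pipelines/101"])
print(payload["title"])  # Flaky test: saves a user
```

Treating `<error>` the same as `<failure>` is a design choice in this sketch; timeouts and crashes are often the most intermittent failures, so excluding them would hide likely quarantine candidates.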

Set it up

What you configure once, before turning it on.

  1. Connect GitLab: repos, MRs, pipelines, registry.
  2. Connect Postgres: any Postgres URL; query, write, migrate.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model: we leave models unset so you pick the tier, fast and cheap or top-quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
