Flaky-Test Quarantine Agent: CI Failure to Tracked Ticket + Skip PR

Watches GitHub Actions failures and uses an LLM to decide whether each failing test is genuinely flaky or a real regression.

Category: Engineering
Engine: sim
Difficulty: intermediate
Trigger: event
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: GitHub Actions run fails (workflow_run conclusion=failure)
  • Action: Fetch failing job logs and test report (GitHub)
  • Action: Classify each failure: flaky vs. real regression (OpenAI)
  • Logic: Keep only flaky failures; drop regressions
  • Action: Open tracked flake ticket in Linear
  • Output: Open draft skip/quarantine PR on GitHub
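
As a rough illustration of the first two stages, the sketch below filters `workflow_run` webhook payloads down to completed, failed runs and pulls failing test names out of a report. It assumes the test report is JUnit-style XML, which this page does not specify. `should_trigger`, `failing_tests`, and `SAMPLE_REPORT` are hypothetical names used only for illustration, not part of any product API.

```python
import xml.etree.ElementTree as ET
from typing import Any, List, Mapping


def should_trigger(payload: Mapping[str, Any]) -> bool:
    """True only for a workflow_run event that completed with a failure."""
    run = payload.get("workflow_run") or {}
    return payload.get("action") == "completed" and run.get("conclusion") == "failure"


def failing_tests(junit_xml: str) -> List[str]:
    """Return 'classname::name' for every test case with a failure or error.

    Assumes JUnit-style XML; adapt this if your runner emits another format.
    """
    root = ET.fromstring(junit_xml)
    failed = []
    for case in root.iter("testcase"):
        # Compare against None explicitly: an Element with no children is falsy.
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(f'{case.get("classname")}::{case.get("name")}')
    return failed


# Hypothetical sample report: one passing test and one failing test.
SAMPLE_REPORT = (
    "<testsuite>"
    '<testcase classname="tests.test_cart" name="test_total"/>'
    '<testcase classname="tests.test_checkout" name="test_retry">'
    '<failure message="timeout">connection timed out</failure>'
    "</testcase>"
    "</testsuite>"
)
```

Calling `failing_tests(SAMPLE_REPORT)` yields only the checkout test, which is what would be handed to the classification step.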

What it does

When a CI run fails, this agent fetches the failing test logs, classifies each failure as flaky (intermittent, environment-sensitive) or a real regression, and quarantines only the flaky ones. For each confirmed flake it files a tracked Linear ticket and opens a draft GitHub pull request (PR) that skips the test, so green builds resume without burying real bugs.
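
The keep-only-flaky rule can be sketched as a small filter over classifier verdicts. This is a minimal sketch, not the agent's actual implementation: the `Verdict` shape and the `"flaky"` / `"regression"` label strings are assumptions. Any label other than `"flaky"` is dropped, so an unexpected or malformed classification is never quarantined.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Verdict:
    """Hypothetical shape of one classification result."""
    test_id: str
    label: str      # assumed to be "flaky" or "regression"
    evidence: str   # log excerpt the classifier cited


def keep_flaky(verdicts: Iterable[Verdict]) -> List[Verdict]:
    """Keep only verdicts labeled flaky; everything else is left failing."""
    return [v for v in verdicts if v.label.strip().lower() == "flaky"]


verdicts = [
    Verdict("tests/test_checkout.py::test_retry", "flaky", "timed out once, passed on rerun"),
    Verdict("tests/test_cart.py::test_total", "regression", "assertion fails on every run"),
    Verdict("tests/test_auth.py::test_login", "unsure", "no clear signal in logs"),
]
quarantine = keep_flaky(verdicts)
```

Defaulting unknown labels to "not quarantined" errs on the side of keeping a red build, which matches the goal of not burying real bugs.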

When to use it

Use it when intermittent failures are eroding trust in your CI signal and engineers are blindly re-running jobs. It separates genuine flakiness from regressions automatically, so you stop hand-triaging every red build.

How it works

  1. A GitHub Actions `workflow_run` completion with `conclusion=failure` fires the trigger.
  2. The agent pulls the failing job logs and the test report from the run via the GitHub API.
  3. An OpenAI classification step labels each failing test as flaky or regression, citing the log evidence.
  4. A logic branch drops anything classified as a regression and keeps only flaky tests.
  5. For each flake, it creates a Linear issue with the failure history and reproduction notes.
  6. It opens a draft GitHub PR adding a skip/quarantine annotation, linked to the ticket.
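
The final step might look roughly like the sketch below, which builds a skip annotation and the request body for a draft pull request. It assumes a pytest suite (the page does not name a test framework) and uses the `title`, `head`, `base`, `body`, and `draft` fields of GitHub's create-pull-request REST endpoint. The `Flake` type, the ticket key `ENG-123`, and the branch name are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Flake:
    """Hypothetical record for one confirmed flaky test."""
    test_id: str      # test identifier as reported by the runner
    evidence: str     # log excerpt the classifier cited
    ticket_key: str   # identifier of the tracking ticket


def skip_decorator(flake: Flake) -> str:
    """Build a pytest skip marker line that points back at the ticket."""
    return f'@pytest.mark.skip(reason="Quarantined as flaky, see {flake.ticket_key}")'


def draft_pr_body(flake: Flake, head: str, base: str = "main") -> Dict[str, Any]:
    """Build the JSON body for creating a draft pull request on GitHub."""
    return {
        "title": f"Quarantine flaky test: {flake.test_id}",
        "head": head,    # branch that contains the skip annotation
        "base": base,    # branch the PR targets (assumed to be main)
        "body": (
            f"Skips `{flake.test_id}` pending {flake.ticket_key}.\n\n"
            f"Evidence:\n{flake.evidence}"
        ),
        "draft": True,   # keep it a draft so a human reviews before merge
    }


flake = Flake("tests/test_checkout.py::test_retry", "timed out once, passed on rerun", "ENG-123")
decorator = skip_decorator(flake)
pr = draft_pr_body(flake, head="quarantine/test-retry")
```

Putting the ticket key in the skip reason keeps the link between the quarantined test and its ticket visible in the code itself, so the skip is easy to find and remove once the flake is fixed.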

Set it up

What you configure once, before turning it on.

  1. Connect GitHub: repos, issues, pull requests, actions.
  2. Connect OpenAI: models, embeddings, files.
  3. Connect Linear: issues, projects, cycles, triage.
  4. Set each agent's model: we leave models unset so you can pick the tier, either fast and cheap or top quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
