
Classify and Quarantine Intermittent CI Failures with AI

When a CI job fails, an agent reads the failure logs to decide whether the failure is a real regression or flakiness.

Category: Engineering
Engine: paperclip
Difficulty: advanced
Trigger: webhook
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: GitHub failed workflow run (GitHub)
  • Action: Fetch test logs and recent history (GitHub)
  • Logic: Agent classifies: regression vs. flaky
  • Action: Open labeled quarantine issue, if flaky (GitHub)
  • Action: Page on-call, if real regression (PagerDuty)
  • Output: Post classification reasoning to Slack (Slack)
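
The branching in this pipeline can be sketched as a small pure function. This is only an illustration of the routing logic, not the engine's actual implementation: the `Verdict` and `Classification` types and the action strings such as `github:open_quarantine_issue` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    FLAKY = "flaky"
    REGRESSION = "regression"


@dataclass(frozen=True)
class Classification:
    """Hypothetical container for the agent's decision on one failed test."""
    test_name: str
    verdict: Verdict
    reasoning: str


def plan_actions(result: Classification) -> list[str]:
    """Return the ordered downstream actions for one classified failure."""
    actions = []
    if result.verdict is Verdict.FLAKY:
        # Flaky: quarantine via a labeled GitHub issue.
        actions.append("github:open_quarantine_issue")
    else:
        # Likely real regression: escalate instead of hiding it.
        actions.append("pagerduty:page_on_call")
    # Slack receives the reasoning whichever branch ran.
    actions.append("slack:post_reasoning")
    return actions
```

Keeping the routing separate from the classification makes it easy to check that a regression verdict can never end in a quarantine issue.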

What it does

On every failed CI run, an agent inspects the failure logs and the test's recent history to classify the failure as either a real regression or intermittent flakiness. Confirmed flakes are quarantined (labeled GitHub issue + skip entry); suspected real breakages are escalated to on-call so they aren't silently hidden.

When to use it

Use it when naive auto-quarantine is too risky — you don't want to hide a genuine regression behind a flaky label. The agent adds judgment by reading stack traces, timeout patterns, and prior pass/fail history before deciding.
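
As a rough illustration of the kinds of signals involved, the sketch below combines a log-pattern check with a pass/fail flip rate. It is a minimal heuristic under stated assumptions, not the agent's actual reasoning: the regex list, the `flip_rate` and `looks_flaky` helpers, and the 0.2 threshold are all hypothetical and would need tuning against your own test history.

```python
import re

# Hypothetical patterns that often accompany flaky failures; extend for your stack.
FLAKY_PATTERNS = [
    re.compile(r"timed? ?out", re.IGNORECASE),
    re.compile(r"connection (reset|refused)", re.IGNORECASE),
    re.compile(r"temporar(y|ily) (failure|unavailable)", re.IGNORECASE),
]


def flip_rate(history: list[bool]) -> float:
    """Fraction of adjacent runs whose outcome differs (True = pass).

    A test that alternates between pass and fail scores high; a test that
    passed steadily and then started failing scores low.
    """
    if len(history) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(history, history[1:]))
    return flips / (len(history) - 1)


def looks_flaky(log_text: str, history: list[bool],
                min_flip_rate: float = 0.2) -> bool:
    """Flag a failure as a flake candidate only when both signals agree."""
    has_pattern = any(p.search(log_text) for p in FLAKY_PATTERNS)
    return has_pattern and flip_rate(history) >= min_flip_rate
```

Requiring both signals is deliberately conservative: a timeout in a test that has never flipped before is treated as a possible regression rather than quarantined.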

How it works

  1. A GitHub webhook fires on a failed workflow run.
  2. The agent fetches the failing test's logs and its recent pass/fail history via the GitHub API.
  3. It classifies the failure: real regression vs. flaky (timeouts, ordering, network jitter, race conditions).
  4. If the failure is flaky, it opens a labeled quarantine issue and records the rationale.
  5. If the failure is a likely real regression, it pages on-call via PagerDuty with the diagnosis.
  6. It posts the classification and reasoning to Slack for visibility.
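
Step 1 depends on recognizing which webhook deliveries represent a failed run. A minimal sketch of that filter follows, assuming GitHub's `workflow_run` event with `action` set to `completed` and `workflow_run.conclusion` set to `failure`; confirm the field names against GitHub's webhook documentation before relying on them. The function name `failed_run_info` and the shape of its return value are hypothetical.

```python
from typing import Any, Optional


def failed_run_info(event_name: str,
                    payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return key fields for a failed workflow run, or None to ignore the event.

    `event_name` is the value of the X-GitHub-Event header on the delivery.
    """
    if event_name != "workflow_run":
        return None
    # Only finished runs carry a conclusion.
    if payload.get("action") != "completed":
        return None
    run = payload.get("workflow_run") or {}
    if run.get("conclusion") != "failure":
        return None
    return {
        "run_id": run.get("id"),
        "head_sha": run.get("head_sha"),
        "url": run.get("html_url"),
    }
```

Returning `None` for everything except completed failures keeps successful, cancelled, and in-progress runs from reaching the agent.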

Set it up

What you configure once, before turning it on.

  1. Connect GitHub: repos, issues, pull requests, actions.
  2. Connect PagerDuty: incidents, on-call, escalations.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model: we leave models unset so you pick the tier (fast and cheap, or top-quality).
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
