DEVOPS

Triage Flaky PR Failures with an Agent and Comment the Verdict

When a PR check fails, an agent compares the failing test logs against history to decide whether the failure is a real regression or a flake, then comments its reasoning on the PR.

Category: DevOps
Engine: Sim + Paperclip
Difficulty: advanced
Trigger: event
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: GitHub PR check_run failed (GitHub)
  • Action: Pull failing logs and main-branch history (GitHub)
  • Logic: Agent verdict: regression vs flake (OpenAI)
  • Action: Comment verdict on the PR (GitHub)
  • Action: Open Linear ticket for confirmed flake (Linear)
  • Output: Return verdict and ticket link
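
The trigger step can be pictured as a small filter over the incoming webhook. The sketch below is illustrative only: it assumes the event arrives as GitHub's `check_run` webhook payload (with `action`, `check_run.conclusion`, and `check_run.pull_requests`), and the function name `pr_numbers_to_triage` is hypothetical, not part of this workflow's actual configuration.

```python
from typing import Any


def pr_numbers_to_triage(payload: dict[str, Any]) -> list[int]:
    """Return the PR numbers attached to a failed check run, or [] to skip.

    Assumes the shape of GitHub's check_run webhook payload; the function
    itself is a hypothetical helper for illustration.
    """
    # Only finished check runs carry a conclusion worth triaging.
    if payload.get("action") != "completed":
        return []
    check_run = payload.get("check_run", {})
    if check_run.get("conclusion") != "failure":
        return []
    # A check run may be associated with zero or more pull requests.
    return [pr["number"] for pr in check_run.get("pull_requests", [])]
```

An empty list means the event is ignored, for example when the check passed or when the failing commit is not attached to any PR.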

What it does

This workflow stops developers from guessing whether a red check is their fault or a known flaky test. On every failing PR check, an agent reads the failure logs, compares them to the test's recent history, and posts a clear verdict (real regression or flake) directly on the pull request.

When to use it

Use this on busy repos where contributors waste time re-running checks and arguing over whether a failure is real. It gives an immediate, reasoned triage comment and keeps a tracked record of confirmed flakes without paging anyone.

How it works

  1. A GitHub check_run failure event triggers the flow on an open PR.
  2. The agent pulls the failing test's logs and its pass/fail history on the main branch.
  3. It reasons about whether the failure correlates with the PR's diff (regression) or matches a known intermittent pattern (flake).
  4. A logic branch routes on the verdict.
  5. For a regression it posts a blocking comment asking the author to investigate; for a flake it posts a reassuring comment and opens a Linear ticket tagged flaky.
  6. The verdict and any ticket link are returned as the output.
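
To make steps 3 to 5 concrete, here is a minimal deterministic stand-in for the agent's reasoning. In the workflow itself a model makes this call; the sketch only illustrates the two signals described above (main-branch history and overlap with the PR's diff). The names `Verdict` and `triage`, the 5% `flake_threshold`, and the file-overlap heuristic are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str         # "regression" or "flake"
    reason: str        # text that would go into the PR comment
    open_ticket: bool  # True when a flaky-test ticket should be opened


def triage(test_file: str, changed_files: list[str],
           main_history: list[bool], flake_threshold: float = 0.05) -> Verdict:
    """Classify a failing test. main_history holds recent results of the
    same test on main, True meaning pass. Hypothetical heuristic only."""
    failures = main_history.count(False)
    fail_rate = failures / len(main_history) if main_history else 0.0
    # Crude stand-in for "correlates with the PR's diff".
    touches_test = test_file in changed_files

    if fail_rate >= flake_threshold and not touches_test:
        reason = (f"{test_file} failed {failures} of {len(main_history)} "
                  "recent runs on main and the PR does not touch it.")
        return Verdict("flake", reason, open_ticket=True)

    reason = (f"{test_file} is stable on main or is modified by this PR; "
              "please investigate before merging.")
    return Verdict("regression", reason, open_ticket=False)
```

The routing in step 4 then reduces to checking `label`: a flake verdict leads to the reassuring comment plus a Linear ticket, while a regression verdict leads to the blocking comment. A real agent would weigh richer evidence, such as the log text itself.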

Set it up

What you configure once, before turning it on.

  1. Connect GitHub: repos, issues, pull requests, actions.
  2. Connect Linear: issues, projects, cycles, triage.
  3. Connect OpenAI: models, embeddings, files.
  4. Set each agent's model: we leave models unset so you pick the tier, either fast and cheap or top-quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
