DEVOPS

Agent-driven root-cause triage of newly quarantined tests

When a test is freshly quarantined, an agent pulls recent failure logs and the test source, reasons about the likely cause (timing, ordering, shared state, network), and writes…

CategoryDevOps
Enginepaperclip
Difficultyadvanced
Triggerwebhook
Steps6
Setup~25 min

How it runs

The automated pipeline, trigger to output.

  • TriggerWebhook: test added to quarantine manifestHTTP webhook
  • ActionFetch test source and failure logsGitHubGitHub
  • ActionQuery Datadog for failure timing and concurrencyDatadogDatadog
  • LogicAgent classifies flake cause and drafts fix plan
  • ActionWrite triage report into Linear issueLinearLinear
  • OutputAssign issue to owning team via CODEOWNERSGitHubGitHub

What it does

Turns a raw quarantine event into an actionable diagnosis. An agent gathers the failing test's source, its recent failure logs, and the surrounding test fixtures, then reasons about the most probable flake category and proposes a concrete remediation direction. The findings are written back into the tracking issue so the assignee starts with a hypothesis, not a blank page.

When to use it

Use this when quarantined tests pile up faster than engineers can investigate them. It front-loads the tedious log-reading and pattern-matching so triage time drops from hours to minutes.

How it works

  1. 1A webhook fires when a test is added to the quarantine manifest.
  2. 2The agent fetches the test source and recent failure logs from GitHub.
  3. 3It queries Datadog for the test's historical failure timing and concurrency context.
  4. 4The agent classifies the likely cause (race condition, test ordering, shared fixture, external dependency) and drafts a remediation plan.
  5. 5It writes the structured triage report and suggested fix class into the Linear issue.
  6. 6The final step assigns the issue to the owning team based on CODEOWNERS.

Set it up

What you configure once, before turning it on.

  1. 1
    Connect GitHubRepos, issues, pull requests, actions.
  2. 2
    Connect DatadogMetrics, traces, log search.
  3. 3
    Connect LinearIssues, projects, cycles, triage.
  4. 4
    Connect HTTP webhookTrigger any URL on agent actions.
  5. 5
    Set each agent's modelWe leave models unset so you pick the tier — fast + cheap, or top-quality.
  6. 6
    Tune it to your dataEdit the prompts, filters, and field mappings so it matches how your team works.
  7. 7
    Test, then turn it onRun once against a sample, confirm the output, then enable the trigger.

Run this workflow in your colony.

14-day trial. No DevOps. No Sales call. Provisioned in under a minute.