Nightly flaky-test sweep from Datadog pass-rate metrics

Each night this queries Datadog CI Visibility for tests whose pass rate dipped into the flaky band over the last 7 days, tags them, and files a Linear issue assigned…

Category: Engineering
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Nightly schedule
  • Action: Query Datadog CI test pass rates (7d)
  • Logic: Keep tests in the flaky pass-rate band
  • Action: Tag flaky tests as `quarantine` in Datadog
  • Output: File an owner-assigned Linear quarantine issue

What it does

It runs a scheduled sweep against Datadog CI Visibility, pulling each test's 7-day pass rate. Tests in the flaky band (sometimes passing, sometimes failing, but not consistently broken) are tagged `quarantine` in Datadog and filed in Linear as owner-assigned quarantine tickets. Tests that consistently fail or consistently pass are left alone.
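
The band check can be sketched as below, assuming each CI run has already been reduced to a `(test_name, passed)` pair. The 5%-95% band is the example given in the How it works steps; the 10-run minimum and the names `RunStats`, `summarize_runs`, and `classify` are hypothetical and are not part of Datadog's API or of this workflow.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed thresholds: the 5%-95% band is this page's example;
# MIN_RUNS is a hypothetical cutoff to tune to your CI volume.
LOW, HIGH, MIN_RUNS = 0.05, 0.95, 10


@dataclass
class RunStats:
    name: str
    runs: int
    passes: int

    @property
    def pass_rate(self) -> float:
        return self.passes / self.runs if self.runs else 0.0


def summarize_runs(runs):
    """Collapse (test_name, passed) pairs into per-test run and pass counts."""
    counts = defaultdict(lambda: [0, 0])  # name -> [runs, passes]
    for name, passed in runs:
        counts[name][0] += 1
        counts[name][1] += int(passed)
    return [RunStats(name, r, p) for name, (r, p) in counts.items()]


def classify(stats: RunStats) -> str:
    """Place one test relative to the flaky band."""
    if stats.runs < MIN_RUNS:
        return "too-few-runs"
    if stats.pass_rate > HIGH:
        return "consistently-passing"
    if stats.pass_rate < LOW:
        return "consistently-failing"
    return "flaky"  # LOW <= pass_rate <= HIGH: keep and quarantine


# Usage: 12 runs with 8 passes lands in the band; 3 runs is too little history.
runs = [("test_login", i % 3 != 0) for i in range(12)]
runs += [("test_ok", True)] * 12 + [("test_new", False)] * 3
print({s.name: classify(s) for s in summarize_runs(runs)})
# prints {'test_login': 'flaky', 'test_ok': 'consistently-passing', 'test_new': 'too-few-runs'}
```

Requiring a minimum run count keeps a test with one failure out of two runs from being labeled flaky on too little evidence.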

When to use it

Use it when you already ship CI test data to Datadog and want a daily, metric-driven view of instability rather than reacting to individual run failures.

How it works

  1. A nightly schedule triggers the sweep.
  2. Datadog CI Visibility is queried for each test's pass rate, run count, and owning service over the trailing 7 days.
  3. A logic step keeps tests in the flaky band (for example, a 5% to 95% pass rate over a minimum number of runs) and drops tests outside it, i.e. those that consistently pass or consistently fail.
  4. Each remaining test is tagged `quarantine` via the Datadog API so it can be tracked on dashboards.
  5. A Linear issue is created for each flaky test and assigned to the owning team, with the pass-rate trend and links to recent runs in the body.
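
Step 5 could look roughly like the sketch below, which targets Linear's public GraphQL endpoint and its `issueCreate` mutation. It assumes the owning team's Linear team ID has already been resolved from the Datadog service, and that daily pass rates and run URLs come from the step 2 query. The function names, issue title, and body layout are hypothetical; the workflow may format its issues differently. The Datadog tagging call in step 4 is not sketched here.

```python
import json
import urllib.request

LINEAR_GRAPHQL_URL = "https://api.linear.app/graphql"

# Linear's issueCreate mutation; only a few IssueCreateInput fields are used here.
ISSUE_CREATE = """
mutation IssueCreate($input: IssueCreateInput!) {
  issueCreate(input: $input) {
    success
    issue { identifier url }
  }
}
"""


def build_issue_payload(test_name, service, daily_pass_rates, run_links, team_id):
    """Build the GraphQL request body for one flaky test's quarantine issue."""
    trend = ", ".join(f"{rate:.0%}" for rate in daily_pass_rates)
    lines = [
        f"Flaky test `{test_name}` (service: {service}) was tagged `quarantine`.",
        "",
        f"Daily pass rate, oldest to newest: {trend}",
        "",
        "Recent runs:",
        *[f"- {link}" for link in run_links],
    ]
    return {
        "query": ISSUE_CREATE,
        "variables": {
            "input": {
                "teamId": team_id,  # Linear team ID of the owning team (assumed resolved)
                "title": f"Quarantine flaky test: {test_name}",
                "description": "\n".join(lines),  # Linear renders this as Markdown
            }
        },
    }


def send(payload, api_key):
    """POST the payload to Linear with a personal API key in the Authorization header."""
    req = urllib.request.Request(
        LINEAR_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Usage (placeholders, no network call): inspect the payload before sending it.
payload = build_issue_payload(
    "test_checkout", "payments", [0.8, 0.6, 0.75],
    ["https://example.com/runs/1"], "TEAM_ID",
)
print(payload["variables"]["input"]["title"])
```

Keeping payload construction separate from `send` makes it possible to review what would be filed during a trial run without creating any issues.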

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: metrics, traces, log search.
  2. Connect Linear: issues, projects, cycles, triage.
  3. Set each agent's model: we leave models unset so you pick the tier, either fast and cheap or top quality.
  4. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  5. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
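
Most of the tuning comes down to a few thresholds and one field mapping (Datadog service to Linear team). One hypothetical way to keep them in a single validated place is sketched below. Every field name and default is an assumption rather than a setting the workflow exposes, and the team IDs are placeholders. `dry_run` defaults to true to match the advice to test before enabling the trigger.

```python
from dataclasses import dataclass, field


@dataclass
class SweepConfig:
    """Hypothetical tunables for the sweep; none of these names come from the workflow."""
    low: float = 0.05       # lower edge of the flaky band (pass rate)
    high: float = 0.95      # upper edge of the flaky band
    min_runs: int = 10      # skip tests with too little 7-day history
    service_to_team: dict = field(default_factory=dict)  # Datadog service -> Linear team ID
    fallback_team: str = "TRIAGE_TEAM_ID"  # placeholder team for unmapped services
    dry_run: bool = True    # report what would be filed instead of creating issues

    def __post_init__(self):
        # Reject bands that would match nothing or fall outside [0, 1].
        if not 0.0 <= self.low < self.high <= 1.0:
            raise ValueError("flaky band must satisfy 0 <= low < high <= 1")
        if self.min_runs < 1:
            raise ValueError("min_runs must be at least 1")

    def team_for(self, service: str) -> str:
        """Resolve the owning Linear team, falling back for unmapped services."""
        return self.service_to_team.get(service, self.fallback_team)


# Usage: one mapped service, one that falls through to the triage team.
cfg = SweepConfig(service_to_team={"payments": "PAYMENTS_TEAM_ID"})
print(cfg.team_for("payments"), cfg.team_for("search"))
# prints PAYMENTS_TEAM_ID TRIAGE_TEAM_ID
```

Validating the band when the config is created means a typo such as swapping `low` and `high` fails immediately rather than silently filing zero issues.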
