
Triage Flaky Tests by Datadog Flakiness Rate

On a daily schedule, pulls per-test flakiness rates from Datadog CI Visibility and quarantines any test above a threshold by filing a labeled Linear ticket.

Category: Engineering
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily schedule
  • Action: Query Datadog CI flakiness rates (Datadog)
  • Logic: Filter tests above flake-rate threshold
  • Action: Resolve owning team from CODEOWNERS (GitHub)
  • Action: Create labeled Linear ticket per test (Linear)
  • Output: Post quarantine digest to Slack (Slack)

What it does

Runs a daily sweep over Datadog CI Visibility flakiness metrics, ranks tests by their flake rate, and quarantines any that cross a configured threshold (e.g. >5% over 50 runs). Each quarantined test gets a Linear ticket labeled `flaky`, assigned to the owning team via CODEOWNERS.
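As a rough illustration of the threshold rule, the sketch below filters and ranks tests in plain Python. It assumes the per-test counts have already been fetched from Datadog; `FlakeStats`, `select_for_quarantine`, and the field names are hypothetical and not part of any Datadog API. It also reads "over 50 runs" as a minimum sample size of 50 runs in the window, which is an interpretation rather than something the workflow states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlakeStats:
    """Hypothetical per-test summary; field names are illustrative only."""
    name: str
    total_runs: int
    flaky_runs: int

    @property
    def flake_rate(self) -> float:
        # Guard against division by zero for tests with no recorded runs.
        return self.flaky_runs / self.total_runs if self.total_runs else 0.0


def select_for_quarantine(stats, threshold=0.05, min_runs=50):
    """Keep tests with enough runs whose flake rate exceeds the threshold,
    ranked from most to least flaky."""
    eligible = [
        s for s in stats
        if s.total_runs >= min_runs and s.flake_rate > threshold
    ]
    return sorted(eligible, key=lambda s: s.flake_rate, reverse=True)


if __name__ == "__main__":
    sample = [
        FlakeStats("test_checkout", total_runs=100, flaky_runs=10),
        FlakeStats("test_login", total_runs=40, flaky_runs=20),   # too few runs
        FlakeStats("test_search", total_runs=200, flaky_runs=5),  # below threshold
    ]
    for s in select_for_quarantine(sample):
        print(f"{s.name}: {s.flake_rate:.1%} over {s.total_runs} runs")
```

The minimum-run guard is what keeps a test that failed once in three runs from being quarantined on thin evidence; both `threshold` and `min_runs` would be tuned to your own CI volume.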

When to use it

Use it when you have Datadog Test Optimization wired into CI and want a metrics-driven, proactive quarantine policy instead of chasing individual red builds. It catches slow-burn flakiness that single-run detectors miss.

How it works

  1. A daily schedule trigger kicks off the sweep.
  2. It queries Datadog CI Visibility for per-test flake rates over the trailing window.
  3. A filter keeps only tests above the configured flake-rate threshold.
  4. For each, it resolves the owning team from the repo's CODEOWNERS via the GitHub API.
  5. It creates a Linear ticket with the `flaky` label, flake-rate stats, and Datadog deep links, assigned to that team.
  6. It posts the day's quarantine digest to the engineering Slack channel.
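The owner lookup in step 4 could be sketched roughly as follows. This assumes the CODEOWNERS file contents have already been fetched as text and that each flaky test maps to a repo-relative file path; `parse_codeowners` and `owners_for` are hypothetical helper names. The matcher is deliberately simplified: it follows GitHub's "last matching pattern wins" rule but does not implement full gitignore-style semantics (for example, `*` here can also match across `/`).

```python
from fnmatch import fnmatchcase


def parse_codeowners(text):
    """Parse CODEOWNERS text into (pattern, owners) rules, in file order."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def _matches(pattern, path):
    """Simplified matcher; an approximation of CODEOWNERS pattern rules."""
    anchored = pattern.startswith("/")
    pat = pattern.lstrip("/")
    if pat.endswith("/"):
        # Directory pattern: match everything underneath it.
        return path.startswith(pat) if anchored else f"/{pat}" in f"/{path}"
    if anchored or "/" in pat:
        return fnmatchcase(path, pat)
    # Bare pattern such as "*.py": compare against the file name only.
    return fnmatchcase(path.rsplit("/", 1)[-1], pat)


def owners_for(path, rules):
    """Return the owners from the last rule that matches the path."""
    owners = []
    for pattern, rule_owners in rules:
        if _matches(pattern, path):
            owners = rule_owners  # later rules override earlier ones
    return owners


if __name__ == "__main__":
    sample = """
    # Example file; the team handles are made up.
    *                     @org/platform
    /services/payments/   @org/payments
    """
    rules = parse_codeowners(sample)
    print(owners_for("services/payments/tests/test_refund.py", rules))
```

A test whose path matches no rule comes back with an empty owner list, so a real workflow would need a fallback such as a default triage team.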

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: Metrics, traces, log search.
  2. Connect Linear: Issues, projects, cycles, triage.
  3. Connect GitHub: Repos, issues, pull requests, actions.
  4. Connect Slack: Channels, DMs, threads, mentions.
  5. Set each agent's model: We leave models unset so you pick the tier: fast + cheap, or top-quality.
  6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
