Auto-Quarantine Tests Crossing a Datadog Flakiness Threshold

On a daily schedule, this workflow queries Datadog CI Visibility for tests whose flakiness rate exceeds a threshold and quarantines each one in the repo.

Category: Engineering
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily schedule
  • Action: Query Datadog CI Visibility for flaky tests over threshold (Datadog)
  • Logic: Filter out already-quarantined tests
  • Action: Open quarantine PR for each offender (GitHub)
  • Action: Create Linear issue with flake rate and deadline (Linear)
  • Output: Post quarantine summary to Slack (Slack)

What it does

Uses Datadog CI Visibility flaky-test metrics rather than a single failure to decide what to quarantine. Any test whose flake rate over the trailing window crosses your threshold gets quarantined and an owner gets a data-backed ticket.
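As a rough illustration of what "flake rate over the trailing window" can mean, the sketch below computes a per-test rate from a list of run records and keeps the tests above a threshold. It does not call Datadog. `RunRecord`, `flake_rates`, `over_threshold`, and the `min_runs` guard are hypothetical names introduced here, and it assumes the rate is simply flaky runs divided by total runs in the window, which may differ from how Datadog defines it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RunRecord:
    """One test execution (hypothetical shape, not a Datadog type)."""
    test_name: str
    finished_at: datetime
    flaky: bool  # assumed flag: this run was classified as flaky


def flake_rates(runs, now, window_days=7):
    """Return {test_name: (flake_rate, run_count)} for runs inside the window."""
    cutoff = now - timedelta(days=window_days)
    totals, flaky = {}, {}
    for run in runs:
        if run.finished_at < cutoff:
            continue  # older than the trailing window
        totals[run.test_name] = totals.get(run.test_name, 0) + 1
        if run.flaky:
            flaky[run.test_name] = flaky.get(run.test_name, 0) + 1
    return {
        name: (flaky.get(name, 0) / total, total)
        for name, total in totals.items()
    }


def over_threshold(rates, threshold, min_runs=1):
    """Names of tests whose rate exceeds the threshold, sorted for stable output."""
    return sorted(
        name
        for name, (rate, total) in rates.items()
        if rate > threshold and total >= min_runs
    )
```

Raising `min_runs` is one way to avoid quarantining a test on the basis of only a handful of runs; the run count also doubles as the sample count reported to the owner.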

When to use it

Use this when single-failure triggers are too noisy and you want to quarantine only persistently unreliable tests, with the evidence (flake percentage, run count) attached so owners can't dispute it.

How it works

  1. A scheduled trigger runs every morning.
  2. The workflow queries Datadog CI Visibility for tests with flakiness rate above the configured threshold over the last 7 days.
  3. A logic step filters out tests already quarantined to avoid duplicate tickets.
  4. For each remaining offender it opens a GitHub PR adding the quarantine annotation.
  5. It creates a Linear issue per test, embedding the flake rate, sample count, and a link to the Datadog flaky-test view, with a re-enable deadline.
  6. A summary of all quarantined tests is posted to the team's engineering Slack channel.
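The filtering and ticketing steps above can be sketched as plain data handling. This is a minimal illustration under stated assumptions, not the workflow's actual implementation: the dictionary keys (`name`, `flake_rate`, `runs`, `url`), the function names, and the 14-day `grace_days` default are all hypothetical, and no GitHub, Linear, or Datadog API is called.

```python
from datetime import date, timedelta


def select_offenders(flaky_tests, already_quarantined, threshold):
    """Keep tests over the threshold that are not yet quarantined (step 3)."""
    return [
        t for t in flaky_tests
        if t["flake_rate"] > threshold and t["name"] not in already_quarantined
    ]


def build_issue(test, today, grace_days=14):
    """Assemble the fields for one ticket (step 5); grace_days is an assumed default."""
    deadline = today + timedelta(days=grace_days)
    description = (
        f"Flake rate: {test['flake_rate']:.1%} over {test['runs']} runs.\n"
        f"Flaky-test view: {test['url']}\n"
        f"Re-enable by: {deadline.isoformat()}"
    )
    return {
        "title": f"Quarantined flaky test: {test['name']}",
        "description": description,
        "due_date": deadline.isoformat(),
    }
```

Filtering before any PR or issue is created keeps a test that stays flaky for several days from generating a new ticket on every daily run.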

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: Metrics, traces, log search.
  2. Connect GitHub: Repos, issues, pull requests, actions.
  3. Connect Linear: Issues, projects, cycles, triage.
  4. Connect Slack: Channels, DMs, threads, mentions.
  5. Set each agent's model: We leave models unset so you pick the tier, either fast and cheap or top-quality.
  6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
