
Flaky Test Rate Monitor with Datadog Threshold

Runs on a daily schedule and queries Datadog CI Visibility for each test's flaky rate over the last 7 days.

Category: Engineering
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily schedule
  • Action: Query Datadog for per-test flaky rates
  • Logic: Keep tests above the flaky-rate threshold
  • Action: Open or update ClickUp tech-debt item
  • Logic: Branch on severe vs moderate flakiness
  • Output: Raise PagerDuty incident for severe offenders

What it does

It measures how flaky each test actually is instead of reacting to single failures. On a daily schedule it asks Datadog CI Visibility for every test's flaky rate over a rolling window, and escalates only the tests that breach a configurable threshold so the worst offenders get attention first.
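To make the difference between a flaky rate and a single failure concrete, here is a minimal sketch of how such a rate could be derived from raw run results. It assumes one possible definition (a test counts as flaky on a commit when it both passed and failed on that commit); the definition the workflow actually gets from Datadog CI Visibility may differ. `RunRecord` and `flaky_rates` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One execution of one test (hypothetical shape, not a Datadog schema)."""
    test_name: str
    commit_sha: str
    passed: bool


def flaky_rates(runs: list[RunRecord]) -> dict[str, float]:
    """Per test, return the fraction of commits that saw both a pass and a fail."""
    # outcomes[test][commit] holds the set of distinct results seen on that commit
    outcomes: dict[str, dict[str, set[bool]]] = defaultdict(lambda: defaultdict(set))
    for run in runs:
        outcomes[run.test_name][run.commit_sha].add(run.passed)

    rates: dict[str, float] = {}
    for test, by_commit in outcomes.items():
        # Assumption: mixed results on the same commit mark that commit as flaky
        flaky_commits = sum(1 for results in by_commit.values() if len(results) == 2)
        rates[test] = flaky_commits / len(by_commit)
    return rates
```

Under this definition a test that fails consistently on a commit contributes nothing to the rate, which is why a single red run does not by itself trigger escalation.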

When to use it

Use it when single-failure alerts create noise and you want to prioritize quarantine work by real impact. It is ideal for teams already sending CI test results to Datadog.

How it works

  1. A daily schedule triggers the workflow.
  2. It queries Datadog CI Visibility for per-test flaky rate and failure count over the trailing 7 days.
  3. A logic step ranks tests and keeps any whose flaky rate exceeds the threshold (for example 5 percent).
  4. For each breaching test it opens or updates a ClickUp tech-debt item with the current rate.
  5. If the rate is severe it raises a PagerDuty incident routed to the owning service.
  6. It posts a ranked summary so the team sees the top flaky tests at a glance.
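The logic in steps 3 to 6 can be sketched as plain functions over data that has already been fetched. This is an illustration under stated assumptions, not the workflow's actual implementation: `FlakyStats`, `breaching_tests`, `plan_actions`, and `summary_lines` are hypothetical names, the 20 percent severe cutoff is an assumed value (only the 5 percent example comes from the steps above), and the ClickUp and PagerDuty calls are represented as action tuples rather than real API requests.

```python
from dataclasses import dataclass

FLAKY_THRESHOLD = 0.05   # 5 percent, the example threshold from step 3
SEVERE_THRESHOLD = 0.20  # assumed paging cutoff; the source does not specify one


@dataclass(frozen=True)
class FlakyStats:
    """Per-test numbers for the trailing 7 days (hypothetical record shape)."""
    name: str
    flaky_rate: float    # 0.0 to 1.0
    failure_count: int
    service: str         # owning service, used to route a page


def breaching_tests(stats, threshold=FLAKY_THRESHOLD):
    """Keep tests whose rate exceeds the threshold, worst offenders first."""
    keep = [s for s in stats if s.flaky_rate > threshold]
    return sorted(keep, key=lambda s: (s.flaky_rate, s.failure_count), reverse=True)


def plan_actions(stats):
    """Return the actions the workflow would take, in ranked order."""
    actions = []
    for s in breaching_tests(stats):
        # Every breaching test gets a tech-debt item opened or updated
        actions.append(("clickup_upsert", s.name))
        # Only severe offenders also page the owning service
        if s.flaky_rate >= SEVERE_THRESHOLD:
            actions.append(("pagerduty_incident", s.service))
    return actions


def summary_lines(stats):
    """Format the ranked summary posted at the end of the run."""
    return [
        f"{rank}. {s.name}: {s.flaky_rate:.1%} flaky, {s.failure_count} failures"
        for rank, s in enumerate(breaching_tests(stats), start=1)
    ]
```

Keeping the decision logic separate from the service calls makes the thresholds easy to test against a sample before the trigger is enabled.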

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: Metrics, traces, log search.
  2. Connect ClickUp: Docs + tasks + chats in one workspace.
  3. Connect PagerDuty: Incidents, on-call, escalations.
  4. Set each agent's model: We leave models unset so you pick the tier, either fast + cheap or top-quality.
  5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
