ENGINEERING

Escalate Suite Instability to PagerDuty When Flakes Block Releases

On a schedule, evaluates aggregate flake load from Datadog and the open deflake backlog in Linear, and opens a PagerDuty incident if the main pipeline's flake rate threatens releases.

Category: Engineering
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Scheduled stability evaluation
  • Action: Fetch main pipeline flake metrics from Datadog
  • Action: Pull open deflake tickets and ages from Linear
  • Logic: Compute stability score against release-risk threshold
  • Action: Open PagerDuty incident when threshold breached
  • Output: Post incident summary to Slack
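
As a rough illustration of the "tickets and ages" input, the sketch below counts open deflake tickets older than a cutoff. It assumes each ticket arrives as a dict with an ISO 8601 `createdAt` timestamp; the 14-day cutoff and the function names are hypothetical, not part of the workflow's actual configuration.

```python
from datetime import datetime, timezone


def ticket_age_days(created_at: str, now: datetime) -> int:
    """Return the whole number of days between created_at and now."""
    # Normalize a trailing "Z" so fromisoformat also works before Python 3.11.
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return (now - created).days


def count_aging_tickets(tickets, now, aging_cutoff_days=14):
    """Count tickets whose age meets or exceeds the (assumed) cutoff."""
    return sum(
        1
        for ticket in tickets
        if ticket_age_days(ticket["createdAt"], now) >= aging_cutoff_days
    )


# Hypothetical sample data standing in for open deflake tickets.
sample_tickets = [
    {"createdAt": "2024-05-01T00:00:00.000Z"},
    {"createdAt": "2024-05-18T00:00:00.000Z"},
    {"createdAt": "2024-05-25T00:00:00.000Z"},
]
evaluation_time = datetime(2024, 6, 1, tzinfo=timezone.utc)
aging_count = count_aging_tickets(sample_tickets, evaluation_time)
```

Passing the evaluation time in explicitly, rather than calling `datetime.now()` inside the function, keeps the count reproducible when the workflow is tested against a sample.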

What it does

This workflow treats systemic flakiness as an operational risk, not just a backlog item. On a schedule, it pulls the main branch's recent CI flake rate from Datadog and the count of aging open deflake tickets from Linear, then computes a stability score. If the suite is unstable enough to jeopardize merges and releases, it raises a PagerDuty incident so an owner is accountable in real time.
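
The page does not define how the stability score is calculated, so the following is only one plausible sketch. It blends the flake rate and the aging-ticket count into a 0 to 100 score and compares it with a threshold. The weights, ceilings, threshold value, and all names here are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StabilityInputs:
    flake_rate: float        # fraction of main-branch CI runs that flaked, 0.0 to 1.0
    aging_ticket_count: int  # open deflake tickets past the aging cutoff


def stability_score(
    inputs: StabilityInputs,
    flake_weight: float = 0.7,    # assumed weighting, not from the source
    backlog_weight: float = 0.3,  # assumed weighting, not from the source
    flake_ceiling: float = 0.20,  # flake rate treated as maximum risk
    backlog_ceiling: int = 20,    # ticket count treated as maximum risk
) -> float:
    """Return a score from 0 (unstable) to 100 (stable)."""
    # Scale each signal to 0..1 and cap it so one signal cannot exceed its weight.
    flake_risk = min(inputs.flake_rate / flake_ceiling, 1.0)
    backlog_risk = min(inputs.aging_ticket_count / backlog_ceiling, 1.0)
    risk = flake_weight * flake_risk + backlog_weight * backlog_risk
    return round(100.0 * (1.0 - risk), 1)


def breaches_threshold(score: float, threshold: float = 60.0) -> bool:
    """True when the score falls below the (assumed) release-risk threshold."""
    return score < threshold


score = stability_score(StabilityInputs(flake_rate=0.10, aging_ticket_count=10))
should_page = breaches_threshold(score)
```

Capping each signal means a very large backlog alone cannot trigger a page unless the weights allow it, which is one way to keep the escalation tied mainly to the live flake rate.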

When to use it

Use it when flakes have crossed from annoyance to release blocker and you need a clear escalation trigger instead of ad-hoc complaints. It draws the line at which test instability becomes an on-call concern.

How it works

  1. A schedule triggers the evaluation.
  2. A Datadog action fetches the main pipeline's recent flake and failure-retry metrics.
  3. A Linear action pulls open deflake tickets and their ages.
  4. A logic step computes a stability score and checks it against the release-risk threshold.
  5. A PagerDuty action opens an incident for the test-platform on-call when the threshold is breached.
  6. A Slack message posts the incident summary and the top contributing flaky tests.
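
To make the last two steps concrete, here is a hedged sketch of the data they might produce. It assumes the incident is raised as a trigger event in the shape accepted by the PagerDuty Events API v2; the workflow's PagerDuty connector may use a different mechanism. The routing key, source name, dedup key format, and test names are all hypothetical, and nothing is sent over the network.

```python
from datetime import date


def build_pagerduty_event(routing_key, score, threshold, top_flaky_tests, today):
    """Build a trigger event in the PagerDuty Events API v2 shape."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # One dedup key per day (an assumption) so repeated scheduled runs
        # update the same incident instead of opening a new one each time.
        "dedup_key": f"suite-instability-{today.isoformat()}",
        "payload": {
            "summary": (
                f"Main pipeline stability score {score:.1f} is below "
                f"release-risk threshold {threshold:.1f}"
            ),
            "source": "ci-main-pipeline",  # hypothetical source name
            "severity": "error",
            "custom_details": {"top_flaky_tests": top_flaky_tests},
        },
    }


def build_slack_summary(score, threshold, top_flaky_tests):
    """Format the incident summary text posted to Slack."""
    lines = [
        f"Suite instability incident opened: score {score:.1f} "
        f"(threshold {threshold:.1f})",
        "Top contributing flaky tests:",
    ]
    lines += [f"- {name}" for name in top_flaky_tests]
    return "\n".join(lines)


# Hypothetical inputs for illustration only.
flaky = ["test_checkout_retry", "test_session_timeout"]
event = build_pagerduty_event("EXAMPLE_ROUTING_KEY", 50.0, 60.0, flaky, date(2024, 6, 1))
summary = build_slack_summary(50.0, 60.0, flaky)
```

Building the payload and the message as plain data, separate from sending them, makes it easy to inspect both during a sample run before the trigger is enabled.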

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: Metrics, traces, log search.
  2. Connect Linear: Issues, projects, cycles, triage.
  3. Connect PagerDuty: Incidents, on-call, escalations.
  4. Connect Slack: Channels, DMs, threads, mentions.
  5. Set each agent's model: We leave models unset so you pick the tier, either fast and cheap or top-quality.
  6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
