
Escalate to PagerDuty when Datadog multi-window burn rate exhausts budget

Evaluates Datadog SLO burn rate across both fast and slow windows on a schedule, and when both windows agree the budget is being exhausted it opens a PagerDuty incident…

Category: Engineering
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Schedule fires every few minutes
  • Action: Query Datadog SLO burn rate (fast + slow windows) (Datadog)
  • Logic: Require both windows over threshold
  • Action: Open PagerDuty incident for the service (PagerDuty)
  • Output: Notify on-call Slack channel with incident link (Slack)

What it does

This workflow runs the classic multi-window, multi-burn-rate SLO evaluation against Datadog on a fixed cadence. It only escalates when both the short window and the long window confirm sustained budget burn, which suppresses the false alarms that single-window alerts produce. On confirmation it opens a PagerDuty incident and pings the on-call Slack channel.
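As a rough illustration of the quantity being evaluated, the sketch below uses the common definition of burn rate: the observed error ratio divided by the error ratio the SLO allows. The function names, the 30-day SLO period, and the sample numbers are assumptions for illustration; they are not taken from this workflow or from the Datadog API.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Burn rate = observed error ratio / allowed error ratio (1 - target).

    A burn rate of 1.0 spends the budget exactly over the SLO period;
    anything higher spends it faster.
    """
    allowed = 1.0 - slo_target
    if allowed <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / allowed


def hours_to_exhaustion(rate: float, slo_period_days: int = 30) -> float:
    """Hours until a full budget is gone at a constant burn rate.

    Assumes the budget is untouched at the start of the period, which
    is a simplification; the 30-day period is a hypothetical default.
    """
    if rate <= 0:
        return float("inf")
    return slo_period_days * 24 / rate


if __name__ == "__main__":
    # Hypothetical service: 99.9% target, 1.44% of requests failing.
    rate = burn_rate(error_ratio=0.0144, slo_target=0.999)
    print(f"burn rate: {rate:.1f}")
    print(f"hours to exhaustion: {hours_to_exhaustion(rate):.0f}")
```

Comparing the same rate over a short and a long window is what separates a brief spike (high fast burn, low slow burn) from sustained burn (both high).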

When to use it

Use this when you want SLO-driven paging rather than threshold-on-a-graph paging, and your SLOs live in Datadog. It is ideal for services where noisy single-window alerts have caused alert fatigue.

How it works

  1. A schedule trigger fires every few minutes.
  2. The workflow queries the Datadog SLO API for the service's burn rate over the fast window and the slow window.
  3. A logic step requires both windows to exceed their respective burn thresholds before proceeding; otherwise it exits quietly.
  4. When both agree, it opens a PagerDuty incident tagged with the service and remaining budget.
  5. It posts the incident link and burn details to the on-call Slack channel for visibility.
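The steps above can be sketched as plain functions. This is a minimal sketch under stated assumptions, not the workflow's actual implementation: `BurnReading`, the default thresholds (14.4 and 6.0), and the dedup key format are hypothetical, and the burn values are assumed to have already been fetched from Datadog. The event body follows the shape of a PagerDuty Events API v2 trigger event, which is one way to open an incident; the workflow may use a different PagerDuty integration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BurnReading:
    """Hypothetical container for one evaluation of a service's SLO."""
    service: str
    fast_burn: float             # burn rate over the short window
    slow_burn: float             # burn rate over the long window
    budget_remaining_pct: float  # error budget left, in percent


def should_page(r: BurnReading,
                fast_threshold: float = 14.4,
                slow_threshold: float = 6.0) -> bool:
    """Escalate only when BOTH windows exceed their thresholds.

    The default thresholds are illustrative assumptions.
    """
    return r.fast_burn > fast_threshold and r.slow_burn > slow_threshold


def build_pagerduty_event(r: BurnReading, routing_key: str) -> dict:
    """Build a trigger event shaped like a PagerDuty Events API v2 body."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Stable key so repeated scheduled runs do not open duplicates.
        "dedup_key": f"slo-burn-{r.service}",
        "payload": {
            "summary": f"SLO budget burn confirmed for {r.service}",
            "source": r.service,
            "severity": "critical",
            "custom_details": {
                "fast_burn": r.fast_burn,
                "slow_burn": r.slow_burn,
                "budget_remaining_pct": r.budget_remaining_pct,
            },
        },
    }


def build_slack_text(r: BurnReading, incident_url: str) -> str:
    """Format the on-call channel message with the incident link."""
    return (f"SLO burn on {r.service}: fast={r.fast_burn:.1f}x, "
            f"slow={r.slow_burn:.1f}x, "
            f"budget left={r.budget_remaining_pct:.0f}%. {incident_url}")


def evaluate(r: BurnReading, routing_key: str) -> Optional[dict]:
    """Return an event to send, or None to exit quietly."""
    if not should_page(r):
        return None
    return build_pagerduty_event(r, routing_key)
```

Keeping the gate, the event builder, and the Slack formatter as separate pure functions makes each step testable without network access; sending the event and posting the message would be thin wrappers around these.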

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: metrics, traces, log search.
  2. Connect PagerDuty: incidents, on-call, escalations.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model: we leave models unset so you pick the tier, fast and cheap or top-quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
