
Page on-call when Datadog CI build-duration anomaly maps to a cache miss

Triggers on a Datadog CI pipeline duration anomaly, confirms that it correlates with a build-cache miss-rate spike, and pages the CI on-call through PagerDuty.

Category: DevOps
Engine: sim
Difficulty: intermediate
Trigger: webhook
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Datadog CI duration anomaly webhook (Datadog)
  • Action: Query cache miss-rate metric for same window (Datadog)
  • Logic: Duration jump correlated with cache miss spike?
  • Action: Assemble incident payload with both signals
  • Output: Open PagerDuty incident for CI on-call (PagerDuty)
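
The query step above can be sketched as a small URL builder. This is a minimal illustration, assuming the Datadog v1 timeseries query endpoint on the US site; the metric name `ci.cache.miss_rate` and the `pipeline` tag are hypothetical placeholders for whatever your CI actually emits.

```python
from urllib.parse import urlencode

# Assumption: US Datadog site. Other sites (for example datadoghq.eu) use a different host.
DATADOG_QUERY_URL = "https://api.datadoghq.com/api/v1/query"


def build_miss_rate_query(pipeline: str, start_ts: int, end_ts: int) -> str:
    """Return a query URL for the cache miss rate over [start_ts, end_ts].

    start_ts and end_ts are Unix timestamps in seconds, taken from the
    anomaly window reported by the webhook.
    """
    # Hypothetical metric and tag names; substitute your own.
    query = f"avg:ci.cache.miss_rate{{pipeline:{pipeline}}}"
    params = {"from": start_ts, "to": end_ts, "query": query}
    return f"{DATADOG_QUERY_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # The request itself would also need DD-API-KEY and
    # DD-APPLICATION-KEY headers; they are omitted here.
    print(build_miss_rate_query("web-build", 1700000000, 1700003600))
```

Reusing the anomaly's own start and end timestamps keeps both signals on the same interval, which is what makes the later comparison meaningful.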

What it does

When Datadog detects an anomalous jump in CI pipeline duration, this flow checks whether the slowdown lines up with a spike in cache miss rate. If the two coincide, it opens a PagerDuty incident pre-filled with the affected pipeline, the duration delta, and the cache metrics, so on-call starts with context instead of a blank page.

When to use it

Use it for teams that already monitor CI in Datadog and want to separate real cache regressions from ordinary build-time noise. Duration alone is noisy; gating on a correlated cache-miss spike filters out false pages caused by flaky tests or runner contention.
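
The gating idea can be illustrated with a short sketch. It assumes the miss rate is expressed as a fraction between 0 and 1 and that a fixed absolute increase counts as a spike; the `Signals` type, the `should_page` function, and the 0.15 threshold are all hypothetical and would need tuning to your data.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    duration_anomaly: bool      # Datadog flagged the pipeline duration
    baseline_miss_rate: float   # typical miss rate, as a fraction (0..1)
    window_miss_rate: float     # miss rate during the anomaly window


def should_page(s: Signals, min_increase: float = 0.15) -> bool:
    """Page only when a duration anomaly coincides with a miss-rate spike."""
    if not s.duration_anomaly:
        return False
    # Assumed rule: an absolute rise of at least min_increase is a spike.
    return (s.window_miss_rate - s.baseline_miss_rate) >= min_increase


if __name__ == "__main__":
    print(should_page(Signals(True, 0.10, 0.45)))  # slow build, cache spike
    print(should_page(Signals(True, 0.10, 0.12)))  # slow build, cache normal
```

A slow build with a normal miss rate, such as one caused by runner contention, falls through the second check and does not page.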

How it works

  1. A Datadog monitor webhook fires on a CI pipeline duration anomaly.
  2. The flow queries Datadog for the cache miss-rate metric over the same interval.
  3. A branch checks whether miss rate is elevated alongside the duration jump.
  4. If correlated, it builds an incident payload with both signals and the pipeline link.
  5. It opens a PagerDuty incident and routes it to the CI on-call escalation.
  6. If not correlated, it logs the event and exits quietly.
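
The payload-assembly step can be sketched as follows. This is one possible shape, assuming the incident is raised through the PagerDuty Events API v2 (a trigger event posted to `https://events.pagerduty.com/v2/enqueue`); the workflow itself may use a different PagerDuty integration. The function name, the `dedup_key` format, the `source` value, and the `custom_details` field names are illustrative assumptions.

```python
import json


def build_pagerduty_event(
    routing_key: str,
    pipeline: str,
    pipeline_url: str,
    duration_delta_s: float,
    baseline_miss_rate: float,
    window_miss_rate: float,
) -> dict:
    """Assemble an Events API v2 trigger event carrying both signals."""
    summary = (
        f"CI cache regression on {pipeline}: "
        f"build time up {duration_delta_s:.0f}s with a cache miss-rate spike"
    )
    return {
        "routing_key": routing_key,      # integration key for the CI on-call service
        "event_action": "trigger",
        # Hypothetical dedup key: one open incident per pipeline.
        "dedup_key": f"ci-cache-regression:{pipeline}",
        "payload": {
            "summary": summary,
            "source": "datadog-ci",      # illustrative source label
            "severity": "warning",
            "custom_details": {
                "pipeline": pipeline,
                "duration_delta_seconds": duration_delta_s,
                "baseline_miss_rate": baseline_miss_rate,
                "window_miss_rate": window_miss_rate,
            },
        },
        "links": [{"href": pipeline_url, "text": "Pipeline run"}],
    }


if __name__ == "__main__":
    event = build_pagerduty_event(
        "YOUR_ROUTING_KEY", "web-build",
        "https://example.com/ci/web-build/123", 240.0, 0.10, 0.45,
    )
    print(json.dumps(event, indent=2))
```

Keeping both signals in `custom_details` and the pipeline link in `links` means the responder sees the duration delta and the cache metrics in the incident itself, without reopening Datadog first.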

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: Metrics, traces, log search.
  2. Connect PagerDuty: Incidents, on-call, escalations.
  3. Set each agent's model: We leave models unset so you pick the tier (fast and cheap, or top-quality).
  4. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  5. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
