Datadog Freshness Metric Breach: Auto-Triage and Halt the Pipeline

Receives a Datadog monitor alert when a table-freshness metric breaches its threshold, then classifies the breach as a load delay or a hard failure.

Category: Data Ops
Engine: sim
Difficulty: advanced
Trigger: webhook
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Datadog freshness alert webhook [Datadog]
  • Action: Confirm current lag in Snowflake [Snowflake]
  • Logic: Soft delay or hard breach?
  • Action: Post heads-up to Slack (soft) [Slack]
  • Action: Page on-call + pause dbt (hard) [PagerDuty]
  • Output: Record verdict to breach log [Snowflake]

What it does

Turns a raw Datadog freshness-monitor alert into a graded response. It distinguishes a brief, self-healing load delay from a genuine pipeline stall, so the team only gets paged for real breaches while soft delays are tracked quietly.

When to use it

Use it when you already emit a `table.freshness.lag_minutes` metric to Datadog and want smarter routing than a single threshold: pages for true stalls, and a heads-up for delays still inside the grace window.
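
For orientation, here is one plausible way a lag value like `table.freshness.lag_minutes` could be derived. This is a sketch under assumptions, not how the workflow or your pipeline necessarily computes it: the function name `lag_minutes` and the idea that lag is "now minus the table's last load time" are illustrative only.

```python
from datetime import datetime, timezone


def lag_minutes(last_loaded_at: datetime, now: datetime) -> float:
    """Minutes elapsed since the table was last loaded (assumed definition)."""
    delta = now - last_loaded_at
    # Clamp at zero so clock skew never yields a negative lag.
    return max(0.0, delta.total_seconds() / 60.0)


if __name__ == "__main__":
    loaded = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    checked = datetime(2024, 1, 1, 12, 45, tzinfo=timezone.utc)
    print(lag_minutes(loaded, checked))
```

Both timestamps should be timezone-aware (or both naive); mixing the two raises a `TypeError` in Python.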

How it works

  1. Datadog monitor webhook fires on a freshness-lag breach and posts the payload.
  2. Re-query Snowflake to confirm the current lag (avoids alerting on a stale metric scrape).
  3. Logic branches: lag inside grace window and trending down means a soft delay; lag beyond hard threshold or growing means a real breach.
  4. Soft delay: post a low-priority Slack note to the data channel and exit.
  5. Hard breach: open a PagerDuty incident and pause the dependent dbt run via its API.
  6. Write the verdict and metric values to the breach log.
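
The branch in step 3 and the routing in steps 4 to 6 can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual implementation. The threshold values (`GRACE_MINUTES`, `HARD_MINUTES`), the names `classify` and `planned_actions`, the action labels, and the use of a single previous reading to judge the trend are all assumptions. The steps above also do not say what happens when lag is neither clearly soft nor clearly hard (for example, flat lag inside the grace window), so this sketch escalates those cases.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; the workflow does not state concrete values.
GRACE_MINUTES = 30.0
HARD_MINUTES = 90.0


@dataclass
class Verdict:
    kind: str           # "soft" or "hard"
    lag_minutes: float  # lag confirmed by the Snowflake re-query
    reason: str


def classify(current_lag: float, previous_lag: float,
             grace: float = GRACE_MINUTES,
             hard: float = HARD_MINUTES) -> Verdict:
    """Grade a breach from the confirmed lag and one earlier reading."""
    if current_lag > hard:
        return Verdict("hard", current_lag, "lag beyond hard threshold")
    if current_lag > previous_lag:
        return Verdict("hard", current_lag, "lag is growing")
    if current_lag <= grace and current_lag < previous_lag:
        return Verdict("soft", current_lag,
                       "inside grace window and trending down")
    # Assumption: cases the steps do not cover are escalated, not dropped.
    return Verdict("hard", current_lag, "ambiguous trend, escalating")


def planned_actions(verdict: Verdict) -> List[str]:
    """Map a verdict to the ordered actions from steps 4 to 6."""
    if verdict.kind == "soft":
        return ["slack_note", "log_verdict"]
    return ["pagerduty_incident", "pause_dbt", "log_verdict"]


if __name__ == "__main__":
    v = classify(current_lag=20.0, previous_lag=45.0)
    print(v.kind, planned_actions(v))
```

Defaulting uncovered cases to a hard breach is a deliberately conservative choice for this sketch; a team that prefers fewer pages could route them to the soft path instead.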

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: Metrics, traces, log search.
  2. Connect Snowflake: Warehouses, queries, shares.
  3. Connect Slack: Channels, DMs, threads, mentions.
  4. Connect PagerDuty: Incidents, on-call, escalations.
  5. Connect HTTP webhook: Trigger any URL on agent actions.
  6. Set each agent's model: We leave models unset so you pick the tier, either fast and cheap or top-quality.
  7. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  8. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
