Tiered dbt Freshness Escalation to PagerDuty

Escalates BigQuery model freshness breaches by severity tier — paging PagerDuty for revenue-critical models past a hard SLA while routing minor lateness to Slack only.

Category: Data Ops
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Every 5 minutes (schedule)
  • Action: Read freshness age, tier, and SLA per model (Google BigQuery)
  • Logic: Split into tier-1 hard breaches vs. minor lateness
  • Action: Open/update PagerDuty incident for tier-1 breaches (PagerDuty)
  • Output: Post low-noise summary of minor lateness to Slack (Slack)

What it does

Applies tiered urgency to freshness breaches. Tier-1 models (revenue, billing) that exceed a hard SLA trigger a PagerDuty incident that pages the on-call engineer; lower-tier models and minor lateness get a quieter Slack heads-up. This reduces alert fatigue while ensuring that breaches on the most critical tables still page someone.
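
The routing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual logic step: the `ModelFreshness` record, the `route` function, and the soft `warn_minutes` threshold used to detect "minor lateness" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ModelFreshness:
    name: str
    tier: int             # assumed convention: 1 = revenue-critical
    age_minutes: float    # minutes since the model was last refreshed
    warn_minutes: float   # hypothetical soft threshold for "minor lateness"
    sla_minutes: float    # hard SLA


def route(m: ModelFreshness) -> str:
    """Return 'page', 'slack', or 'ok' for a single model."""
    # Only tier-1 models past their hard SLA page the on-call engineer.
    if m.tier == 1 and m.age_minutes > m.sla_minutes:
        return "page"
    # Any other lateness (lower tiers, or tier-1 past the soft threshold only)
    # goes to the quieter Slack summary.
    if m.age_minutes > m.warn_minutes:
        return "slack"
    return "ok"


if __name__ == "__main__":
    samples = [
        ModelFreshness("fct_revenue", 1, 190, 60, 120),
        ModelFreshness("fct_billing", 1, 90, 60, 120),
        ModelFreshness("stg_marketing", 3, 500, 60, 120),
        ModelFreshness("dim_customers", 2, 10, 60, 120),
    ]
    for s in samples:
        print(s.name, route(s))
```

Under these assumptions, a tier-3 model never pages no matter how stale it is, which is what keeps the paging channel reserved for the critical tables.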

When to use it

Use this when you have a clear severity hierarchy across your models and need genuine paging (not just a chat message) for the data products that money or compliance depend on.

How it works

  1. A schedule fires every 5 minutes.
  2. BigQuery returns freshness ages and each model's configured tier and SLA.
  3. A logic step splits breaches into tier-1-hard-breach versus everything-else.
  4. Tier-1 hard breaches open or update a PagerDuty incident with the model, age, and runbook link.
  5. All other breaches post a low-noise summary to the data-ops Slack channel.
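
For step 4, one way to get "open or update" behavior is a stable deduplication key: in the PagerDuty Events API v2, repeated `trigger` events that share a `dedup_key` are grouped into the same open incident rather than creating new ones. The sketch below only builds the event body. The `build_pagerduty_event` helper, the key format, and the example runbook URL are hypothetical, and the workflow's own PagerDuty connector may work differently.

```python
import json


def build_pagerduty_event(routing_key, model, age_minutes, sla_minutes, runbook_url):
    """Build an Events API v2 'trigger' body for one tier-1 hard breach."""
    return {
        "routing_key": routing_key,          # integration key of the PagerDuty service
        "event_action": "trigger",
        # Same model -> same key, so a run 5 minutes later updates the
        # existing incident instead of opening a duplicate (assumed key format).
        "dedup_key": f"dbt-freshness:{model}",
        "payload": {
            "summary": f"{model} is {age_minutes:.0f} min stale (hard SLA {sla_minutes:.0f} min)",
            "source": "bigquery",
            "severity": "critical",
            "custom_details": {
                "model": model,
                "age_minutes": age_minutes,
                "sla_minutes": sla_minutes,
            },
        },
        "links": [{"href": runbook_url, "text": "Runbook"}],
    }


if __name__ == "__main__":
    event = build_pagerduty_event(
        "YOUR_ROUTING_KEY", "fct_revenue", 190, 120, "https://example.com/runbooks/fct_revenue"
    )
    print(json.dumps(event, indent=2))
```

Sending the body would be an HTTP POST of this JSON to `https://events.pagerduty.com/v2/enqueue`; that call is left out here so the sketch runs without credentials.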

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery
     Datasets, queries, schemas.
  2. Connect PagerDuty
     Incidents, on-call, escalations.
  3. Connect Slack
     Channels, DMs, threads, mentions.
  4. Set each agent's model
     We leave models unset so you pick the tier: fast and cheap, or top-quality.
  5. Tune it to your data
     Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on
     Run once against a sample, confirm the output, then enable the trigger.
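
A dry run for the last step might look like the sketch below: feed a few sample rows through the Slack formatting and print the result instead of posting it. The `format_slack_summary` helper, the row fields, and the message layout are illustrative assumptions, not the workflow's actual output format.

```python
def format_slack_summary(late_models):
    """Render one low-noise message for all non-paging late models."""
    if not late_models:
        return ""  # nothing late: post nothing at all
    # Stalest models first, so the worst offenders are at the top.
    ordered = sorted(late_models, key=lambda m: m["age_minutes"], reverse=True)
    lines = [f"{len(ordered)} model(s) running late:"]
    for m in ordered:
        lines.append(
            f"- {m['name']} (tier {m['tier']}): "
            f"{m['age_minutes']} min old, SLA {m['sla_minutes']} min"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample rows standing in for the BigQuery result.
    sample = [
        {"name": "stg_events", "tier": 3, "age_minutes": 95, "sla_minutes": 60},
        {"name": "dim_customers", "tier": 2, "age_minutes": 140, "sla_minutes": 120},
    ]
    print(format_slack_summary(sample))
```

Batching every late model into a single message, and returning an empty string when nothing is late, is one way to keep the channel quiet on a 5-minute schedule.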
