DATA OPS

Lineage-aware stale-table alerting (suppress downstream noise)

When multiple BigQuery tables go stale, this workflow walks the dependency graph to find the upstream root cause and alerts only on that root.

Category: Data Ops
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Scheduled freshness sweep
  • Action: Fetch last-refresh time per monitored table (Google BigQuery)
  • Action: Load lineage parent/child edges (Google BigQuery)
  • Logic: Resolve stale roots, suppress downstream victims
  • Output: Post root cause + blast radius to Slack
  • Output: Page only for tier-1 root causes (PagerDuty)

What it does

When an upstream source stalls, every table that depends on it also goes stale, producing a storm of alerts. This workflow reads your declared lineage, identifies which stale tables are root causes versus downstream victims, and alerts only on the roots, listing the affected downstream tables as context.
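The root-versus-victim split can be sketched in Python as below. This is a minimal illustration, not the workflow's actual logic step: it assumes the stale set has already been computed, that lineage arrives as (parent, child) table-name pairs, and that the lineage graph is acyclic. The function name `resolve_roots` and the table names in the demo are hypothetical.

```python
from collections import deque


def resolve_roots(stale, edges):
    """Split stale tables into roots and the stale descendants they explain.

    stale: set of table names currently considered stale.
    edges: iterable of (parent, child) lineage pairs.
    Returns {root: sorted list of stale descendants reachable via stale tables}.
    """
    parents = {}
    children = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        children.setdefault(parent, set()).add(child)

    # A root is a stale table with no stale parent: nothing upstream explains it.
    # Assumes an acyclic graph; in a cycle of stale tables every member has a
    # stale parent, so none of them would be reported as a root.
    roots = sorted(t for t in stale if not (parents.get(t, set()) & stale))

    blast = {}
    for root in roots:
        seen = set()
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for child in children.get(node, ()):
                # Only walk through stale children; a fresh child ends the path.
                if child in stale and child not in seen:
                    seen.add(child)
                    queue.append(child)
        blast[root] = sorted(seen)
    return blast


if __name__ == "__main__":
    # Hypothetical lineage: one stalled orders source plus an unrelated stall.
    edges = [
        ("raw_orders", "stg_orders"),
        ("stg_orders", "fct_orders"),
        ("stg_orders", "fct_revenue"),
        ("raw_users", "stg_users"),
    ]
    stale = {"raw_orders", "stg_orders", "fct_orders", "fct_revenue", "stg_users"}
    print(resolve_roots(stale, edges))
```

Because the walk only passes through stale children, a fresh table sitting between two stale ones splits them into two separate roots, and a victim reachable from two roots appears in both blast-radius lists.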

When to use it

Use it when your warehouse has deep dependency chains and on-call gets buried in correlated freshness alerts during a single source outage. It can turn a burst of, say, 30 correlated alerts into one actionable root-cause page.

How it works

  1. A schedule triggers the freshness sweep.
  2. A BigQuery query returns the last-refresh time for all monitored tables.
  3. A query against your lineage table loads the parent/child edges between those tables.
  4. A logic step marks each stale table, then walks the edges to keep only stale tables with no stale parent (the true roots) and groups each root's stale descendants as its blast radius.
  5. A Slack message posts each root cause with its blast-radius list.
  6. A PagerDuty incident opens only if a root is in the tier-1 critical set.
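The freshness check and alert routing (steps 2, 5 and 6) might look roughly like the sketch below. The per-table SLA map, the 24-hour default, the Slack line format and the names `find_stale` and `route_alerts` are all assumptions for illustration. `route_alerts` expects a root-to-descendants map from the root-resolution step; actually posting to Slack or opening the PagerDuty incident is left out.

```python
from datetime import datetime, timedelta, timezone


def find_stale(last_refresh, now, max_age, default_max_age=timedelta(hours=24)):
    """Step 2: flag tables whose last refresh is older than their freshness SLA.

    last_refresh: {table: datetime or None}; None means no refresh was recorded.
    max_age: {table: timedelta} per-table SLA overrides (hypothetical config).
    """
    stale = set()
    for table, ts in last_refresh.items():
        limit = max_age.get(table, default_max_age)
        if ts is None or now - ts > limit:
            stale.add(table)
    return stale


def route_alerts(blast, tier1):
    """Steps 5 and 6: one Slack line per root, a page only for tier-1 roots.

    blast: {root: [stale downstream tables]} from the root-resolution step.
    tier1: set of table names in the tier-1 critical set.
    Returns (slack_lines, roots_to_page).
    """
    slack_lines = []
    for root in sorted(blast):
        victims = blast[root]
        detail = ", ".join(victims) if victims else "none"
        slack_lines.append(
            f"Stale root: {root} | blast radius ({len(victims)}): {detail}"
        )
    pages = sorted(root for root in blast if root in tier1)
    return slack_lines, pages


if __name__ == "__main__":
    # Hypothetical sample data for a dry run.
    now = datetime(2024, 5, 1, 12, tzinfo=timezone.utc)
    last_refresh = {
        "raw_orders": now - timedelta(hours=30),
        "stg_orders": now - timedelta(hours=2),
        "fct_revenue": now - timedelta(hours=3),
        "dim_users": None,
    }
    print(find_stale(last_refresh, now, {"stg_orders": timedelta(hours=1)}))
    print(route_alerts({"raw_orders": ["stg_orders"], "dim_users": []}, {"raw_orders"}))
```

Keeping paging as a plain membership check against the tier-1 set means a lower-tier root still shows up in Slack but never wakes anyone.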

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect Slack: channels, DMs, threads, mentions.
  3. Connect PagerDuty: incidents, on-call, escalations.
  4. Set each agent's model: we leave models unset so you can pick the tier, either fast and cheap or top quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
