DATA OPS
Datadog Freshness Metric Breach: Auto-Triage and Halt the Pipeline
Receives a Datadog monitor alert when a table-freshness metric breaches and classifies it as either a load delay or a hard failure.
How it runs
The automated pipeline, trigger to output.
- Trigger: Datadog freshness alert webhook [Datadog]
- Action: Confirm current lag in Snowflake [Snowflake]
- Logic: Soft delay or hard breach?
- Action: Post heads-up to Slack (soft) [Slack]
- Action: Page on-call + pause dbt (hard) [PagerDuty]
- Output: Record verdict to breach log [Snowflake]
What it does
Turns a raw Datadog freshness-monitor alert into a graded response. It distinguishes a brief, self-healing load delay from a genuine pipeline stall, so the team only gets paged for real breaches while soft delays are tracked quietly.
When to use it
Use it when you already emit a `table.freshness.lag_minutes` metric to Datadog and want smarter routing than a single threshold: pages for true stalls, and a heads-up for delays still inside the grace window.
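If the metric is not wired up yet, the following is a minimal sketch of one way to emit it, assuming a local Datadog Agent is listening for DogStatsD on UDP port 8125. The helper names (`lag_minutes`, `gauge_datagram`, `send_gauge`) and the `table` tag are hypothetical and not part of this workflow; only the metric name comes from the text above.

```python
import socket
from datetime import datetime, timezone


def lag_minutes(last_loaded_at: datetime, now: datetime) -> float:
    """Minutes since the table last received data, floored at zero."""
    return max((now - last_loaded_at).total_seconds() / 60.0, 0.0)


def gauge_datagram(metric: str, value: float, tags: dict[str, str]) -> str:
    """Format a DogStatsD gauge datagram: name:value|g|#key:val,key:val."""
    tag_part = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    suffix = f"|#{tag_part}" if tag_part else ""
    return f"{metric}:{value:.1f}|g{suffix}"


def send_gauge(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram over UDP (assumed: an Agent listens on host:port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical timestamps; in practice last_loaded_at would come from the warehouse.
    now = datetime.now(timezone.utc)
    last = datetime(2024, 1, 1, tzinfo=timezone.utc)
    send_gauge(gauge_datagram("table.freshness.lag_minutes",
                              lag_minutes(last, now),
                              {"table": "orders"}))
```

Tagging the gauge per table lets a single Datadog monitor fan out into one alert per table, which is what gives the webhook a specific table to re-check.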
How it works
- 1. Datadog monitor webhook fires on a freshness-lag breach and posts the payload.
- 2. Re-query Snowflake to confirm the current lag (avoids alerting on a stale metric scrape).
- 3. Logic branches: lag inside grace window and trending down means a soft delay; lag beyond hard threshold or growing means a real breach.
- 4. Soft delay: post a low-priority Slack note to the data channel and exit.
- 5. Hard breach: open a PagerDuty incident and pause the dependent dbt run via its API.
- 6. Write the verdict and metric values to the breach log.
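The branch in step 3 can be sketched as a small pure function. This is an illustration under stated assumptions, not the workflow's actual logic: the names (`Verdict`, `classify_lag`), the default thresholds of 30 and 120 minutes, and the use of a single earlier reading to judge the trend are all hypothetical. The steps above do not say what happens when lag sits between the two thresholds or is flat, so this sketch assumes those cases escalate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    severity: str  # "soft" or "hard"
    reason: str


def classify_lag(current_lag: float, previous_lag: float,
                 grace_minutes: float = 30.0,
                 hard_minutes: float = 120.0) -> Verdict:
    """Classify a freshness lag (minutes) given one earlier reading.

    Thresholds are placeholder defaults; tune them per table.
    """
    growing = current_lag > previous_lag
    shrinking = current_lag < previous_lag

    # Hard breach: beyond the hard threshold, or lag still growing.
    if current_lag > hard_minutes:
        return Verdict("hard", "lag beyond hard threshold")
    if growing:
        return Verdict("hard", "lag is growing")

    # Soft delay: inside the grace window and trending down.
    if current_lag <= grace_minutes and shrinking:
        return Verdict("soft", "inside grace window and trending down")

    # Assumption: anything not clearly soft is escalated rather than ignored.
    return Verdict("hard", "ambiguous; escalated by assumption")


if __name__ == "__main__":
    print(classify_lag(current_lag=20, previous_lag=35))
    print(classify_lag(current_lag=150, previous_lag=160))
```

Here `current_lag` would be the value re-queried from Snowflake in step 2 and `previous_lag` the value carried in the alert payload. Defaulting the ambiguous cases to a page is the conservative choice; a team that prefers fewer pages could route them to Slack instead.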
Set it up
What you configure once, before turning it on.
- 1. Connect Datadog: metrics, traces, log search.
- 2. Connect Snowflake: warehouses, queries, shares.
- 3. Connect Slack: channels, DMs, threads, mentions.
- 4. Connect PagerDuty: incidents, on-call, escalations.
- 5. Connect HTTP webhook: trigger any URL on agent actions.
- 6. Set each agent's model: we leave models unset so you pick the tier, either fast and cheap or top-quality.
- 7. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
- 8. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.