Freshness Recovery Auto-Resolver for Open Lag Incidents

Periodically rechecks tables that have open freshness incidents in PagerDuty and, once Snowflake confirms a table has caught up to its SLA, resolves the incident and posts the recovery and downtime to Slack.

Category: Data Ops
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Scheduled recovery sweep
  • Action: List open freshness incidents in PagerDuty
  • Action: Re-check table freshness in Snowflake
  • Logic: Keep tables now within SLA
  • Action: Resolve recovered incidents (PagerDuty)
  • Output: Post recovery and downtime to Slack

What it does

This closes the loop on freshness alerts. It looks at currently open PagerDuty freshness incidents, re-measures those specific tables in Snowflake, and when a table has fully recovered to within its SLA, it resolves the incident automatically and reports how long the table was stale to the team channel in Slack.
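
As an illustration of the downtime figure reported to Slack, here is a minimal Python sketch. It assumes downtime is measured from the time the incident was opened to the time recovery was detected; the source does not say how the workflow actually measures it. The function names, the table name, and the message wording are hypothetical.

```python
from datetime import datetime, timezone


def format_downtime(opened_at, recovered_at):
    """Render the gap between two timestamps as 'Hh MMm' (whole minutes)."""
    total_minutes = int((recovered_at - opened_at).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


def recovery_note(table, opened_at, recovered_at):
    """Build the text of a recovery message (hypothetical wording)."""
    downtime = format_downtime(opened_at, recovered_at)
    return f"{table} is back within SLA. Stale for {downtime}."


# Example values are made up for illustration.
opened = datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)
recovered = datetime(2024, 1, 1, 11, 45, tzinfo=timezone.utc)
print(recovery_note("analytics.orders", opened, recovered))
# prints: analytics.orders is back within SLA. Stale for 2h 15m.
```

Measuring from the incident's open time slightly understates true staleness, because the table was already past its SLA when the alert fired. If that matters, the SLA window could be added to the reported figure.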

When to use it

Use it alongside any freshness alerting workflow so on-call engineers are not stuck manually resolving incidents that the pipeline already fixed on its next run. It also gives you accurate, automatic staleness-duration metrics per incident.

How it works

  1. A schedule triggers the recovery sweep.
  2. It lists open freshness-related incidents from PagerDuty.
  3. It re-queries Snowflake for the current freshness of each affected table.
  4. A logic step keeps only the tables now back within SLA.
  5. It resolves those PagerDuty incidents.
  6. It posts a recovery note with total downtime to Slack.
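
The logic in step 4 can be sketched in Python as a simple filter. This is an illustration under stated assumptions, not the workflow's actual implementation: the `OpenIncident` shape, the `recovered_incidents` helper, the per-incident SLA field, and the example table names are all hypothetical, and the PagerDuty and Snowflake calls are left out so the snippet stays self-contained.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class OpenIncident:
    incident_id: str   # hypothetical incident identifier
    table: str         # table the freshness alert refers to
    sla: timedelta     # maximum allowed age of the newest data


def recovered_incidents(incidents, last_loaded_at, now):
    """Return the incidents whose table is back within its SLA.

    last_loaded_at maps table name -> timestamp of the newest data seen
    in the re-check. Tables missing from the map count as still stale.
    """
    recovered = []
    for incident in incidents:
        loaded = last_loaded_at.get(incident.table)
        if loaded is not None and now - loaded <= incident.sla:
            recovered.append(incident)
    return recovered


# Example values are made up for illustration.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
incidents = [
    OpenIncident("INC-1", "analytics.orders", timedelta(hours=1)),
    OpenIncident("INC-2", "analytics.events", timedelta(hours=1)),
    OpenIncident("INC-3", "analytics.users", timedelta(hours=1)),
]
last_loaded_at = {
    "analytics.orders": now - timedelta(minutes=20),  # fresh again
    "analytics.events": now - timedelta(hours=3),     # still stale
    # analytics.users has no measurement, so it stays open
}
to_resolve = recovered_incidents(incidents, last_loaded_at, now)
print([i.incident_id for i in to_resolve])
# prints: ['INC-1']
```

Treating a missing measurement as "still stale" is a deliberately conservative choice here: an incident is resolved only when the re-check positively confirms recovery.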

Set it up

What you configure once, before turning it on.

  1. Connect PagerDuty: incidents, on-call, escalations.
  2. Connect Snowflake: warehouses, queries, shares.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model. We leave models unset so you pick the tier: fast and cheap, or top-quality.
  5. Tune it to your data. Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on. Run once against a sample, confirm the output, then enable the trigger.
