BigQuery Scheduled-Query Failure Detector with PagerDuty Escalation

Polls the BigQuery scheduled-transfer run history every 15 minutes, finds runs that errored or were cancelled, and opens a PagerDuty incident for each.

Category: Data Ops
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Every 15 minutes
  • Action: List recent scheduled-query runs (Google BigQuery)
  • Logic: Keep only new FAILED/CANCELLED runs
  • Action: Open PagerDuty incident per failure (PagerDuty)
  • Output: Post incident summary to Slack (Slack)

What it does

It watches BigQuery's Data Transfer Service run history for scheduled queries that finished in a FAILED or CANCELLED state and turns each new failure into a PagerDuty incident, so a silently broken nightly job pages someone instead of going unnoticed for a day.
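
To make "turns each new failure into a PagerDuty incident" concrete, here is a minimal sketch of the mapping from one failed run to a trigger event. It assumes the PagerDuty step sends a PagerDuty Events API v2 event, which this page does not confirm. The run is assumed to be a dict shaped like a Data Transfer Service run resource (`name`, `state`, `errorStatus.message`). The function name, the `source` string, and the `error` severity are illustrative choices, not part of the template.

```python
from typing import Any


def build_pagerduty_event(
    run: dict[str, Any], display_name: str, routing_key: str
) -> dict[str, Any]:
    """Map one failed or cancelled transfer run to an Events API v2 trigger event.

    Hypothetical helper: the real workflow step may build its payload differently.
    """
    # Cancelled runs may carry no error status, so fall back to a fixed message.
    error_message = run.get("errorStatus", {}).get(
        "message", "no error message reported"
    )
    return {
        "routing_key": routing_key,  # integration key of the PagerDuty service
        "event_action": "trigger",
        # Reusing the run's resource name as the dedup key means a repeated
        # event for the same run updates the existing alert instead of
        # opening a second one.
        "dedup_key": run["name"],
        "payload": {
            # Title: query display name plus the BigQuery error message,
            # truncated to the 1024-character summary limit.
            "summary": f"{display_name}: {error_message}"[:1024],
            "source": "bigquery-scheduled-queries",  # assumed label
            "severity": "error",  # assumed; pick what your on-call policy expects
            "custom_details": {"state": run["state"], "run": run["name"]},
        },
    }
```

Keying deduplication on the run name is one way to keep a retried poll from paging twice for the same failure.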

When to use it

Use it when you run business-critical scheduled queries (revenue rollups, partner exports, ML feature tables) where a silent failure means downstream dashboards or pipelines serve stale or missing data. Best for teams that already route data on-call through PagerDuty.

How it works

  1. A schedule fires every 15 minutes.
  2. A BigQuery action lists transfer-config runs whose `endTime` falls in the window and reads each run's state and error status.
  3. A logic step keeps only runs in FAILED or CANCELLED state that haven't already been alerted.
  4. For each remaining failure, a PagerDuty action opens an incident titled with the query display name and the BigQuery error message.
  5. A Slack output drops a short summary into the data-eng channel linking the incident and the run in the BigQuery console.
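
Steps 2 and 3 above can be sketched as a single filter. This is an illustration under stated assumptions, not the workflow's actual logic step: runs are assumed to arrive as dicts with `name`, `state`, and an RFC 3339 `endTime` (as in the Data Transfer Service run resource), and `already_alerted` is a hypothetical set of run names that some store persists between polls.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

ALERT_STATES = {"FAILED", "CANCELLED"}


def parse_end_time(value: str) -> datetime:
    # Assumes second or microsecond precision. Swapping the trailing "Z" for
    # an explicit offset keeps datetime.fromisoformat happy before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_failures(
    runs: list[dict[str, Any]],
    window_start: datetime,
    window_end: datetime,
    already_alerted: set[str],
) -> list[dict[str, Any]]:
    """Keep runs that failed or were cancelled inside the window and are new."""
    selected = []
    for run in runs:
        if run.get("state") not in ALERT_STATES:
            continue  # succeeded, pending, or still running
        if "endTime" not in run:
            continue  # no end time recorded, so it cannot be placed in a window
        end_time = parse_end_time(run["endTime"])
        if not (window_start <= end_time < window_end):
            continue  # finished outside this polling window
        if run["name"] in already_alerted:
            continue  # an incident was already opened for this run
        selected.append(run)
    return selected


# Usage: a 15-minute window ending now, matching the schedule.
now = datetime.now(timezone.utc)
failures = new_failures([], now - timedelta(minutes=15), now, set())
```

The half-open window (`start <= endTime < end`) keeps a run that ends exactly on a boundary from being counted in two consecutive polls. The already-alerted set is what makes it safe to widen the window if a poll is delayed.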

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect PagerDuty: incidents, on-call, escalations.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model: we leave models unset so you pick the tier (fast + cheap, or top-quality).
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
