Escalate repeat BigQuery slot offenders to PagerDuty after nudges are ignored

Tracks daily slot-contention nudges per author and, when the same author tops the contention ranking three days running, escalates to the on-call data platform owner…

Category: Data Ops
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily schedule after ranking computed
  • Action: Query JOBS for today's top slot consumers (Google BigQuery)
  • Action: Update per-author flag streaks in state table (PostgreSQL)
  • Logic: Branch when streak reaches three days
  • Action: Trigger PagerDuty incident for on-call owner (PagerDuty)
  • Output: Post escalation heads-up to Slack

What it does

It closes the loop on friendly nudges. The workflow keeps a running count of how many consecutive days each author has been flagged as a top slot consumer, and when someone ignores the gentle Slack reminders and stays at the top for three straight days, it escalates to the on-call platform owner so a human intervenes.
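The streak bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual implementation: `Streak`, `update_streaks`, and `ESCALATION_THRESHOLD` are hypothetical names, and an in-memory dict stands in for the Postgres state table.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: three consecutive flagged days triggers escalation, per the text.
ESCALATION_THRESHOLD = 3


@dataclass(frozen=True)
class Streak:
    author: str
    days: int           # consecutive days flagged, including last_flagged
    last_flagged: date  # most recent day the author was flagged


def update_streaks(
    previous: dict[str, Streak],
    flagged_today: list[str],
    today: date,
) -> tuple[dict[str, Streak], list[str]]:
    """Return the new streak state and the authors to escalate today.

    Authors missing from flagged_today are dropped, which resets their streak.
    """
    updated: dict[str, Streak] = {}
    to_escalate: list[str] = []
    yesterday = today - timedelta(days=1)

    for author in flagged_today:
        prior = previous.get(author)

        # A rerun on the same day keeps the stored streak and does not re-page.
        if prior is not None and prior.last_flagged == today:
            updated[author] = prior
            continue

        # Extend the streak only if the author was also flagged yesterday.
        if prior is not None and prior.last_flagged == yesterday:
            days = prior.days + 1
        else:
            days = 1

        updated[author] = Streak(author, days, today)

        # Escalate on the day the streak first reaches the threshold.
        if days == ESCALATION_THRESHOLD:
            to_escalate.append(author)

    return updated, to_escalate
```

Comparing with `==` rather than `>=` pages once per streak, so a fourth consecutive day would not open a second incident. The source does not say which behavior the workflow uses, so treat this as one possible choice.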

When to use it

Use it when self-service nudges aren't enough and a small number of authors keep saturating the reservation. It ensures persistent contention becomes a tracked operational item rather than a recurring annoyance the team learns to ignore.

How it works

  1. A daily schedule fires after the contention ranking is computed.
  2. Query `INFORMATION_SCHEMA.JOBS` to identify today's top slot consumers.
  3. Read each author's recent flag history from a state table in Postgres and update streak counts.
  4. Branch: if an author's consecutive-flag streak reaches three, escalate; otherwise just persist the updated state.
  5. Trigger a PagerDuty incident for the on-call owner with the author, three-day slot history, and worst queries.
  6. Post a heads-up in the escalation Slack channel.
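As a rough sketch of steps 2 and 5, the query and the incident payload might look like the following. Several things here are assumptions rather than details from the workflow: the `region-us` qualifier, ranking by summed `total_slot_ms`, the limit of five authors, the use of the PagerDuty Events API v2 payload shape, and the helper names `top_consumers_sql` and `build_pagerduty_event`. The functions only build strings and dicts; sending them is left to whichever client you use.

```python
import json
from datetime import date


def top_consumers_sql(region: str = "region-us", limit: int = 5) -> str:
    """Build a query ranking today's authors by slot time consumed.

    `region` and `limit` are trusted configuration values, not user input.
    """
    return f"""
SELECT
  user_email,
  SUM(total_slot_ms) AS slot_ms
FROM `{region}`.INFORMATION_SCHEMA.JOBS
WHERE creation_time >= TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
GROUP BY user_email
ORDER BY slot_ms DESC
LIMIT {int(limit)}
""".strip()


def build_pagerduty_event(
    routing_key: str,
    author: str,
    slot_ms_by_day: dict[str, int],
    worst_job_ids: list[str],
    today: date,
) -> dict:
    """Build an Events API v2 style trigger event for a repeat offender."""
    return {
        "routing_key": routing_key,  # integration key for the on-call service
        "event_action": "trigger",
        # One dedup key per author per day, so reruns do not open duplicates.
        "dedup_key": f"slot-offender:{author}:{today.isoformat()}",
        "payload": {
            "summary": f"{author} topped BigQuery slot contention 3 days running",
            "source": "bigquery-slot-escalation",  # hypothetical source name
            "severity": "warning",
            "custom_details": {
                "author": author,
                "slot_ms_by_day": slot_ms_by_day,
                "worst_job_ids": worst_job_ids,
            },
        },
    }


if __name__ == "__main__":
    print(top_consumers_sql())
    event = build_pagerduty_event(
        "YOUR_ROUTING_KEY",
        "ana@example.com",
        {"2024-05-01": 9_100_000, "2024-05-02": 8_700_000, "2024-05-03": 9_400_000},
        ["job_abc", "job_def"],
        date(2024, 5, 3),
    )
    print(json.dumps(event, indent=2))
```

Filtering on `creation_time` keeps the scan to today's partition of the jobs view. The severity and the contents of `custom_details` are placeholders to adjust to your own incident conventions.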

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect Postgres: any Postgres URL, for queries, writes, and migrations.
  3. Connect PagerDuty: incidents, on-call, escalations.
  4. Connect Slack: channels, DMs, threads, mentions.
  5. Set each agent's model: models are left unset so you pick the tier, either fast and cheap or top-quality.
  6. Tune it to your data: edit the prompts, filters, and field mappings so the workflow matches how your team works.
  7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
