Lock-Contention Watchdog with PagerDuty Escalation

Polls `pg_stat_activity` every minute for queries that have been blocked on locks longer than a duration threshold, posts the blocking chain to Slack, and escalates sustained blockers to PagerDuty.

Category: Engineering
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: One-minute schedule
  • Action: Read activity and lock graph (PostgreSQL)
  • Logic: Find blockers past threshold and SLA
  • Output: Post blocking chain to Slack
  • Output: Escalate sustained blocker to PagerDuty

What it does

It catches lock pileups before they take down throughput. Every minute it reads `pg_stat_activity` and `pg_locks` to build the blocking-chain graph, identifies any query waiting on a lock longer than the threshold, and surfaces the head blocker. A short wait posts a Slack heads-up with the blocking PID, query text, and wait duration; a sustained blocker past the SLA escalates to PagerDuty so on-call can decide whether to terminate it.
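
As a rough illustration of how a head blocker can be resolved from the lock graph, here is a minimal Python sketch. It is not the workflow's actual implementation. It assumes the blocked sessions are fetched with PostgreSQL's built-in `pg_blocking_pids()` (available since 9.6) rather than a hand-written join on `pg_locks`, and that `now() - query_start` is an acceptable upper bound on the lock wait. The names `BLOCKED_SQL`, `head_blocker`, and `waiters_by_head` are hypothetical.

```python
from typing import Dict, List

# Assumed query: one row per session that is currently waiting on a lock.
# wait_seconds is measured from query_start, so it is an upper bound on the
# actual lock wait. Run it with whatever Postgres client you already use.
BLOCKED_SQL = """
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       EXTRACT(EPOCH FROM (now() - query_start)) AS wait_seconds,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0
"""


def head_blocker(pid: int, blocked_by: Dict[int, List[int]]) -> int:
    """Follow the chain from a waiting pid to the session at its head.

    Simplification: when a session has several blockers, only the first
    one is followed. The `seen` set stops the walk if the graph has a cycle.
    """
    seen = set()
    current = pid
    while current in blocked_by and blocked_by[current] and current not in seen:
        seen.add(current)
        current = blocked_by[current][0]
    return current


def waiters_by_head(blocked_by: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Group every waiting pid under the head blocker of its chain."""
    groups: Dict[int, List[int]] = {}
    for pid in sorted(blocked_by):
        groups.setdefault(head_blocker(pid, blocked_by), []).append(pid)
    return groups


# Made-up sample: 102 waits on 101, while 101 and 103 both wait on 100.
sample = {101: [100], 102: [101], 103: [100]}
print(waiters_by_head(sample))  # {100: [101, 102, 103]}
```

In this sample, PID 100 is the only session that is not itself waiting, so it is the one to surface as the culprit.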

When to use it

Use this when long-running transactions or migrations occasionally block writes and the first symptom today is a flood of timeouts. It gives you the blocking chain and an actionable PID instead of guesswork.

How it works

  1. A one-minute schedule fires.
  2. Query `pg_stat_activity` and `pg_locks` for the blocking chain.
  3. A logic step finds blockers past the wait threshold and SLA.
  4. Post the blocking chain and culprit PID to Slack.
  5. If the blocker exceeds the SLA, page on-call via PagerDuty.
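
The routing in steps 3 to 5 can be sketched as a small pure function. This is an illustration under stated assumptions, not the workflow's own logic step: the 30-second threshold and 300-second SLA are placeholder values, and `Blocker`, `route`, and `slack_text` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

WAIT_THRESHOLD_S = 30.0  # hypothetical: minimum wait before a Slack heads-up
SLA_S = 300.0            # hypothetical: wait after which on-call is paged


@dataclass
class Blocker:
    pid: int               # head blocker's PID
    query: str             # head blocker's query text
    longest_wait_s: float  # longest wait among the sessions it blocks


def route(blocker: Blocker,
          threshold_s: float = WAIT_THRESHOLD_S,
          sla_s: float = SLA_S) -> List[str]:
    """Return the outputs to fire for one head blocker."""
    if blocker.longest_wait_s < threshold_s:
        return []                    # below the threshold: stay quiet
    actions = ["slack"]              # step 4: post to Slack
    if blocker.longest_wait_s >= sla_s:
        actions.append("pagerduty")  # step 5: also page on-call
    return actions


def slack_text(blocker: Blocker) -> str:
    """Format a one-line heads-up with PID, wait duration, and query text."""
    return (f"PID {blocker.pid} has blocked other queries for "
            f"{blocker.longest_wait_s:.0f}s: {blocker.query}")


migration = Blocker(100, "ALTER TABLE orders ADD COLUMN note text", 45.0)
print(route(migration))       # ['slack']
print(slack_text(migration))
```

Because the schedule fires every minute, a blocker that stays past the SLA would match on every run, so the paging step should be made idempotent on the PagerDuty side (for example, by keying incidents on the blocker PID).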

Set it up

What you configure once, before turning it on.

  1. Connect Postgres: Any Postgres URL (query, write, migrate).
  2. Connect Slack: Channels, DMs, threads, mentions.
  3. Connect PagerDuty: Incidents, on-call, escalations.
  4. Set each agent's model: We leave models unset so you pick the tier, either fast and cheap or top-quality.
  5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
