Escalate Sustained SLO Burn to PagerDuty After Failed Gates

When a service's PR gate fails repeatedly within a window, this workflow confirms sustained burn in Honeycomb and pages the on-call via PagerDuty with the burn math and links to the blocked PRs.

Category: Engineering
Engine: sim
Difficulty: advanced
Trigger: webhook
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: PR gate recorded a failure (HTTP webhook)
  • Logic: Count failures in window; proceed past threshold
  • Action: Confirm sustained burn via Honeycomb (Honeycomb)
  • Action: Open PagerDuty incident with burn math and PR links (PagerDuty)
  • Output: Route incident to service on-call policy (PagerDuty)
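
The logic step in the pipeline above could be sketched as a per-service sliding-window counter. This is a minimal illustration, not the workflow's actual implementation: the class name `GateFailureCounter`, the 30-minute window, and the threshold of 3 are all assumptions, and "past the threshold" is read here as "at or above". It also assumes webhook events arrive in timestamp order.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 30 * 60  # assumed window length; tune to your team
FAILURE_THRESHOLD = 3     # assumed failure count that triggers escalation


class GateFailureCounter:
    """Tracks PR gate failures per service inside a sliding time window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, threshold=FAILURE_THRESHOLD):
        self.window_seconds = window_seconds
        self.threshold = threshold
        # service name -> deque of (timestamp, pr_url), oldest first
        self._failures = defaultdict(deque)

    def record(self, service, pr_url, timestamp):
        """Record one failure; return True when the service is at or past the threshold."""
        events = self._failures[service]
        events.append((timestamp, pr_url))
        # Drop failures that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while events and events[0][0] < cutoff:
            events.popleft()
        return len(events) >= self.threshold

    def blocked_prs(self, service):
        """Distinct PR URLs that failed within the current window, oldest first."""
        seen = []
        for _, pr_url in self._failures[service]:
            if pr_url not in seen:
                seen.append(pr_url)
        return seen
```

Keeping the PR URL alongside each timestamp means the same structure can later supply the blocked-PR links for the incident body.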

What it does

Turns repeated gate failures into a real page. When the burn-rate gate blocks merges to the same service multiple times inside a short window, this flow re-checks Honeycomb to confirm the burn is sustained (not a transient spike), then opens a PagerDuty incident addressed to that service's on-call. The incident body carries the burn multiplier, budget remaining, and links to the PRs currently being blocked so responders have full context immediately.
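
To make the "burn math" concrete, here is a rough sketch of how a burn multiplier, remaining budget, and a sustained-burn check might be computed from event counts. It uses the common definition of burn rate (observed error rate divided by the error rate the SLO allows) and treats burn as sustained only when both a short and a long window exceed a threshold. The function names, the `WindowStats` type, and the two-window rule are assumptions for illustration; the counts would come from your Honeycomb queries, which are not shown.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Event counts for one query window (hypothetical shape)."""
    total_events: int
    bad_events: int


def burn_rate(stats, slo_target):
    """Observed error rate divided by the error rate the SLO allows."""
    if stats.total_events == 0:
        return 0.0
    error_rate = stats.bad_events / stats.total_events
    allowed_error_rate = 1.0 - slo_target
    return error_rate / allowed_error_rate


def budget_remaining(period_stats, slo_target):
    """Fraction of the error budget left for the SLO period (negative if overspent)."""
    allowed_bad = (1.0 - slo_target) * period_stats.total_events
    if allowed_bad == 0:
        return 0.0
    return 1.0 - period_stats.bad_events / allowed_bad


def is_sustained(short_window, long_window, slo_target, threshold):
    """True only when both windows burn at or above the threshold."""
    return (
        burn_rate(short_window, slo_target) >= threshold
        and burn_rate(long_window, slo_target) >= threshold
    )
```

For example, with a 99% target, 50 bad events out of 1,000 in the short window is a burn of roughly 5x. If the long window shows only 50 bad out of 10,000 (about 0.5x), the check treats it as a transient spike and does not page.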

When to use it

Use it when a hot SLO is blocking shipping and nobody has acknowledged it. It bridges the silent gate-failure state into an actionable page, so a burning budget doesn't quietly stall a team for hours.

How it works

  1. A webhook fires each time the PR gate records a failure for a service.
  2. A logic step counts failures for that service within the window and proceeds only past the threshold.
  3. It re-queries Honeycomb to confirm the burn is sustained across windows.
  4. It opens a PagerDuty incident with the burn math and blocked-PR links.
  5. The incident routes to the service's on-call escalation policy.
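
One way to picture the last two steps is as a trigger event for the PagerDuty Events API v2, sketched below. Whether this workflow uses the Events API or another PagerDuty interface is not stated, so treat this as an assumption. The function names, the `slo-burn-<service>` dedup key format, the `critical` severity, and the `custom_details` field names are illustrative. With the Events API, routing follows from the integration key: it belongs to a PagerDuty service, and that service's escalation policy decides who is paged.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(routing_key, service, burn_multiplier, budget_remaining, blocked_prs):
    """Build an Events API v2 trigger payload carrying burn math and PR links."""
    summary = (
        f"Sustained SLO burn on {service}: {burn_multiplier:.1f}x burn, "
        f"{budget_remaining:.0%} budget remaining, {len(blocked_prs)} PR(s) blocked"
    )
    return {
        "routing_key": routing_key,           # integration key for the service
        "event_action": "trigger",
        # Assumed key format: repeat failures update one incident instead of opening many.
        "dedup_key": f"slo-burn-{service}",
        "payload": {
            "summary": summary,
            "source": service,
            "severity": "critical",           # assumed; Events API also accepts error, warning, info
            "custom_details": {
                "burn_multiplier": burn_multiplier,
                "budget_remaining": budget_remaining,
                "blocked_prs": list(blocked_prs),
            },
        },
        "links": [{"href": url, "text": f"Blocked PR: {url}"} for url in blocked_prs],
    }


def send_event(event):
    """POST the event to PagerDuty; returns the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Using a stable dedup key per service is a design choice: while the incident is open, further gate failures fold into it rather than paging the on-call again.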

Set it up

What you configure once, before turning it on.

  1. Connect HTTP webhook: Trigger any URL on agent actions.
  2. Connect Honeycomb: Distributed traces and queries.
  3. Connect PagerDuty: Incidents, on-call, escalations.
  4. Set each agent's model: We leave models unset so you pick the tier: fast and cheap, or top-quality.
  5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
