Sustained Budget Breach to PagerDuty

On a schedule, scan Honeycomb for query shapes that have stayed over their latency budget for a sustained period and open a PagerDuty incident with the worst offender and its…

Category: Engineering
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: 15-minute schedule
  • Action: Fetch p95 for governed query shapes (Honeycomb)
  • Logic: Keep sustained over-budget breaches
  • Logic: Map offender to owning service
  • Output: Open PagerDuty incident (PagerDuty)

What it does

Not every slow query traces back to a single PR. This workflow continuously checks every governed query shape against its budget, and when one stays over the envelope long enough to count as a real problem rather than a blip, it opens a PagerDuty incident routed to the owning service with the query, duration, and trend.
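The difference between a blip and a sustained breach can be sketched as a count of consecutive over-budget scans. This is only an illustration under stated assumptions: the function names, the 250 ms budget, and the three-interval threshold are hypothetical, and the workflow's own logic step may define "sustained" differently.

```python
from typing import Sequence


def trailing_breach_intervals(p95_ms: Sequence[float], budget_ms: float) -> int:
    """Count the most recent consecutive scans that were over budget.

    `p95_ms` is ordered oldest to newest, one value per scheduled scan.
    """
    count = 0
    for value in reversed(p95_ms):
        if value <= budget_ms:
            break  # the streak ends at the first in-budget scan
        count += 1
    return count


def is_sustained(p95_ms: Sequence[float], budget_ms: float,
                 min_intervals: int = 3) -> bool:
    """True when the breach has persisted for at least `min_intervals` scans.

    `min_intervals=3` is an assumed threshold, not a value from the workflow.
    """
    return trailing_breach_intervals(p95_ms, budget_ms) >= min_intervals


# Hypothetical p95 readings in milliseconds against a 250 ms budget.
blip = [180.0, 420.0, 190.0, 185.0]       # one spike that recovered
sustained = [190.0, 410.0, 455.0, 520.0]  # over budget for the last three scans

print(is_sustained(blip, 250.0))
print(is_sustained(sustained, 250.0))
```

On a 15-minute schedule, a three-interval threshold would mean the shape has been over budget for roughly 45 minutes before anyone is paged.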

When to use it

Use it for the queries that matter most to availability, where a sustained latency breach should page someone rather than wait for the next code review. Pair it with the per-shape ownership map.

How it works

  1. A schedule triggers the scan every 15 minutes.
  2. A Honeycomb action fetches current p95 for all governed query shapes.
  3. A logic step keeps only shapes over budget and confirms the breach has persisted across consecutive intervals.
  4. A logic step maps the worst sustained offender to its owning service and severity.
  5. A PagerDuty action opens or updates an incident with the query, the breach duration, and a Honeycomb trace link.
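Steps 4 and 5 can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation: it assumes the incident is raised through the PagerDuty Events API v2 event format, where a stable `dedup_key` makes repeated scans update the same incident instead of opening a new one. The ownership map, shape names, routing key, severity cutoff, and trace URL are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Breach:
    shape: str        # governed query shape name (hypothetical)
    p95_ms: float     # latest p95 latency
    budget_ms: float  # latency budget for this shape
    intervals: int    # consecutive over-budget scans


# Hypothetical per-shape ownership map; real keys and services will differ.
OWNERS = {
    "orders.by_customer": {"service": "orders-api", "routing_key": "ORDERS_KEY"},
    "search.fulltext": {"service": "search-api", "routing_key": "SEARCH_KEY"},
}


def worst_offender(breaches: list[Breach]) -> Breach:
    """Pick the shape furthest over budget, relative to its own budget."""
    return max(breaches, key=lambda b: b.p95_ms / b.budget_ms)


def severity_for(breach: Breach) -> str:
    """Assumed rule: at least 2x over budget is critical, otherwise error."""
    return "critical" if breach.p95_ms / breach.budget_ms >= 2.0 else "error"


def build_event(breach: Breach, trace_url: str, scan_minutes: int = 15) -> dict:
    """Build an Events API v2 style trigger event for the owning service."""
    owner = OWNERS[breach.shape]
    minutes = breach.intervals * scan_minutes
    return {
        "routing_key": owner["routing_key"],
        "event_action": "trigger",
        # Same key on every scan, so an open incident is updated, not duplicated.
        "dedup_key": f"latency-budget:{breach.shape}",
        "payload": {
            "summary": (f"{breach.shape} p95 {breach.p95_ms:.0f} ms over "
                        f"{breach.budget_ms:.0f} ms budget for {minutes} min"),
            "source": owner["service"],
            "severity": severity_for(breach),
            "custom_details": {"breach_minutes": minutes},
        },
        "links": [{"href": trace_url, "text": "Honeycomb trace"}],
    }


breaches = [
    Breach("orders.by_customer", 900.0, 400.0, 4),
    Breach("search.fulltext", 650.0, 500.0, 3),
]
event = build_event(worst_offender(breaches), "https://example.com/trace/abc")
print(event["payload"]["summary"])
```

Ranking by the ratio of p95 to budget, rather than by raw latency, keeps a shape with a tight budget from being hidden behind a naturally slower one.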

Set it up

What you configure once, before turning it on.

  1. Connect Honeycomb. Distributed traces and queries.
  2. Connect PagerDuty. Incidents, on-call, escalations.
  3. Set each agent's model. We leave models unset so you can pick the tier: fast and cheap, or top-quality.
  4. Tune it to your data. Edit the prompts, filters, and field mappings so it matches how your team works.
  5. Test, then turn it on. Run once against a sample, confirm the output, then enable the trigger.
