Honeycomb Error-Spike Digest with PagerDuty Escalation

Polls Honeycomb on a short interval for trace error-rate spikes, summarizes the affected traces with OpenAI, and posts a digest to Slack, paging PagerDuty when the spike is critical.

Category: Summarization
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Scheduled poll every few minutes
  • Action: Fetch error-rate-by-operation vs rolling baseline (Honeycomb)
  • Action: Summarize failing operations and blast radius (OpenAI)
  • Logic: Branch on critical vs moderate severity
  • Output: Post digest to Slack and page PagerDuty if critical (PagerDuty)
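
As a rough illustration of how these five stages chain together, here is a minimal Python sketch. It is not the platform's implementation: `run_once` and its five callables are hypothetical names, the integrations are passed in as plain functions so nothing here calls a real API, and the early return on an empty result assumes the fetch step reports only operations that are above baseline.

```python
from typing import Callable, Dict

# Hypothetical shape: operation name -> {"current": rate, "baseline": rate}
Rates = Dict[str, Dict[str, float]]


def run_once(
    fetch: Callable[[], Rates],
    summarize: Callable[[Rates], str],
    grade: Callable[[Rates], str],
    post_digest: Callable[[str], None],
    page_on_call: Callable[[str], None],
) -> str:
    """Run one scheduled poll and return the severity that was acted on."""
    rates = fetch()                # Action: error rate by operation vs baseline
    if not rates:
        return "none"              # Assumption: empty result means no spike
    summary = summarize(rates)     # Action: failing operations and blast radius
    severity = grade(rates)        # Logic: "moderate" or "critical"
    post_digest(summary)           # Output: every spike gets a digest
    if severity == "critical":
        page_on_call(summary)      # Output: critical spikes also page
    return severity
```

Passing the integrations in as functions keeps the sketch testable with stubs, which mirrors the "run once against a sample" step in the setup list.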

What it does

Watches Honeycomb for sudden increases in span error rate across services. When errors climb, OpenAI summarizes which operations are failing and the probable blast radius, and the result is routed by severity: every spike gets a Slack digest, and critical spikes also open a PagerDuty incident.
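
To make the summarization step concrete, the sketch below shows one way the prompt handed to the model could be assembled. The function name `build_summary_prompt`, the dictionary keys, the wording of the prompt, and the five-minute default window are all assumptions for illustration; the template's actual prompt is something you edit during setup.

```python
def build_summary_prompt(spikes, window_minutes=5):
    """Build a plain-text prompt from a list of spiking operations.

    Each item is a dict with hypothetical keys "operation", "current",
    and "baseline", where the two rates are fractions between 0 and 1.
    """
    # List the worst offenders first so the model sees them up front.
    ordered = sorted(spikes, key=lambda s: s["current"], reverse=True)
    lines = [
        f"- {s['operation']}: {s['current']:.1%} errors now "
        f"vs {s['baseline']:.1%} baseline"
        for s in ordered
    ]
    return (
        f"Span error rates rose in the last {window_minutes} minutes.\n"
        "Affected operations:\n"
        + "\n".join(lines)
        + "\n\nIn one short paragraph, name the failing operations "
        "and the probable blast radius."
    )
```

Keeping the prompt to a sorted list of operations and rates gives the model only the figures it needs to describe the spike in a single paragraph.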

When to use it

Use it when raw alerting is too noisy but you still need fast escalation for genuine incidents. The summary turns a wall of failing spans into one readable paragraph, and the severity branch keeps pages reserved for what actually warrants waking someone.

How it works

  1. A scheduled poll runs every few minutes.
  2. Honeycomb returns current error-rate-by-operation versus the rolling baseline.
  3. OpenAI summarizes the spike: failing operations, error count, and likely user impact.
  4. A logic branch checks severity against the critical threshold.
  5. Moderate spikes post a digest to Slack; critical spikes also open a PagerDuty incident with the summary attached.
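
The baseline comparison in step 2 and the threshold check in step 4 could look something like the following. This is a sketch under stated assumptions, not the template's actual logic: the names `Spike`, `find_spikes`, and `classify` are hypothetical, and the three constants are placeholder values you would tune to your own traffic.

```python
from dataclasses import dataclass


@dataclass
class Spike:
    operation: str
    current_rate: float   # error fraction in the latest window
    baseline_rate: float  # error fraction over the rolling baseline (floored)
    ratio: float          # current_rate / baseline_rate


# Hypothetical thresholds, not values from the template.
SPIKE_RATIO = 2.0     # flag an operation at 2x its baseline or more
CRITICAL_RATE = 0.25  # a flagged operation at 25% errors or more is critical
MIN_BASELINE = 0.001  # floor so a zero or missing baseline cannot divide by zero


def find_spikes(current, baseline):
    """Return operations whose error rate climbed past SPIKE_RATIO, worst first."""
    spikes = []
    for operation, rate in current.items():
        base = max(baseline.get(operation, 0.0), MIN_BASELINE)
        ratio = rate / base
        if ratio >= SPIKE_RATIO:
            spikes.append(Spike(operation, rate, base, ratio))
    return sorted(spikes, key=lambda s: s.ratio, reverse=True)


def classify(spikes):
    """Map a list of spikes to "none", "moderate", or "critical"."""
    if not spikes:
        return "none"
    if any(s.current_rate >= CRITICAL_RATE for s in spikes):
        return "critical"
    return "moderate"
```

Comparing against a ratio rather than an absolute rate keeps naturally noisy operations from triggering on their normal error level, while the absolute critical threshold reserves pages for spikes that affect a large share of requests.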

Set it up

What you configure once, before turning it on.

  1. Connect Honeycomb. Distributed traces and queries.
  2. Connect OpenAI. Models, embeddings, files.
  3. Connect Slack. Channels, DMs, threads, mentions.
  4. Connect PagerDuty. Incidents, on-call, escalations.
  5. Set each agent's model. We leave models unset so you pick the tier: fast + cheap, or top-quality.
  6. Tune it to your data. Edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on. Run once against a sample, confirm the output, then enable the trigger.
