
Burn-Rate Root-Cause Triage with Honeycomb

When the error-budget burn rate exceeds a threshold, an agent pulls the matching Honeycomb traces and identifies which endpoint or release is driving the burn.

Category: Engineering
Engine: paperclip
Difficulty: advanced
Trigger: schedule
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Scheduled burn-rate check
  • Action: Read SLO burn rate (Datadog)
  • Logic: Proceed only above triage threshold
  • Action: Query traces by endpoint and release (Honeycomb)
  • Output: File triage summary in Linear
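The logic gate in this pipeline compares a burn rate against a threshold. A burn rate is conventionally the observed error ratio divided by the error-budget ratio (1 minus the SLO target), so a value of 1.0 spends the budget exactly over the SLO period. The Python sketch that follows shows only that arithmetic and the gate, assuming you already have bad and total event counts for the window; the `BurnWindow` shape, the function names, and any threshold you pass are illustrative assumptions, and the Datadog read itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class BurnWindow:
    """Event counts for one evaluation window (hypothetical shape, not a Datadog type)."""
    start: str  # ISO-8601 timestamp of the window start
    end: str    # ISO-8601 timestamp of the window end
    bad_events: int
    total_events: int


def burn_rate(window: BurnWindow, slo_target: float) -> float:
    """Observed error ratio divided by the error-budget ratio (1 - target)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if window.total_events == 0:
        return 0.0  # no traffic in the window means nothing is burning
    error_ratio = window.bad_events / window.total_events
    budget_ratio = 1.0 - slo_target
    return error_ratio / budget_ratio


def should_triage(window: BurnWindow, slo_target: float, threshold: float) -> bool:
    """Logic step: proceed only when the burn rate exceeds the triage threshold."""
    return burn_rate(window, slo_target) > threshold
```

For example, 72 bad events out of 5,000 against a 99.9% target is an error ratio of 1.44% and a burn rate of about 14.4, so `should_triage(w, 0.999, threshold=10.0)` returns `True` and the window's `start` and `end` become the burn window handed to the next step.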

What it does

Knowing the budget is burning is only half the battle; this workflow answers why. When Datadog shows an elevated burn rate, it queries Honeycomb for the traces in that window, has an agent attribute the burn to the top offending endpoints or a recent deploy, and files a triage writeup so an engineer starts with a hypothesis instead of a blank dashboard.
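At its simplest, attributing the burn means grouping bad traces by endpoint and release and ranking the groups by their share of the errors. The Python sketch that follows assumes exactly that; `TraceRow` and its field names are hypothetical stand-ins for whatever rows you pull out of Honeycomb, and the ranking uses error counts only, whereas the agent described here reasons over more context than a single count.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class TraceRow(NamedTuple):
    """One root span from the burn window (hypothetical field names)."""
    endpoint: str
    release: str
    is_error: bool


def rank_burn_drivers(
    rows: Iterable[TraceRow], top_n: int = 3
) -> List[Tuple[Tuple[str, str], int, float]]:
    """Return the top (endpoint, release) pairs with error count and share of all errors."""
    # Count only failing spans, keyed by the two dimensions we attribute on.
    errors = Counter((r.endpoint, r.release) for r in rows if r.is_error)
    total = sum(errors.values())
    if total == 0:
        return []  # nothing failed in the window, so there is nothing to attribute
    return [(key, count, count / total) for key, count in errors.most_common(top_n)]
```

When one release tops the list across several endpoints, a recent deploy is a plausible first hypothesis; when one endpoint dominates across releases, the code path itself is the more likely suspect.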

When to use it

Use it on services with rich tracing where the cause of a burn is rarely obvious. It shortens mean-time-to-understanding by doing the first pass of correlation automatically.

How it works

  1. A schedule checks the Datadog SLO burn rate.
  2. A logic step proceeds only when burn exceeds the triage threshold, capturing the burn window.
  3. The agent queries Honeycomb for error and latency traces inside that window, grouped by endpoint and release version.
  4. It reasons over the results to rank the likely burn drivers and drafts a triage summary with supporting trace counts.
  5. It opens a Linear issue containing the summary, the suspected cause, and Honeycomb query links for the engineer to confirm.
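Step 3 maps naturally onto a Honeycomb query specification with breakdowns on endpoint and release. The sketch that follows builds one as a plain Python dict and does not call the Honeycomb API. It assumes your spans carry `error`, `duration_ms`, `http.route`, and `service.version` columns (common for OpenTelemetry-instrumented services, but check your dataset), and the `latency_ms` cutoff and `limit` are placeholder values.

```python
from datetime import datetime
from typing import Any, Dict


def build_burn_query(start: datetime, end: datetime, latency_ms: float = 1000.0) -> Dict[str, Any]:
    """Honeycomb query spec for the burn window, broken down by endpoint and release.

    Column names are assumptions (OpenTelemetry-style fields); adjust them to your dataset.
    """
    return {
        # Absolute window as Unix seconds; pass timezone-aware datetimes.
        "start_time": int(start.timestamp()),
        "end_time": int(end.timestamp()),
        "breakdowns": ["http.route", "service.version"],
        "calculations": [
            {"op": "COUNT"},                         # spans matching the filters, per group
            {"op": "P99", "column": "duration_ms"},  # tail latency, per group
        ],
        # Keep spans that either errored or ran past the latency cutoff.
        "filters": [
            {"column": "error", "op": "=", "value": True},
            {"column": "duration_ms", "op": ">", "value": latency_ms},
        ],
        "filter_combination": "OR",
        "orders": [{"op": "COUNT", "order": "descending"}],
        "limit": 20,
    }
```

Serialize the dict with `json.dumps` to submit it through Honeycomb's Query API; confirm the field names and the request flow against the Honeycomb documentation before relying on it.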

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: metrics, traces, log search.
  2. Connect Honeycomb: distributed traces and queries.
  3. Connect Linear: issues, projects, cycles, triage.
  4. Set each agent's model: models are left unset so you can pick the tier, either fast and cheap or top quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
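Before enabling the trigger, a dry run can render the Linear issue body from sample results so you can confirm the output. This Python sketch shows one possible shape under stated assumptions: `render_triage_summary` and `run_once` are hypothetical names, `file_issue` stands in for whatever Linear client you use, and the Markdown layout is only one option.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each driver is (endpoint, release, bad_trace_count), pre-ranked; the shape is an assumption.
Driver = Tuple[str, str, int]


def render_triage_summary(
    service: str,
    rate: float,
    drivers: Sequence[Driver],
    query_links: Sequence[str],
) -> str:
    """Build a Markdown issue body: suspected cause, supporting counts, and query links."""
    lines = [f"## Burn-rate triage: {service}", f"Burn rate: {rate:.1f}x", ""]
    if drivers:
        endpoint, release, count = drivers[0]  # top-ranked driver is the hypothesis
        lines.append(f"Suspected cause: {endpoint} on release {release} ({count} bad traces)")
    else:
        lines.append("Suspected cause: no dominant endpoint or release found")
    lines += ["", "| Endpoint | Release | Bad traces |", "|---|---|---|"]
    lines += [f"| {e} | {r} | {c} |" for e, r, c in drivers]
    lines += ["", "Honeycomb queries to confirm:"]
    lines += [f"- {url}" for url in query_links]
    return "\n".join(lines)


def run_once(
    body: str,
    file_issue: Callable[[str], str],
    dry_run: bool = True,
) -> Optional[str]:
    """Print the body for review; call file_issue only when dry_run is False."""
    print(body)
    if dry_run:
        return None
    return file_issue(body)  # hypothetical Linear client call returning an issue ID
```

Keeping `dry_run=True` as the default means a misconfigured first run prints a draft instead of filing a real issue.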
