
Honeycomb p95 Latency Spike Triage Agent

When Honeycomb fires a p95 db-query latency trigger, an agent pulls the slowest traces, runs EXPLAIN ANALYZE on the implicated SQL, and files a GitHub optimization ticket.

Category: Engineering
Engine: paperclip
Difficulty: advanced
Trigger: webhook
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Honeycomb p95 trigger webhook (Honeycomb)
  • Action: Fetch slowest traces in window (Honeycomb)
  • Action: Run EXPLAIN ANALYZE on replica (PostgreSQL)
  • Logic: Synthesize root-cause hypothesis
  • Output: File GitHub optimization ticket (GitHub)
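
`EXPLAIN ANALYZE` actually executes the statement it analyzes, so the replica step is safest when it only accepts read-only queries. The sketch below shows one conservative way to wrap extracted SQL before running it. The helper name `build_explain_statement` and its rejection rules are assumptions for illustration, not part of the template.

```python
import re

# Hypothetical guard: accept only a single plain SELECT statement.
# WITH queries are rejected on purpose, because a CTE can contain a write.
_SELECT_ONLY = re.compile(r"^\s*select\b", re.IGNORECASE)


def build_explain_statement(sql: str) -> str:
    """Wrap one read-only query in EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)."""
    statement = sql.strip().rstrip(";").strip()
    # Conservative check: this also rejects a ';' inside a string literal.
    if ";" in statement:
        raise ValueError("refusing multi-statement SQL")
    if not _SELECT_ONLY.match(statement):
        raise ValueError("only plain SELECT queries are analyzed")
    return f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {statement}"
```

Rejecting anything ambiguous keeps the agent from running an expensive or mutating statement by accident. A skipped query can still be attached to the ticket without a plan.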

What it does

It turns a Honeycomb latency alert into an investigated optimization ticket that is ready to file. When the p95 of your `db.query` spans breaches its trigger, an agent fetches the slowest matching traces, extracts the SQL and bound parameters, runs `EXPLAIN ANALYZE` against a read replica, and reasons about the likely cause: a missing index, stale statistics, or a bad join order. It then drafts a GitHub issue with the trace link, the analyzed plan, and a concrete fix suggestion.
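
To make the reasoning step concrete, here is a minimal sketch of two rule-based checks over the JSON that `EXPLAIN (ANALYZE, FORMAT JSON)` returns. It flags sequential scans that discard many rows (a possible missing index) and nodes whose estimated row count is far from the actual count (possibly stale statistics). The function names and thresholds are assumptions, and join-order problems are not covered here. The template's agent reasons with a model rather than fixed rules, so treat this as an illustration of the signals, not its implementation.

```python
from typing import Iterator


def walk_plan(node: dict) -> Iterator[dict]:
    """Yield a plan node and all of its descendants, depth first."""
    yield node
    for child in node.get("Plans", []):
        yield from walk_plan(child)


def plan_hints(
    explain_json: list,
    misestimate_factor: float = 10.0,  # assumed threshold
    min_filtered_rows: int = 10_000,   # assumed threshold
) -> list[str]:
    """Return human-readable hints derived from an EXPLAIN JSON document."""
    hints = []
    for node in walk_plan(explain_json[0]["Plan"]):
        node_type = node.get("Node Type", "")
        removed = node.get("Rows Removed by Filter", 0)
        if node_type == "Seq Scan" and removed >= min_filtered_rows:
            table = node.get("Relation Name", "?")
            hints.append(
                f"possible missing index on {table}: seq scan discarded {removed} rows"
            )
        # Clamp to 1 so zero-row nodes do not divide by zero.
        estimated = max(node.get("Plan Rows", 0), 1)
        actual = max(node.get("Actual Rows", 0), 1)
        ratio = max(estimated / actual, actual / estimated)
        if ratio >= misestimate_factor:
            hints.append(
                f"row estimate off by {ratio:.0f}x at {node_type}: statistics may be stale"
            )
    return hints
```

Hints like these can be passed to the model as extra context alongside the raw plan, so the hypothesis in the ticket points at a specific plan node.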

When to use it

Use this when on-call gets paged for slow DB queries but lacks time to dig into traces and plans before filing follow-up work. The agent does the first pass of triage.

How it works

  1. A Honeycomb trigger webhook fires on a p95 latency breach.
  2. The agent queries Honeycomb for the slowest traces in the breach window.
  3. It extracts the SQL and runs EXPLAIN ANALYZE on the replica.
  4. It synthesizes a root-cause hypothesis and a proposed fix.
  5. It opens a GitHub issue tagged `db-optimization` with the trace and plan as evidence.
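
The final step can be sketched as assembling the issue payload from the evidence gathered in the earlier steps. GitHub's REST endpoint for creating issues accepts `title`, `body`, and `labels` fields. The function name, section headings, and title format below are assumptions, and the HTTP call itself is left out.

```python
def build_issue_payload(
    trace_url: str,
    sql: str,
    plan_text: str,
    hypothesis: str,
    suggested_fix: str,
) -> dict:
    """Assemble a GitHub issue payload carrying trace and plan evidence."""
    fence = "`" * 3  # built at runtime so this example nests cleanly
    body = "\n".join([
        "## Trace", trace_url, "",
        "## Query", f"{fence}sql", sql, fence, "",
        "## EXPLAIN ANALYZE", fence, plan_text, fence, "",
        "## Hypothesis", hypothesis, "",
        "## Suggested fix", suggested_fix,
    ])
    return {
        "title": f"Slow query: {hypothesis}"[:120],  # assumed title length cap
        "body": body,
        "labels": ["db-optimization"],
    }
```

Keeping the payload builder separate from the network call makes it easy to review a sample ticket during the test run before the trigger is enabled.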

Set it up

What you configure once, before turning it on.

  1. Connect Honeycomb: distributed traces and queries.
  2. Connect Postgres: any Postgres URL (query, write, migrate).
  3. Connect GitHub: repos, issues, pull requests, actions.
  4. Set each agent's model: we leave models unset so you pick the tier, fast and cheap or top quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
