
Telemetry Cost-Spike Triage from PagerDuty Incident

When a telemetry-budget PagerDuty incident fires, an agent correlates the spike to the responsible Datadog metric or Honeycomb dataset and posts a root-cause triage…

Category: AI Agents
Engine: paperclip
Difficulty: advanced
Trigger: event
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: PagerDuty telemetry-cost incident fires (PagerDuty)
  • Action: Read incident start time and query context (PagerDuty)
  • Action: Query Datadog and Honeycomb around the spike window (Datadog)
  • Logic: Pick most likely culprit and draft drop rule
  • Output: Append root-cause note and proposed rule to incident (PagerDuty)
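
The second and third pipeline steps depend on turning the incident start time into query windows. A minimal sketch of that step, assuming the start time arrives as an ISO 8601 string; the function name `spike_windows` and the 60-minute lookback and 30-minute lookahead defaults are illustrative assumptions, not values taken from this workflow:

```python
from datetime import datetime, timedelta, timezone

def spike_windows(started_at: str, lookback_min: int = 60, lookahead_min: int = 30):
    """Return (baseline, spike) windows as (start, end) datetime pairs in UTC.

    Hypothetical helper: the window sizes are assumptions to tune per team.
    """
    # Normalise a trailing "Z" so datetime.fromisoformat accepts it on older Pythons.
    start = datetime.fromisoformat(started_at.replace("Z", "+00:00"))
    start = start.astimezone(timezone.utc)
    # Baseline: the period just before the incident began.
    baseline = (start - timedelta(minutes=lookback_min), start)
    # Spike: the period starting at the incident start.
    spike = (start, start + timedelta(minutes=lookahead_min))
    return baseline, spike
```

Comparing the same measure across the two windows is what lets a later step tell a real jump from a series that was already high.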

What it does

When an observability-cost budget alarm escalates to PagerDuty, this agent does the first triage for the on-call engineer. It looks at the moment the spike began, finds the Datadog metric or Honeycomb dataset whose cardinality or volume jumped, and writes a plain-language root cause plus a candidate drop or aggregate rule, all attached to the incident before a human even opens their laptop.
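
One way to picture the "find what jumped" step is a ranking over before and after measurements. The sketch below is an assumption about how such a ranking could work, not the agent's actual logic; the `Series` type, the `min_ratio` threshold, and the example names in the usage note are all hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Series:
    source: str      # "datadog" or "honeycomb"
    name: str        # metric or dataset name
    baseline: float  # cardinality or volume before the spike
    spike: float     # same measure during the spike window

def jump_ratio(s: Series) -> float:
    # Guard against a zero baseline, e.g. a brand-new metric or tag.
    return s.spike / max(s.baseline, 1.0)

def pick_culprit(series: List[Series], min_ratio: float = 2.0) -> Optional[Series]:
    """Keep series that grew at least min_ratio, then return the largest absolute jump."""
    candidates = [s for s in series if jump_ratio(s) >= min_ratio]
    return max(candidates, key=lambda s: s.spike - s.baseline, default=None)
```

For example, given a hypothetical `cart.requests` series going from 2000 to 9000 and an `api-traces` dataset going from 500 to 1500, both pass the ratio filter and `cart.requests` wins on absolute growth. Filtering on ratio first keeps a large but stable series from being blamed.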

When to use it

Use it when telemetry cost spikes page someone and the on-call wastes the first twenty minutes figuring out which service shipped a noisy new tag. It compresses triage into a single incident note.

How it works

  1. A PagerDuty incident with a telemetry-cost service tag triggers the run.
  2. The agent reads the incident's start time and any included query context.
  3. It queries both Datadog and Honeycomb around that window to locate the metric or dataset with the abnormal cardinality or volume jump.
  4. A logic step picks the single most likely culprit and drafts a targeted drop or aggregate recommendation.
  5. It appends the root-cause summary and proposed rule as a note on the PagerDuty incident.
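
Steps 4 and 5 end in a plain-text note. A rough sketch of what drafting that note could look like, assuming the culprit has already been identified; the function `draft_note`, its parameters, and the note layout are hypothetical, and posting the note to PagerDuty is left out:

```python
from typing import Optional

def draft_note(source: str, name: str, baseline: int, spike: int,
               suspect_tag: Optional[str] = None) -> str:
    """Render a root-cause note with a proposed drop or aggregate rule."""
    growth = spike / max(baseline, 1)  # avoid dividing by zero for new series
    lines = [
        f"Likely culprit: {source} {name}",
        f"Change: {baseline} -> {spike} ({growth:.1f}x) around incident start",
    ]
    if suspect_tag:
        # A single noisy tag is known: recommend dropping just that tag.
        lines.append(f"Proposed rule: drop tag '{suspect_tag}' on {name}")
    else:
        # No single tag stands out: recommend aggregating instead.
        lines.append(f"Proposed rule: aggregate {name} to a coarser granularity")
    lines.append("Status: draft, needs on-call review before applying")
    return "\n".join(lines)
```

Keeping the rule as a proposal in the note, rather than applying it, leaves the decision with the on-call engineer.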

Set it up

What you configure once, before turning it on.

  1. Connect PagerDuty: Incidents, on-call, escalations.
  2. Connect Datadog: Metrics, traces, log search.
  3. Connect Honeycomb: Distributed traces and queries.
  4. Set each agent's model: We leave models unset so you pick the tier: fast and cheap, or top-quality.
  5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
