Custom Metrics Cardinality Spike Pager

A webhook from a Datadog monitor fires when custom-metric cardinality jumps; an agent pinpoints the offending metric and tag, estimates the added cost, and pages the owning team.

Category: AI Agents
Engine: paperclip
Difficulty: advanced
Trigger: webhook
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Datadog cardinality monitor webhook (HTTP webhook)
  • Action: Identify offending metric and tag (Datadog)
  • Logic: Sustained spike above cost threshold?
  • Action: Agent estimates added cost and cause (OpenAI)
  • Output: Page owning team via PagerDuty (PagerDuty)
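
The final output step could be sketched as building a PagerDuty Events API v2 "trigger" event. This is a minimal illustration, not the workflow's actual implementation: the function name, the `source` string, the dedup-key scheme, and the `custom_details` fields are all assumptions, and the routing key is a placeholder.

```python
import json

def build_pagerduty_event(routing_key, metric, tag_key, added_cost_per_day, team):
    """Build the JSON body for a PagerDuty Events API v2 'trigger' event.

    All argument names and the custom_details layout are hypothetical.
    """
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # One dedup key per metric/tag pair so repeat alerts update the same incident.
        "dedup_key": f"cardinality-spike:{metric}:{tag_key}",
        "payload": {
            "summary": f"Custom-metric cardinality spike: {metric} (tag '{tag_key}')",
            "source": "datadog-cardinality-monitor",  # assumed label
            "severity": "error",
            "custom_details": {
                "metric": metric,
                "tag_key": tag_key,
                "estimated_added_cost_per_day_usd": round(added_cost_per_day, 2),
                "owning_team": team,
            },
        },
    }

event = build_pagerduty_event(
    "YOUR_ROUTING_KEY", "checkout.latency", "user_id", 41.5, "payments"
)
print(json.dumps(event, indent=2))
```

The body would be sent as a POST to `https://events.pagerduty.com/v2/enqueue`; the routing key of the owning team's service is what directs the page to the right on-call rotation.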

What it does

Reacts in near real time to runaway custom-metric cardinality, the silent driver of surprise Datadog bills. When a cardinality monitor trips, an agent identifies which metric and which high-cardinality tag exploded, estimates the incremental cost rate, and pages the responsible team so a bad deploy gets caught before it runs all month.
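
One way to picture the "which tag exploded" step: given the tag sets of a metric's active series, count distinct values per tag key and pick the key with the most. The sketch below assumes the series have already been fetched from Datadog as lists of `key:value` strings; the function names and sample data are hypothetical.

```python
from collections import defaultdict

def distinct_values_per_tag(series_tags):
    """Count distinct values seen for each tag key across all series."""
    values = defaultdict(set)
    for tags in series_tags:
        for tag in tags:
            key, _, value = tag.partition(":")
            values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}

def offending_tag(series_tags):
    """Return the tag key with the most distinct values."""
    counts = distinct_values_per_tag(series_tags)
    return max(counts, key=counts.get)

# Hypothetical series for one metric after a bad deploy added user_id.
series = [
    ["env:prod", "service:checkout", "user_id:1001"],
    ["env:prod", "service:checkout", "user_id:1002"],
    ["env:prod", "service:cart", "user_id:1003"],
    ["env:staging", "service:cart", "user_id:1004"],
]
print(distinct_values_per_tag(series))  # {'env': 2, 'service': 2, 'user_id': 4}
print(offending_tag(series))            # user_id
```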

When to use it

Use it when a single careless tag (like a user ID or request ID) can balloon custom-metric counts and you need to catch it within minutes, not at the next billing cycle. Best for teams with active deploys touching instrumentation.
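
A rough worked example of why one tag matters: each unique combination of metric name and tag values counts as a separate custom metric, so the worst-case series count is the product of the distinct values per tag key. The numbers below are illustrative only, and real counts are usually lower because not every combination reports.

```python
from math import prod

def max_series(distinct_values):
    """Upper bound on series for one metric: product of distinct values per tag key."""
    return prod(distinct_values.values())

before = {"env": 3, "service": 20}        # hypothetical tag cardinalities
after = {**before, "user_id": 10_000}     # same metric with a user_id tag added

print(max_series(before))  # 60
print(max_series(after))   # 600000
```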

How it works

  1. A Datadog cardinality monitor sends a webhook when custom-metric volume spikes.
  2. The agent queries Datadog to identify the specific metric and the tag key driving the cardinality blowup.
  3. A logic step confirms the spike is sustained and above the cost-impact threshold, filtering out brief blips.
  4. The agent estimates the added cost rate and likely cause.
  5. It triggers a PagerDuty incident routed to the metric's owning team with the diagnosis attached.
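
Steps 3 and 4 can be sketched as a small filter plus a linear cost estimate. This is an illustration under stated assumptions: the function names, thresholds, and sample counts are hypothetical, and `usd_per_metric_month` is a placeholder to replace with the rate from your own Datadog contract. Actual billing is based on averaged counts, so treat the result as a rough rate rather than an invoice figure.

```python
def is_sustained(samples, threshold, min_consecutive):
    """True if the most recent `min_consecutive` samples all exceed `threshold`."""
    if len(samples) < min_consecutive:
        return False
    return all(s > threshold for s in samples[-min_consecutive:])

def added_cost_per_day(baseline_count, current_count, usd_per_metric_month,
                       days_in_month=30):
    """Simplified linear estimate of the extra daily spend from added series."""
    extra = max(0, current_count - baseline_count)
    return extra * usd_per_metric_month / days_in_month

# Hypothetical custom-metric counts, one sample per evaluation window.
samples = [1200, 1250, 9800, 10100, 10400]

if is_sustained(samples, threshold=5000, min_consecutive=3):
    # 0.05 is a placeholder price, not a published Datadog rate.
    rate = added_cost_per_day(1200, samples[-1], usd_per_metric_month=0.05)
    print(f"Estimated added cost: ${rate:.2f}/day")
```

Requiring several consecutive windows above the threshold is what filters out brief blips; a single high sample followed by a return to baseline never reaches the cost estimate.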

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: metrics, traces, log search.
  2. Connect OpenAI: models, embeddings, files.
  3. Connect PagerDuty: incidents, on-call, escalations.
  4. Connect HTTP webhook: trigger any URL on agent actions.
  5. Set each agent's model: we leave models unset so you pick the tier, either fast and cheap or top quality.
  6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
