AI AGENTS
Custom Metrics Cardinality Spike Pager
A webhook from a Datadog monitor fires when custom-metric cardinality jumps; an agent pinpoints the offending metric and tag and estimates the added cost.
How it runs
The automated pipeline, trigger to output.
- Trigger: Datadog cardinality monitor webhook (HTTP webhook)
- Action: Identify offending metric and tag (Datadog)
- Logic: Sustained spike above cost threshold?
- Action: Agent estimates added cost and cause (OpenAI)
- Output: Page owning team via PagerDuty (PagerDuty)
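The "identify offending metric and tag" step can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual implementation: it assumes the agent has already fetched the tag lists of the metric's active series, and the function name `offending_tag_key` and the input shape are hypothetical. It relies only on the Datadog convention that tags are `key:value` strings.

```python
from collections import defaultdict


def offending_tag_key(series_tags):
    """Return (tag_key, distinct_value_count) for the key with the most distinct values.

    `series_tags` is a hypothetical input: one list of "key:value" tags per series.
    """
    values_by_key = defaultdict(set)
    for tags in series_tags:
        for tag in tags:
            key, sep, value = tag.partition(":")
            if sep:  # skip bare tags that have no value part
                values_by_key[key].add(value)
    if not values_by_key:
        return None
    key = max(values_by_key, key=lambda k: len(values_by_key[k]))
    return key, len(values_by_key[key])


# Illustrative data only: three series that differ mostly by request_id.
series = [
    ["env:prod", "request_id:a1"],
    ["env:prod", "request_id:b2"],
    ["env:staging", "request_id:c3"],
]
print(offending_tag_key(series))  # ('request_id', 3)
```

The key with the most distinct values is only a heuristic for "the tag that exploded"; comparing against a pre-spike snapshot would be more robust if one is available.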
What it does
Reacts in near real time to runaway custom-metric cardinality, a common silent driver of surprise Datadog bills. When a cardinality monitor trips, an agent identifies which metric and which high-cardinality tag exploded, estimates the incremental cost rate, and pages the responsible team so a bad deploy gets caught before it runs all month.
When to use it
Use it when a single careless tag (like a user ID or request ID) can balloon custom-metric counts and you need to catch it within minutes, not at the next billing cycle. Best for teams with active deploys touching instrumentation.
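To see why one careless tag matters, recall that Datadog counts a custom metric per unique combination of metric name and tag values. The sketch below computes the upper bound on combinations for one metric name; the tag names and value counts are made up for illustration, and `count_series` is a hypothetical helper.

```python
def count_series(tag_values):
    """Upper bound on distinct tag combinations for a single metric name."""
    total = 1
    for values in tag_values.values():
        total *= len(set(values))
    return total


# Illustrative tags only: a metric tagged by environment and region.
before = {"env": ["prod", "staging"], "region": ["us", "eu", "ap"]}
# The same metric after a deploy adds a per-user tag.
after = {**before, "user_id": [f"u{i}" for i in range(10_000)]}

print(count_series(before))  # 6
print(count_series(after))   # 60000
```

In practice not every combination is emitted, so the real count is at most this product, but the multiplicative effect is what turns a single tag into a billing problem.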
How it works
- 1. A Datadog cardinality monitor sends a webhook when custom-metric volume spikes.
- 2. The agent queries Datadog to identify the specific metric and the tag key driving the cardinality blowup.
- 3. A logic step confirms the spike is sustained and above the cost-impact threshold, filtering out brief blips.
- 4. The agent estimates the added cost rate and likely cause.
- 5. It triggers a PagerDuty incident routed to the metric's owning team with the diagnosis attached.
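Steps 3 to 5 can be sketched as below, under several assumptions: samples arrive once per minute, the thresholds are chosen by you, and the price per 100 series is a placeholder rather than a quoted Datadog rate. The names `Sample`, `is_sustained`, `added_monthly_cost`, and `build_pagerduty_event` are hypothetical. The event body follows the PagerDuty Events API v2 shape (`routing_key`, `event_action`, `payload`); the sketch only builds the dictionary and does not send it.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int        # minutes since the monitor fired (assumed 1-minute cadence)
    series_count: int  # distinct series observed for the metric


def is_sustained(samples, baseline, min_extra_series, min_minutes):
    """True if each of the last `min_minutes` samples exceeds baseline by the threshold."""
    recent = samples[-min_minutes:]
    if len(recent) < min_minutes:
        return False  # not enough history yet to call it sustained
    return all(s.series_count - baseline >= min_extra_series for s in recent)


def added_monthly_cost(extra_series, usd_per_100_series_month):
    """Linear estimate; the price argument is an assumption you must supply."""
    return extra_series / 100 * usd_per_100_series_month


def build_pagerduty_event(routing_key, metric, tag_key, extra_series, monthly_usd):
    """Build a PagerDuty Events API v2 trigger event (not sent here)."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": f"Cardinality spike on {metric} (tag: {tag_key})",
            "source": "cardinality-pager",  # hypothetical source name
            "severity": "warning",
            "custom_details": {
                "extra_series": extra_series,
                "estimated_added_usd_per_month": monthly_usd,
            },
        },
    }


# Illustrative numbers only.
baseline = 6_000
spike = [Sample(m, 60_000) for m in range(10)]
blip = [Sample(m, 6_000) for m in range(9)] + [Sample(9, 60_000)]

print(is_sustained(spike, baseline, 10_000, 10))  # True
print(is_sustained(blip, baseline, 10_000, 10))   # False

cost = added_monthly_cost(60_000 - baseline, usd_per_100_series_month=5.0)  # placeholder price
event = build_pagerduty_event("YOUR_ROUTING_KEY", "checkout.latency", "user_id", 54_000, cost)
print(event["payload"]["summary"])
```

Requiring every recent sample to clear the threshold is what filters out brief blips; a looser rule (for example, a majority of samples) would page sooner at the cost of more false alarms.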
Set it up
What you configure once, before turning it on.
- 1. Connect Datadog: Metrics, traces, log search.
- 2. Connect OpenAI: Models, embeddings, files.
- 3. Connect PagerDuty: Incidents, on-call, escalations.
- 4. Connect HTTP webhook: Trigger any URL on agent actions.
- 5. Set each agent's model: We leave models unset so you pick the tier: fast and cheap, or top quality.
- 6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
- 7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
More AI Agents workflows
Sentry-to-Confluence Runbook Updater
When a Sentry issue is resolved, the agent finds the matching Confluence runbook page and proposes an inline update with the verified fix.
Stale Doc-PR Chaser for Runbook Gaps
On a daily schedule the agent finds runbook doc PRs that were opened from resolved incidents but never reviewed and summarizes what each one fixes.
Resolved Incident to Public Troubleshooting Doc
For customer-facing errors resolved in Sentry, the agent drafts a sanitized troubleshooting entry and opens a PR to your ReadMe documentation.
On-Call Runbook Gap Closer: Resolved Sentry Issues to Doc PRs
An agent reads each newly resolved Sentry issue, compares the actual fix against your existing runbook, and opens a GitHub PR adding the missing remediation steps.
Weekly On-Call Doc-Gap Digest
Each week the agent reviews every Sentry issue resolved in the last 7 days and ranks the ones whose runbook coverage is missing or thin.
Datadog Bill Spike Attribution Agent
When a daily Datadog cost check detects a spend jump, an agent attributes the increase to the specific services and metric types driving it and posts a ranked breakdown to Slack.
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

