FINANCE

Per-Customer Cost Spike Detector to PagerDuty

Watches a Datadog cost-attribution monitor and, when one customer's infrastructure spend spikes versus its 7-day baseline, opens a PagerDuty incident with the offending account…

Category: Finance
Engine: sim
Difficulty: intermediate
Trigger: event
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Datadog cost-anomaly monitor webhook (Datadog)
  • Logic: Parse account + compare spike to 7-day baseline
  • Logic: Suppress sub-threshold noise
  • Action: Open PagerDuty incident scaled to dollar impact (PagerDuty)
  • Output: Attach account + driver context to incident (PagerDuty)

What it does

It catches runaway cost at the account level in near-real-time. When Datadog signals that a single customer's compute or data usage has jumped well above its recent baseline, this workflow pages the on-call finance-ops owner with the specific account and driver, so a misconfigured customer or abuse case doesn't run for days unnoticed.

When to use it

Use it when a single customer can spike your bill: heavy data ingestion, runaway queries, or a free-tier account abusing resources. It pairs well with the daily COGS allocator, catching anomalies between its nightly runs.

How it works

  1. A Datadog monitor webhook fires when a per-customer cost metric breaches its anomaly band.
  2. The workflow parses the payload to extract the customer ID, the metric, and the spike magnitude.
  3. A logic step compares the spike against the customer's 7-day baseline and suppresses noise below a multiplier threshold.
  4. For real spikes, it opens a PagerDuty incident tagged with the account and severity scaled to the dollar impact.
  5. It posts the same context as a threaded note for the incident responder.
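
Steps 2 to 4 above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual logic: the payload field names (`customer_id`, `metric`, `current_cost_usd`), the 3x multiplier threshold, and the dollar bands for severity are all assumptions, since the description names a multiplier threshold and dollar-scaled severity without giving values. The current cost and the baseline samples are assumed to be in the same unit (for example, USD per day).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence

# Hypothetical tuning values; adjust to your own data.
SPIKE_MULTIPLIER_THRESHOLD = 3.0
# (minimum excess in USD, severity), checked from highest to lowest.
SEVERITY_BANDS = [(1000.0, "critical"), (250.0, "error"), (0.0, "warning")]


@dataclass
class Spike:
    customer_id: str
    metric: str
    multiplier: float
    excess_usd: float
    severity: str


def evaluate(payload: dict, baseline_usd: Sequence[float]) -> Optional[Spike]:
    """Return a Spike for a real spike, or None when it should be suppressed."""
    # Step 2: parse the (assumed) webhook fields.
    customer_id = payload["customer_id"]
    metric = payload["metric"]
    current = float(payload["current_cost_usd"])

    # Step 3: compare against the mean of the 7-day baseline samples.
    baseline = mean(baseline_usd)
    if current <= baseline:
        return None
    multiplier = current / baseline if baseline > 0 else float("inf")
    if multiplier < SPIKE_MULTIPLIER_THRESHOLD:
        return None  # sub-threshold noise

    # Step 4: scale severity to the dollar impact above baseline.
    excess = current - baseline
    severity = next(name for floor, name in SEVERITY_BANDS if excess >= floor)
    return Spike(customer_id, metric, multiplier, excess, severity)


if __name__ == "__main__":
    sample = {"customer_id": "acct_123", "metric": "compute_cost_usd",
              "current_cost_usd": 900}
    print(evaluate(sample, [100] * 7))
```

Using both a multiplier and a dollar band is a design choice in this sketch: the multiplier filters out noise relative to each customer's own history, while the dollar excess keeps a 5x jump on a tiny account from paging at the same severity as a 5x jump on a large one.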

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: Metrics, traces, log search.
  2. Connect PagerDuty: Incidents, on-call, escalations.
  3. Set each agent's model: We leave models unset so you pick the tier, either fast and cheap or top-quality.
  4. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  5. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
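
For the test step, one way to confirm the output before enabling the trigger is to build the incident body from a sample spike and inspect it without sending anything. The sketch below assumes the incident is raised through the PagerDuty Events API v2 (a `trigger` event posted to `https://events.pagerduty.com/v2/enqueue`); the workflow's PagerDuty connector may work differently. The function name, the `source` value, the dedup key format, and the sample values are hypothetical.

```python
import json

# Severity values accepted by the PagerDuty Events API v2.
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_pagerduty_event(routing_key: str, customer_id: str, metric: str,
                          multiplier: float, excess_usd: float,
                          severity: str) -> dict:
    """Build an Events API v2 trigger body; nothing is sent here."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        # Hypothetical dedup key so repeat alerts for the same account and
        # metric fold into one incident.
        "dedup_key": f"cost-spike-{customer_id}-{metric}",
        "payload": {
            "summary": (f"Cost spike for {customer_id}: {metric} at "
                        f"{multiplier:.1f}x 7-day baseline "
                        f"(+${excess_usd:,.0f})"),
            "source": "datadog-cost-monitor",  # assumed label
            "severity": severity,
            "custom_details": {
                "customer_id": customer_id,
                "metric": metric,
                "multiplier": multiplier,
                "excess_usd": excess_usd,
            },
        },
    }


if __name__ == "__main__":
    # Dry run: print the body for review instead of posting it.
    event = build_pagerduty_event("YOUR_ROUTING_KEY", "acct_123",
                                  "compute_cost_usd", 9.0, 800.0, "error")
    print(json.dumps(event, indent=2))
```

Reviewing the printed body against a known sample makes it easier to check that the account, driver metric, and severity look right before live alerts start paging anyone.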
