Spike Triage Router with Runbook Answer or PagerDuty Escalation

On a Datadog alert, retrieves matching runbook guidance and decides whether the on-call can self-resolve in Slack or whether the spike warrants an immediate PagerDuty page…

Category: AI & RAG
Engine: sim
Difficulty: advanced
Trigger: event
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Datadog monitor crosses threshold (Datadog)
  • Action: Fetch metric, severity tags, and breach duration (Datadog)
  • Action: Retrieve best-matching runbook passages (Postgres)
  • Logic: Score severity and coverage to choose route
  • Action: Open PagerDuty incident with cited context, severe path (PagerDuty)
  • Output: Post self-resolve answer to Slack, known-issue path (Slack)
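
The retrieval step in this pipeline can be sketched as a similarity ranking. This is a minimal in-memory stand-in, assuming runbook passages are stored with embedding vectors and ranked by cosine similarity. The `Passage` type, its fields, and `top_passages` are hypothetical names for illustration; in the real flow the Postgres vector store would do this ranking.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class Passage:
    runbook: str             # hypothetical: title of the source runbook
    text: str                # the passage that would be cited
    embedding: list[float]   # assumed to be precomputed at indexing time


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_passages(query: list[float], passages: list[Passage], k: int = 3):
    """Return up to k (score, passage) pairs, most similar first."""
    scored = [(cosine(query, p.embedding), p) for p in passages]
    # Sort on the score only, so Passage objects are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The best score from this ranking is one plausible "coverage" signal for the logic step: a low top score suggests the knowledge base does not recognize the spike.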

What it does

Adds a decision layer on top of runbook retrieval. When a monitor fires, it grades the spike's severity and how well the knowledge base covers it, then routes accordingly: a calm, cited fix suggestion in Slack for known, low-severity issues, or a PagerDuty incident enriched with the runbook context for severe or unrecognized spikes.
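
One way to picture the decision layer is as a small scoring function. The sketch below is illustrative only: the severity weights, the duration bonus, and both cutoffs are assumed values, not settings taken from this workflow, and `Spike`, `severity_score`, and `choose_route` are hypothetical names.

```python
from dataclasses import dataclass

# Assumed mapping from monitor severity tags to a 0..1 weight.
SEVERITY_WEIGHT = {"low": 0.2, "medium": 0.5, "high": 0.8, "critical": 1.0}


@dataclass
class Spike:
    severity_tag: str      # tag fetched from the monitor
    breach_minutes: float  # how long the threshold has been breached
    coverage: float        # best retrieval similarity, 0..1


def severity_score(spike: Spike) -> float:
    """Combine the tag weight with a capped bonus for long breaches."""
    # Unknown tags are treated as worst case so they escalate.
    base = SEVERITY_WEIGHT.get(spike.severity_tag, 1.0)
    duration_bonus = min(spike.breach_minutes / 30.0, 1.0) * 0.2
    return min(base + duration_bonus, 1.0)


def choose_route(spike: Spike,
                 severity_cutoff: float = 0.7,
                 coverage_cutoff: float = 0.75) -> str:
    """Page when the spike is severe OR the runbooks do not cover it."""
    if severity_score(spike) >= severity_cutoff:
        return "pagerduty"
    if spike.coverage < coverage_cutoff:
        return "pagerduty"
    return "slack"
```

Escalating on either condition keeps the Slack path for spikes that are both mild and well covered, which matches the "known issue" framing above.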

When to use it

Use it when not every alert deserves a page, but the ones that do need full context immediately. Ideal for teams drowning in low-signal alerts who still want hard spikes escalated with the relevant runbook already attached.

How it works

  1. A Datadog monitor crosses threshold and triggers the flow.
  2. The flow fetches the metric, its severity tags, and recent breach duration from Datadog.
  3. It retrieves the best-matching runbook passages from the Postgres vector store.
  4. A logic step scores severity and retrieval confidence to choose a route.
  5. Known, low-severity spikes get a cited self-resolve message posted to Slack.
  6. Severe or low-coverage spikes open a PagerDuty incident with the runbook context and Slack link embedded.
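
The two outputs in steps 5 and 6 can be sketched as payload builders. The PagerDuty dictionary follows the general shape of an Events API v2 trigger event (`routing_key`, `event_action`, `payload`, `links`), and the Slack dictionary carries the `channel` and `text` fields of a `chat.postMessage` call. The function names, the passage dictionary keys, and the message wording are assumptions; nothing here sends a request.

```python
from __future__ import annotations

PAGERDUTY_SEVERITIES = {"critical", "error", "warning", "info"}
PAGERDUTY_SUMMARY_LIMIT = 1024  # documented maximum length for summary


def build_pagerduty_event(routing_key: str, monitor_name: str, source: str,
                          severity: str, passages: list[dict],
                          slack_url: str) -> dict:
    """Build a trigger event with runbook context and a Slack link embedded."""
    if severity not in PAGERDUTY_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    summary = f"{monitor_name}: spike escalated by triage router"
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary[:PAGERDUTY_SUMMARY_LIMIT],
            "source": source,
            "severity": severity,
            # Hypothetical layout: cited passages travel as custom details.
            "custom_details": {"runbook_passages": passages},
        },
        "links": [{"href": slack_url, "text": "Slack thread"}],
    }


def build_slack_message(channel: str, monitor_name: str,
                        passages: list[dict]) -> dict:
    """Build a self-resolve message that cites each runbook passage."""
    cited = "\n".join(
        f"> {p['text']} (source: {p['runbook']})" for p in passages
    )
    text = f"*{monitor_name}* looks like a known issue. Suggested fix:\n{cited}"
    return {"channel": channel, "text": text}
```

Keeping both builders free of network calls makes them easy to check against a sample alert before the trigger is enabled.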

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: metrics, traces, log search.
  2. Connect Postgres: any Postgres URL (query, write, migrate).
  3. Connect OpenAI: models, embeddings, files.
  4. Connect PagerDuty: incidents, on-call, escalations.
  5. Connect Slack: channels, DMs, threads, mentions.
  6. Set each agent's model: models are left unset so you pick the tier, either fast and cheap or top quality.
  7. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  8. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
