
On-Call "Why Did This Fire?" Monitor Explainer

An on-call engineer asks a chatbot why a Datadog monitor alerted, and the bot replies with the triggering metric, the linked dashboard, and the deploy that most likely contributed.

Category: Chatbots
Engine: paperclip
Difficulty: intermediate
Trigger: chat
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger (Slack): Engineer messages the bot with a monitor ID or alert link
  • Action (Datadog): Fetch the monitor definition, state, and breached query
  • Action (Datadog): Pull the linked dashboard and recent metric history
  • Action (GitHub): List merged PRs and deploy tags from before the alert
  • Logic: Rank the most likely contributing deploy
  • Output (Slack): Post a threaded explanation with dashboard and commit links
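The two Datadog actions map onto Datadog's public REST API. The sketch below uses only the standard library. It assumes the API and application keys come from setup step 2 and that your account lives on the `datadoghq.com` site. It builds requests for the v1 monitor endpoint (definition, `query`, `overall_state`) and the v1 timeseries query endpoint (metric history). The helper names are illustrative, and the dashboard lookup is not shown.

```python
import urllib.parse
import urllib.request
from typing import Optional


def datadog_get(path: str, params: Optional[dict], api_key: str, app_key: str,
                site: str = "datadoghq.com") -> urllib.request.Request:
    """Build, but do not send, an authenticated GET request to the Datadog API."""
    url = f"https://api.{site}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
        method="GET",
    )


def monitor_request(monitor_id: int, api_key: str, app_key: str,
                    site: str = "datadoghq.com") -> urllib.request.Request:
    # The response is the monitor definition, including "query" and "overall_state".
    return datadog_get(f"/api/v1/monitor/{monitor_id}", None, api_key, app_key, site)


def metric_history_request(query: str, start: int, end: int, api_key: str,
                           app_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    # Timeseries points for a metric query between two POSIX timestamps (seconds).
    params = {"from": start, "to": end, "query": query}
    return datadog_get("/api/v1/query", params, api_key, app_key, site)
```

To send a request, use `json.load(urllib.request.urlopen(req))`. The requests are built separately from sending them so the URLs and headers can be checked without network access.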

What it does

Gives on-call engineers a single chat command to understand any Datadog alert. Instead of clicking through Datadog, GitHub, and deploy logs, they paste a monitor ID or alert link and get back a plain-English explanation: what crossed its threshold, which dashboard shows it, and what shipped just before.
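The bot accepts either a bare monitor ID or an alert link. Below is a minimal parsing sketch with three assumptions:

- Datadog monitor links contain a `/monitors/<id>` path segment. Check this against your own alert URLs.
- Slack may wrap pasted links in angle brackets.
- Any standalone number in the message can serve as a fallback ID.

`parse_monitor_id` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed link shape: .../monitors/<numeric id>, possibly followed by a query string.
_MONITOR_URL = re.compile(r"/monitors/(\d+)")
# Fallback: a standalone run of digits, e.g. "why did 4242 fire?".
_BARE_ID = re.compile(r"\b(\d+)\b")


def parse_monitor_id(message: str) -> Optional[int]:
    """Extract a Datadog monitor ID from a chat message, or None if absent."""
    match = _MONITOR_URL.search(message)
    if match:
        return int(match.group(1))
    match = _BARE_ID.search(message)
    return int(match.group(1)) if match else None
```

The link pattern is checked first, so numbers in other parts of an alert URL (such as timestamps in the query string) are not mistaken for the ID.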

When to use it

Use it the moment a pager goes off and you need fast context. It is ideal for teams where the person on call did not write the failing service and needs orientation in seconds, not minutes.

How it works

The operator messages the bot in Slack with a monitor ID or alert URL. The agent calls the Datadog API to pull the monitor definition, its current state, and the metric query that breached. It then queries Datadog for the linked dashboard and recent metric history, and asks GitHub for merged pull requests and deploy tags in the window before the alert. The agent reasons over all of it, ranks the most likely contributing deploy, and posts a threaded reply with the metric snapshot, a dashboard deep link, and the suspect commits.
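The ranking itself is done by the agent. A deterministic pre-filter can keep its input small. The sketch below is one hypothetical heuristic, not the workflow's actual logic:

- It reads pull requests shaped like GitHub's list-pull-requests response with `state=closed`: `number`, `title`, `html_url`, and a `merged_at` that is null for unmerged PRs.
- It keeps only those merged within an assumed two-hour window before the alert.
- It scores them by recency and adds an illustrative bonus when the title mentions the alerting service.

Deploy tags are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Suspect:
    number: int
    title: str
    url: str
    merged_at: datetime
    score: float


def _parse_ts(value: str) -> datetime:
    # GitHub timestamps look like "2024-05-01T12:00:00Z"; normalize "Z" for older Pythons.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def rank_suspects(pulls: List[dict], alert_at: datetime,
                  window: timedelta = timedelta(hours=2),
                  service: Optional[str] = None) -> List[Suspect]:
    """Keep PRs merged shortly before the alert and sort them, most suspicious first."""
    window_s = window.total_seconds()
    suspects = []
    for pr in pulls:
        if not pr.get("merged_at"):
            continue  # closed without being merged
        merged = _parse_ts(pr["merged_at"])
        gap = (alert_at - merged).total_seconds()
        if gap < 0 or gap > window_s:
            continue  # merged after the alert, or too long before it
        score = 1.0 - gap / window_s  # closer to the alert scores higher
        title = pr.get("title", "")
        if service and service.lower() in title.lower():
            score += 0.5  # hypothetical bonus for naming the alerting service
        suspects.append(Suspect(pr["number"], title, pr.get("html_url", ""),
                                merged, round(score, 3)))
    return sorted(suspects, key=lambda s: s.score, reverse=True)


if __name__ == "__main__":
    alert = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"number": 1, "title": "Bump deps", "html_url": "u1", "merged_at": "2024-05-01T11:50:00Z"},
        {"number": 2, "title": "checkout: new cache", "html_url": "u2", "merged_at": "2024-05-01T11:00:00Z"},
    ]
    for s in rank_suspects(sample, alert, service="checkout"):
        print(s.number, s.score)
```

In practice, the ranked list would be handed to the agent as context rather than posted directly. The score only gives the agent an ordering to start from.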

Set it up

What you configure once, before turning it on.

  1. Connect Slack: channels, DMs, threads, mentions.
  2. Connect Datadog: metrics, traces, log search.
  3. Connect GitHub: repos, issues, pull requests, actions.
  4. Set each agent's model: we leave models unset so you pick the tier, fast and cheap or top quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run it once against a sample, confirm the output, then enable the trigger.
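One cheap test is to render the threaded reply from a sample alert before enabling the trigger. The sketch below builds a payload for Slack's `chat.postMessage` method:

- `thread_ts` makes the message a reply inside the alert's thread.
- `<url|label>` is Slack's link syntax.

The `build_thread_reply` helper and the message layout are hypothetical.

```python
from typing import List, Tuple

# (PR number, title, URL), most likely first, e.g. from a ranking step.
SuspectRow = Tuple[int, str, str]


def build_thread_reply(channel: str, thread_ts: str, monitor_name: str, state: str,
                       query: str, dashboard_url: str, suspects: List[SuspectRow]) -> dict:
    """Assemble a chat.postMessage payload that replies inside the alert's thread."""
    lines = [
        f"*{monitor_name}* is in state {state}.",
        f"Breached query: `{query}`",
        f"Dashboard: <{dashboard_url}|open in Datadog>",
    ]
    if suspects:
        lines.append("Likely contributing changes, most likely first:")
        for rank, (number, title, url) in enumerate(suspects, start=1):
            lines.append(f"{rank}. <{url}|#{number} {title}>")
    else:
        lines.append("No merged pull requests found in the lookback window.")
    return {
        "channel": channel,
        "thread_ts": thread_ts,   # reply in the thread rather than the channel
        "text": "\n".join(lines),
        "unfurl_links": False,    # keep link previews from cluttering the thread
    }


if __name__ == "__main__":
    sample = build_thread_reply(
        "C123", "1714564800.000100", "Checkout p99 latency", "Alert",
        "avg(last_5m):avg:trace.http.request.duration{service:checkout} > 2",
        "https://app.datadoghq.com/dashboard/abc",
        [(2, "checkout: new cache", "https://github.com/org/repo/pull/2")],
    )
    print(sample["text"])
```

To post the reply, send this dict as JSON to `https://slack.com/api/chat.postMessage` with a bot token in an `Authorization: Bearer` header.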
