Weekly Runbook Coverage Gap Report for Datadog Spikes

Each week, it reviews the spikes that the knowledge base answered poorly and identifies which Datadog monitors lack good runbook coverage.

Category: AI & RAG
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Weekly schedule starts the coverage review
  • Action: Read the past week's alert and retrieval-confidence logs (Postgres)
  • Logic: Aggregate and rank low-coverage spikes by monitor
  • Action: Check Confluence for existing pages, propose new titles (Confluence)
  • Action: Draft runbook outlines for the top gaps (OpenAI)
  • Output: Post prioritized gap report to Slack (Slack)
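
The Confluence check in the pipeline above can be pictured as a title-similarity test. The following is only a minimal sketch, assuming the existing page titles have already been fetched into a list; the function names, the 0.8 threshold, and the sample titles are hypothetical and not part of the workflow itself.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())


def find_existing_page(
    proposed: str,
    existing_titles: Sequence[str],
    threshold: float = 0.8,  # assumed cutoff, tune to your space
) -> Optional[str]:
    """Return the closest existing title if it is similar enough, else None."""
    best_title, best_score = None, 0.0
    for title in existing_titles:
        score = SequenceMatcher(None, normalize(proposed), normalize(title)).ratio()
        if score > best_score:
            best_title, best_score = title, score
    return best_title if best_score >= threshold else None


# Hypothetical titles standing in for pages already in Confluence.
existing = ["Runbook: Redis Memory Pressure", "Runbook: API 5xx Rate"]
print(find_existing_page("Runbook: redis memory pressure", existing))
print(find_existing_page("Runbook: Kafka Consumer Lag", existing))
```

A proposed title that matches an existing page would be dropped from the report or linked to that page, while an unmatched title stays in the list as a new page to write.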

What it does

Closes the loop on the knowledge base itself. It looks back at the week's alerts and the retrieval confidence each one produced, finds the monitors that repeatedly returned weak or no runbook matches, and turns that into a ranked list of documentation gaps with suggested page titles.

When to use it

Use it when on-call keeps hitting alerts the runbooks do not cover and you want a data-driven backlog of what to write. It makes 'we should document that' a concrete weekly task instead of a vague intention.

How it works

  1. A weekly schedule starts the review.
  2. The flow reads the past week's alert and retrieval-confidence logs from Postgres.
  3. It aggregates low-confidence and no-match spikes by monitor and metric.
  4. A logic step ranks gaps by alert frequency and absence of runbook coverage.
  5. An LLM proposes runbook titles and outlines for the top gaps, checking Confluence to avoid duplicating existing pages.
  6. The prioritized gap report is posted to the team's Slack channel.
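
The aggregation and ranking steps above could look roughly like the sketch below. It assumes each log row carries a monitor name, a metric, and a retrieval confidence (with `None` meaning no runbook matched). The `AlertLog` shape, the 0.5 cutoff, and the scoring weights are illustrative assumptions, not the workflow's actual schema or formula.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

LOW_CONFIDENCE = 0.5  # assumed cutoff for a "weak" retrieval match


@dataclass
class AlertLog:
    monitor: str
    metric: str
    confidence: Optional[float]  # None means no runbook matched at all


def rank_gaps(logs, top_n=3):
    """Group weak and no-match alerts by (monitor, metric) and rank them."""
    stats = defaultdict(lambda: {"alerts": 0, "weak": 0, "no_match": 0})
    for log in logs:
        entry = stats[(log.monitor, log.metric)]
        entry["alerts"] += 1
        if log.confidence is None:
            entry["no_match"] += 1
        elif log.confidence < LOW_CONFIDENCE:
            entry["weak"] += 1

    gaps = []
    for (monitor, metric), s in stats.items():
        # Hypothetical weighting: a missing runbook counts twice as much as a weak match.
        score = 2 * s["no_match"] + s["weak"]
        if score > 0:
            gaps.append({"monitor": monitor, "metric": metric, "score": score, **s})
    # Highest score first; monitor name breaks ties so the order is stable.
    gaps.sort(key=lambda g: (-g["score"], g["monitor"]))
    return gaps[:top_n]


# Made-up rows standing in for one week of logs read from Postgres.
sample_logs = [
    AlertLog("redis-memory", "redis.mem.used", None),
    AlertLog("redis-memory", "redis.mem.used", None),
    AlertLog("redis-memory", "redis.mem.used", 0.3),
    AlertLog("api-5xx", "http.5xx.rate", 0.9),
    AlertLog("api-5xx", "http.5xx.rate", 0.4),
    AlertLog("kafka-lag", "kafka.consumer.lag", None),
]

for gap in rank_gaps(sample_logs):
    print(gap["monitor"], gap["metric"], gap["score"])
```

With this sample data the order is redis-memory, kafka-lag, api-5xx. Weighting no-match alerts above weak ones is one way to combine alert frequency with absence of coverage; a team may prefer a different balance.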

Set it up

What you configure once, before turning it on.

  1. Connect Postgres. Any Postgres URL: query, write, migrate.
  2. Connect Confluence. Spaces, pages, blueprints.
  3. Connect OpenAI. Models, embeddings, files.
  4. Connect Slack. Channels, DMs, threads, mentions.
  5. Set each agent's model. We leave models unset so you pick the tier: fast and cheap, or top-quality.
  6. Tune it to your data. Edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on. Run once against a sample, confirm the output, then enable the trigger.
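
The sample run in the last step can be checked offline before anything is posted. The sketch below is one possible dry run, assuming each gap is a dict with hypothetical `monitor`, `uncovered`, and `proposed_title` keys. It only renders the report text and does not call Slack.

```python
def render_report(gaps):
    """Render ranked gaps as plain report text, one numbered line per gap."""
    lines = ["Weekly runbook coverage gaps"]
    for rank, gap in enumerate(gaps, start=1):
        lines.append(
            f"{rank}. {gap['monitor']} "
            f"({gap['uncovered']} uncovered alerts): {gap['proposed_title']}"
        )
    if not gaps:
        lines.append("No coverage gaps found this week.")
    return "\n".join(lines)


def check_sample(report, expected_monitors):
    """Return the monitors you expected to see that are missing from the report."""
    return [m for m in expected_monitors if m not in report]


# Made-up sample gaps; real values would come from the ranking step.
sample_gaps = [
    {"monitor": "redis-memory", "uncovered": 3,
     "proposed_title": "Runbook: Redis Memory Pressure"},
    {"monitor": "kafka-lag", "uncovered": 1,
     "proposed_title": "Runbook: Kafka Consumer Lag"},
]

report = render_report(sample_gaps)
print(report)
print("Missing monitors:", check_sample(report, ["redis-memory", "kafka-lag"]))
```

An empty "Missing monitors" list means the sample produced every gap you expected, which is a reasonable signal that the prompts and filters are tuned well enough to enable the trigger.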
