Nightly Health Sweep with Morning Fix-Approval Queue

On a nightly schedule, an agent sweeps Datadog and PagerDuty for degraded-but-not-paging conditions and drafts a remediation for each.

Category: AI Agents

Engine: paperclip

Difficulty: advanced

Trigger: schedule

Steps: 6

Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

Trigger: Nightly schedule fires
Action: Pull warning monitors and resolved incidents (Datadog)
Action: Agent drafts remediation per signal cluster
Logic: Rank by risk and drop self-healed items
Action: Post batched approval queue to Slack (Slack)
Output: Execute approved fixes and post summary (Shell)
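
As an illustration of the "post batched approval queue" action above, here is a minimal sketch of a Slack Block Kit payload with one Approve / Skip row per item. The item keys (`id`, `title`, `proposed_fix`), the `action_id` values, and the function name are assumptions made for this example, not something the workflow defines; only the Block Kit block and button shapes come from Slack.

```python
def build_approval_queue(items):
    """Build one Slack message payload holding the whole morning queue.

    `items` is a list of dicts with hypothetical keys:
    id, title, proposed_fix (illustrative names only).
    """
    blocks = [{
        "type": "header",
        "text": {"type": "plain_text",
                 "text": f"Morning fix queue ({len(items)} items)"},
    }]
    for item in items:
        # One section describing the item and its proposed fix.
        blocks.append({
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"*{item['title']}*\nProposed fix: `{item['proposed_fix']}`",
            },
        })
        # One actions row with the two per-item controls.
        blocks.append({
            "type": "actions",
            "block_id": f"queue-{item['id']}",  # must be unique per message
            "elements": [
                {"type": "button", "style": "primary",
                 "action_id": "approve_fix",
                 "text": {"type": "plain_text", "text": "Approve"},
                 "value": item["id"]},
                {"type": "button",
                 "action_id": "skip_fix",
                 "text": {"type": "plain_text", "text": "Skip"},
                 "value": item["id"]},
            ],
        })
    # `text` is the plain-text fallback used in notifications.
    return {"text": f"{len(items)} fixes awaiting approval", "blocks": blocks}
```

Each item uses two blocks plus the shared header, so a long queue may need to be split across several messages to stay within Slack's per-message block limit.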

What it does

Proactively catches the slow-burn problems that never page — creeping disk usage, a flapping monitor, a stale auto-resolved incident — and turns them into a tidy morning checklist. Each item comes with a proposed fix the engineer can approve or skip in one place.

When to use it

Use this to stop low-grade issues from becoming 3am pages. Best for teams who want a predictable start-of-shift ritual: review the queue, approve the safe fixes, defer the rest — instead of discovering the same warnings scattered across dashboards.

How it works

1. A nightly schedule triggers the sweep.
2. The agent pulls warning-level Datadog monitors and recently auto-resolved PagerDuty incidents.
3. It groups related signals and drafts one proposed remediation per cluster.
4. A logic step ranks items by risk and filters out anything already self-healed.
5. It posts a single batched approval queue to Slack with per-item Approve / Skip controls.
6. Approved items execute their shell action; the agent posts a closing summary of what ran.
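
Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the workflow's actual logic: the `Signal` fields, grouping by service name, the 1 to 5 severity scale, the `self_healed` flag, and the risk heuristic are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Signal:
    source: str        # "datadog" or "pagerduty"
    service: str       # hypothetical grouping key
    summary: str
    severity: int      # hypothetical scale: 1 (low) to 5 (high)
    self_healed: bool  # assumed flag: condition cleared and stayed clear


@dataclass
class QueueItem:
    service: str
    signals: list
    risk: int


def cluster_signals(signals):
    """Group related signals; here 'related' simply means same service."""
    groups = defaultdict(list)
    for sig in signals:
        groups[sig.service].append(sig)
    return groups


def build_queue(signals):
    """Drop self-healed signals, score each cluster, sort riskiest first."""
    items = []
    for service, group in cluster_signals(signals).items():
        open_signals = [s for s in group if not s.self_healed]
        if not open_signals:
            continue  # the whole cluster healed itself, nothing to approve
        # Assumed heuristic: worst severity, plus one per extra open signal.
        risk = max(s.severity for s in open_signals) + len(open_signals) - 1
        items.append(QueueItem(service, open_signals, risk))
    return sorted(items, key=lambda item: item.risk, reverse=True)
```

With this heuristic, a service showing two moderate warnings can outrank a service with a single slightly worse one, which matches the idea that clustered signals deserve attention first. In practice the scoring would be tuned to how your team weighs risk.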

Set it up

What you configure once, before turning it on.

1. Connect Datadog: Metrics, traces, log search.
2. Connect PagerDuty: Incidents, on-call, escalations.
3. Connect Slack: Channels, DMs, threads, mentions.
4. Connect Shell: Run sandboxed commands inside the workspace.
5. Set each agent's model: We leave models unset so you pick the tier: fast and cheap, or top-quality.
6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
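
For the test run, it can help to see what the final step would execute before anything actually runs. The sketch below shows one way to do that with a dry-run switch. It assumes each queue item is a dict with hypothetical `id`, `approved`, and `command` keys; the platform's real Shell connector is not shown and may behave differently.

```python
import shlex
import subprocess


def run_approved(items, dry_run=True, timeout=60):
    """Run the shell command of each approved item; report the rest as skipped.

    With dry_run=True nothing is executed; the command is only echoed back.
    """
    results = []
    for item in items:
        if not item.get("approved"):
            results.append((item["id"], "skipped"))
            continue
        # Split into argv so the command never passes through a shell.
        argv = shlex.split(item["command"])
        if dry_run:
            results.append((item["id"], "dry-run: " + shlex.join(argv)))
            continue
        try:
            proc = subprocess.run(argv, capture_output=True, text=True,
                                  timeout=timeout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            results.append((item["id"], f"error: {type(exc).__name__}"))
            continue
        status = "ok" if proc.returncode == 0 else f"failed (exit {proc.returncode})"
        results.append((item["id"], status))
    return results


def summarize(results):
    """Format the closing summary, one line per queue item."""
    return "\n".join(f"{item_id}: {status}" for item_id, status in results)
```

Running once with `dry_run=True` against a sample queue, checking the summary, and only then switching to `dry_run=False` mirrors the "test, then turn it on" step above.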

More AI Agents workflows

Custom Metrics Cardinality Spike Pager

A webhook from a Datadog monitor fires when custom-metric cardinality jumps; an agent pinpoints the offending metric and tag and estimates the added cost.

Sentry-to-Confluence Runbook Updater

When a Sentry issue is resolved, the agent finds the matching Confluence runbook page and proposes an inline update with the verified fix.

Stale Doc-PR Chaser for Runbook Gaps

On a daily schedule, the agent finds runbook doc PRs that were opened from resolved incidents but never reviewed and summarizes what each one fixes.

Resolved Incident to Public Troubleshooting Doc

For customer-facing errors resolved in Sentry, the agent drafts a sanitized troubleshooting entry and opens a PR to your ReadMe documentation.

On-Call Runbook Gap Closer: Resolved Sentry Issues to Doc PRs

An agent reads each newly resolved Sentry issue, compares the actual fix against your existing runbook, and opens a GitHub PR adding the missing remediation steps.

Weekly On-Call Doc-Gap Digest

Each week the agent reviews every Sentry issue resolved in the last 7 days and ranks the ones whose runbook coverage is missing or thin.

Run it inside a business

This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

Finance

Research & Trading Desk

Governance-first research, execution, and risk — every trade on the audit trail.

Operations

Internal Operations

Runbooks, on-call, vendor management — disciplined and audited.

Software

Agent Hive runs Agent Hive

The team that built Agent Hive, exactly as it runs today.
