
Unmute Datadog Monitors When the Maintenance Window Ends and Verify Health

At the end of an Outlook maintenance event, this workflow cancels the Datadog downtime, waits for monitors to report, and then either posts a confirmation to Slack or pages on-call through PagerDuty.

Category: IT Ops
Engine: sim
Difficulty: intermediate
Trigger: event
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Outlook maintenance event ends (Outlook)
  • Action: Cancel Datadog downtime for the window (Datadog)
  • Logic: Poll monitor status; branch on clean vs. still-firing
  • Action: Page on-call if a monitor is still red (PagerDuty)
  • Output: Post window-closed confirmation to Slack (Slack)

What it does

Closes out a maintenance window: it cancels the Datadog downtime for the scoped monitors, checks their post-window status, and either confirms a clean exit or pages on-call if a monitor came back red.
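As a rough illustration of the cancel step, the sketch below builds (but does not send) the HTTP request that cancels a single downtime. It assumes the Datadog v1 downtime endpoint (`DELETE /api/v1/downtime/{downtime_id}`) and the US1 API host; the workflow's own Datadog connector may use a different API version or site, and the function name `build_cancel_downtime_request` is hypothetical.

```python
import urllib.request

# Assumption: US1 site. Other Datadog sites use a different API host.
DATADOG_API_BASE = "https://api.datadoghq.com"


def build_cancel_downtime_request(downtime_id, api_key, app_key,
                                  base_url=DATADOG_API_BASE):
    """Build the request that cancels one downtime; the caller sends it."""
    url = f"{base_url}/api/v1/downtime/{downtime_id}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={
            "DD-API-KEY": api_key,          # Datadog API key header
            "DD-APPLICATION-KEY": app_key,  # Datadog application key header
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the step easy to inspect in a dry run.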

When to use it

Use it as the bookend to a window-start muter. It guarantees monitors get un-muted on time and catches the case where maintenance left a service unhealthy.

How it works

  1. The Outlook maintenance event reaches its end time and fires the trigger.
  2. The flow reads the downtime ID recorded for that window and cancels the matching Datadog downtime.
  3. It polls the scoped monitors' current status for a short settling period.
  4. A branch evaluates the results: all OK versus one or more still alerting.
  5. If clean, Slack posts a window-closed confirmation to the on-call channel.
  6. If any monitor is still firing, PagerDuty triggers an incident with the window name and the offending monitors attached.
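The poll-and-branch logic in the steps above can be sketched in a few lines. This is a minimal illustration, not the workflow's actual implementation: `fetch_states` is a hypothetical callable that returns a `{monitor_name: state}` mapping, the string `"OK"` is assumed to be the healthy state, and the attempt count and interval are placeholder values. The sketch also assumes polling may stop early once every monitor reports OK.

```python
import time

OK_STATE = "OK"  # assumption: the state string that counts as healthy


def poll_until_settled(fetch_states, attempts=5, interval_s=30, sleep=time.sleep):
    """Poll monitor states until all are OK or the attempts run out.

    Returns the sorted names of monitors still not OK on the last poll;
    an empty list means a clean exit.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    firing = []
    for attempt in range(attempts):
        states = fetch_states()
        firing = sorted(name for name, state in states.items()
                        if state != OK_STATE)
        if not firing:
            return []  # everything recovered; stop early
        if attempt < attempts - 1:
            sleep(interval_s)  # wait before the next poll
    return firing


def close_out(window_name, firing):
    """Branch: Slack confirmation if clean, PagerDuty incident otherwise."""
    if not firing:
        return {
            "target": "slack",
            "text": f"Maintenance window '{window_name}' closed; all monitors OK.",
        }
    return {
        "target": "pagerduty",
        "summary": f"Monitors still alerting after maintenance window '{window_name}'",
        "monitors": firing,  # the offending monitors attached to the incident
    }
```

Injecting `fetch_states` and `sleep` keeps the logic testable without calling Datadog or actually waiting. If monitors tend to flap right after being unmuted, polling for the full settling period and branching on the final reading may be the safer choice.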

Set it up

What you configure once, before turning it on.

  1. Connect Outlook: mail, calendar, contacts.
  2. Connect Datadog: metrics, traces, log search.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Connect PagerDuty: incidents, on-call, escalations.
  5. Set each agent's model. We leave models unset so you pick the tier: fast and cheap, or top-quality.
  6. Tune it to your data. Edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on. Run once against a sample, confirm the output, then enable the trigger.
