DEVOPS
Auto-advance or halt a deploy from Datadog canary metrics
After a deploy stage goes live, this polls Datadog for error-rate and latency over a watch window.
How it runs
The automated pipeline, trigger to output.
- Trigger: Canary soak window timer
- Action: Query Datadog SLO metrics (Datadog)
- Logic: Compare metrics to thresholds
- Action: Post green or red verdict to thread (Discord)
- Output: Mention on-call on breach (Discord)
What it does
Replaces the manual 'is the canary healthy?' check with a data-driven verdict. It queries Datadog for the key SLO metrics during a soak window and tells the war-room whether the stage is safe to advance.
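As a rough illustration of the query step, the sketch below builds the parameters for one request to Datadog's v1 timeseries query endpoint (`GET /api/v1/query`, which takes `from`, `to`, and `query`). The metric names, the `revision:canary` tag, and the helper `soak_window_params` are assumptions made for this example, not part of the workflow; substitute the metrics your services actually emit.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical query strings: placeholders for whatever your services report.
QUERIES = {
    "error_rate": "avg:app.request.error_rate{revision:canary}",
    "p95_latency_ms": "avg:app.request.latency.p95{revision:canary}",
}


def soak_window_params(query: str, start: datetime, soak: timedelta) -> dict:
    """Build query-string parameters covering one soak window.

    Datadog's v1 query endpoint expects `from` and `to` as epoch seconds.
    """
    end = start + soak
    return {
        "from": int(start.timestamp()),
        "to": int(end.timestamp()),
        "query": query,
    }


if __name__ == "__main__":
    stage_live_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    for name, query in QUERIES.items():
        params = soak_window_params(query, stage_live_at, timedelta(minutes=10))
        # A real call would send these params to /api/v1/query with the
        # DD-API-KEY and DD-APPLICATION-KEY headers set.
        print(name, params)
```

The request itself is left out so the sketch runs without credentials; only the window arithmetic is shown.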
When to use it
Use this when each rollout stage needs an objective health gate based on real telemetry, not vibes. Great for teams that already define error-rate and p95 latency thresholds and want them enforced automatically before promotion.
How it works
- 1. A schedule trigger runs at the start of each canary soak window referenced from the active deploy.
- 2. An action queries Datadog for error rate, p95 latency, and 5xx counts on the new revision.
- 3. A logic step compares each metric against its threshold to compute a pass or fail verdict.
- 4. If all metrics pass, the output posts a green checkpoint to the Discord thread, clearing the next stage.
- 5. If any metric breaches, the output posts a red alert in the thread, @-mentioning on-call with the offending metric and a rollback prompt.
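Steps 3 to 5 above can be sketched roughly as follows. This is a minimal example under stated assumptions, not the workflow's actual implementation: the threshold values, the metric keys, the `Threshold` type, and the helpers `find_breaches` and `build_message` are all hypothetical. It also assumes a missing metric should count as a breach (fail closed). The payload uses Discord's role-mention syntax (`<@&ROLE_ID>`) and the `allowed_mentions` field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    unit: str = ""


# Hypothetical limits; use the values your team has already agreed on.
THRESHOLDS = [
    Threshold("error_rate", 0.01),
    Threshold("p95_latency_ms", 500.0, "ms"),
    Threshold("http_5xx_count", 5.0),
]


def find_breaches(observed: dict, thresholds: list) -> list:
    """Return (threshold, value) pairs for metrics that exceed their limit.

    A metric with no data is treated as a breach (an assumption: fail closed).
    """
    breaches = []
    for t in thresholds:
        value: Optional[float] = observed.get(t.metric)
        if value is None or value > t.limit:
            breaches.append((t, value))
    return breaches


def build_message(observed: dict, thresholds: list, oncall_role_id: str) -> dict:
    """Build a Discord message payload carrying the green or red verdict."""
    breaches = find_breaches(observed, thresholds)
    if not breaches:
        return {
            "content": "GREEN: all canary metrics within thresholds. Next stage is clear.",
            "allowed_mentions": {"parse": []},  # no pings on a pass
        }
    lines = []
    for t, value in breaches:
        shown = "no data" if value is None else f"{value:g}{t.unit}"
        lines.append(f"- {t.metric}: {shown} (limit {t.limit:g}{t.unit})")
    content = (
        f"<@&{oncall_role_id}> RED: canary breached its thresholds.\n"
        + "\n".join(lines)
        + "\nConsider rolling back this stage."
    )
    return {"content": content, "allowed_mentions": {"roles": [oncall_role_id]}}


if __name__ == "__main__":
    sample = {"error_rate": 0.002, "p95_latency_ms": 620.0, "http_5xx_count": 1}
    print(build_message(sample, THRESHOLDS, "123")["content"])
```

Keeping the verdict as a plain function of observed values and thresholds makes it easy to run against a saved sample before enabling the trigger.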
Set it up
What you configure once, before turning it on.
- 1. Connect Datadog: Metrics, traces, log search.
- 2. Connect Discord: Community channels + voice + bots.
- 3. Set each agent's model: We leave models unset so you pick the tier (fast + cheap, or top-quality).
- 4. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
- 5. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
More DevOps workflows
Block costly Hugging Face Space hardware upgrades in PR review
When a pull request changes a Space's hardware config, it estimates the new monthly cost and posts a GitHub PR comment that flags upgrades crossing a budget ceiling.
Auto-spin a Zoom war-room when PagerDuty hits SEV-1
When a PagerDuty incident escalates to a critical severity, this workflow creates a dedicated Zoom meeting and posts the bridge link to the incident's Slack channel so responders…
Page on-call when a Hugging Face Space build is stuck or errored
Polls Hugging Face Space runtime status on a schedule and opens a PagerDuty incident when a Space sits in a build or error state past a deadline, with a Slack heads-up.
Slack-approved pause for idle Hugging Face Spaces
On a daily scan it finds idle paid Spaces and posts an interactive Slack approval; on approve it pauses the Space and logs the decision to a GitHub issue audit trail.
Hugging Face Spaces idle-runtime sweep with auto-pause
On a schedule, scans all Hugging Face Spaces for ones running idle past a threshold, pauses them to stop billing, and posts a Slack summary with the estimated monthly savings.
Open a Zoom war-room from a Datadog multi-alert storm
When a Datadog monitor crosses a critical threshold, this workflow dedupes against active incidents, and only for a genuinely new outage it creates a Zoom bridge.
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

