DEVOPS
Agent investigates a cost spike across billing and deploys, then writes a root-cause report
On demand or on a spike alert, an agent correlates a BigQuery cost jump with recent GitHub deploys and Datadog usage metrics for the owning service.
How it runs
The automated pipeline, trigger to output.
- Trigger: Spike event or manual run with service + date
- Action: Pull cost breakdown for the spike window (BigQuery)
- Action: Gather recent deploys and PRs for the service (GitHub)
- Action: Correlate with usage and saturation metrics (Datadog)
- Logic: Rank root-cause hypotheses by evidence
- Output: Post root-cause report to Slack
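As a rough illustration of the cost-breakdown step, the sketch below compares average daily spend per line item between a baseline window and the spike window. It is not the workflow's actual query: the function name `cost_deltas`, the `(day, line_item, cost)` row shape, and the sample figures are all assumptions made up for this example, standing in for whatever your billing table returns.

```python
from collections import defaultdict

def cost_deltas(rows, baseline_days, spike_days):
    """Return (line_item, avg_daily_delta) pairs, largest increase first.

    rows: iterable of (day, line_item, cost) tuples (hypothetical shape).
    """
    baseline = defaultdict(float)
    spike = defaultdict(float)
    for day, item, cost in rows:
        if day in baseline_days:
            baseline[item] += cost
        elif day in spike_days:
            spike[item] += cost
    items = set(baseline) | set(spike)
    # Compare average daily cost so windows of different lengths are comparable.
    deltas = {
        item: spike[item] / len(spike_days) - baseline[item] / len(baseline_days)
        for item in items
    }
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

# Made-up sample rows for illustration only.
rows = [
    ("2024-05-01", "compute", 100.0),
    ("2024-05-01", "storage", 20.0),
    ("2024-05-02", "compute", 110.0),
    ("2024-05-02", "storage", 20.0),
    ("2024-05-03", "compute", 300.0),
    ("2024-05-03", "storage", 22.0),
]
top = cost_deltas(
    rows,
    baseline_days={"2024-05-01", "2024-05-02"},
    spike_days={"2024-05-03"},
)
print(top[0])  # the line item with the largest average daily increase
```

Averaging per day rather than summing keeps a one-day spike from being masked by a longer baseline window.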
What it does
Launches an investigating agent that takes a flagged cost spike, gathers evidence from billing data, recent code deploys, and infrastructure metrics, then reasons about the most likely cause (a new deploy, a traffic surge, a misconfigured resource) and produces a written root-cause report attributing the spend to the owning team.
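One simple way to picture "reasoning about the most likely cause" is a weighted evidence tally per candidate cause. The sketch below is an assumption-laden illustration, not how the agent actually reasons: the `Hypothesis` class, the integer weights, and the evidence strings are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    cause: str
    # (description, weight) pairs; weights are illustrative integers.
    evidence: list = field(default_factory=list)

    def score(self):
        return sum(weight for _, weight in self.evidence)

def rank(hypotheses):
    """Order hypotheses by total evidence weight and attach each one's share."""
    total = sum(h.score() for h in hypotheses) or 1  # avoid dividing by zero
    ranked = sorted(hypotheses, key=lambda h: h.score(), reverse=True)
    return [(h.cause, round(h.score() / total, 2)) for h in ranked]

# Made-up evidence for the three causes named above.
hypotheses = [
    Hypothesis("new deploy", [("deploy merged shortly before spike", 6),
                              ("compute cost rose", 3)]),
    Hypothesis("traffic surge", [("request volume flat", 0)]),
    Hypothesis("misconfigured resource", [("instance count doubled", 3)]),
]
ranking = rank(hypotheses)
print(ranking)
```

The normalized share gives a crude confidence figure; in practice the weights would need tuning against incidents your team has already diagnosed.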
When to use it
Use it when raw cost alerts are not enough and you want a first-pass investigation done automatically before a human picks it up. Ideal for FinOps or platform teams who spend hours manually correlating spend with what changed.
How it works
1. The agent is triggered by a spike event or a manual run with a service name and date.
2. It queries BigQuery for the service's cost breakdown over the spike window.
3. It pulls recent GitHub deploys and merged PRs for that service's repo around the spike time.
4. It queries Datadog for request volume, instance counts, and resource saturation over the same window.
5. It synthesizes the evidence into ranked root-cause hypotheses and a report.
6. It posts the summary plus a confidence level to the owning team's Slack channel.
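The deploy-correlation and reporting steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the 24-hour lookback, the dictionary shape for deploys, the service name `checkout-api`, and both function names are hypothetical, and the report is only built as text here rather than sent to Slack.

```python
from datetime import datetime, timedelta

def deploys_near_spike(deploys, spike_start, lookback_hours=24):
    """Keep deploys that landed within `lookback_hours` before the spike began."""
    window_start = spike_start - timedelta(hours=lookback_hours)
    return [d for d in deploys if window_start <= d["merged_at"] <= spike_start]

def format_report(service, suspects, confidence):
    """Build the plain-text summary for the owning team's channel."""
    lines = [f"Cost spike report for {service} (confidence: {confidence})"]
    if suspects:
        lines += [
            f"- Suspect deploy: {d['title']} at {d['merged_at']:%Y-%m-%d %H:%M}"
            for d in suspects
        ]
    else:
        lines.append("- No deploys found in the lookback window")
    return "\n".join(lines)

# Made-up sample data for illustration only.
spike_start = datetime(2024, 5, 3, 12, 0)
deploys = [
    {"title": "Raise worker replica count", "merged_at": datetime(2024, 5, 3, 9, 30)},
    {"title": "Update README", "merged_at": datetime(2024, 4, 28, 16, 0)},
]
suspects = deploys_near_spike(deploys, spike_start)
report = format_report("checkout-api", suspects, "medium")
print(report)
```

A fixed lookback is the simplest filter; a slow-building spike may call for a wider window or for weighting deploys by how close they landed to the spike.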
Set it up
What you configure once, before turning it on.
1. Connect BigQuery: Datasets, queries, schemas.
2. Connect GitHub: Repos, issues, pull requests, actions.
3. Connect Datadog: Metrics, traces, log search.
4. Connect Slack: Channels, DMs, threads, mentions.
5. Set each agent's model: We leave models unset so you pick the tier: fast and cheap, or top-quality.
6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
More DevOps workflows
Slack-approved pause for idle Hugging Face Spaces
On a daily scan it finds idle paid Spaces and posts an interactive Slack approval; on approve it pauses the Space and logs the decision to a GitHub issue audit trail.
Block costly Hugging Face Space hardware upgrades in PR review
When a pull request changes a Space's hardware config, it estimates the new monthly cost and posts a GitHub PR comment that flags upgrades crossing a budget ceiling.
Hugging Face Spaces idle-runtime sweep with auto-pause
On a schedule, scans all Hugging Face Spaces for ones running idle past a threshold, pauses them to stop billing, and posts a Slack summary with the estimated monthly savings.
Open a Zoom war-room from a Datadog multi-alert storm
When a Datadog monitor crosses a critical threshold, this workflow dedupes against active incidents, and only for a genuinely new outage it creates a Zoom bridge.
Auto-spin a Zoom war-room when PagerDuty hits SEV-1
When a PagerDuty incident escalates to a critical severity, this workflow creates a dedicated Zoom meeting and posts the bridge link to the incident's Slack channel so responders…
Spin up a war-room on demand from a Slack slash command
When an engineer runs a Slack command, this workflow creates a Zoom bridge, opens a Sentry-linked tracking incident, and files a Linear issue for follow-up.
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

