DEVOPS

Agent investigates a cost spike across billing and deploys, then writes a root-cause report

On demand or on a spike alert, an agent correlates a BigQuery cost jump with recent GitHub deploys and Datadog usage metrics for the owning service.

Category: DevOps
Engine: paperclip
Difficulty: advanced
Trigger: event
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Spike event or manual run with service + date
  • Action: Pull cost breakdown for the spike window (Google BigQuery)
  • Action: Gather recent deploys and PRs for the service (GitHub)
  • Action: Correlate with usage and saturation metrics (Datadog)
  • Logic: Rank root-cause hypotheses by evidence
  • Output: Post root-cause report to Slack
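
The trigger carries only a service name and a date, so the first thing the later steps need is a concrete time range to compare. A minimal sketch of that step follows. The payload keys (`service`, `date`), the class names, and the 7-day baseline are assumptions made for illustration, not part of the workflow's actual schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class InvestigationRequest:
    """Hypothetical trigger payload: which service to investigate, and when."""
    service: str
    spike_date: date


@dataclass(frozen=True)
class Windows:
    """Date ranges (inclusive) shared by the billing, deploy, and metrics steps."""
    baseline_start: date
    baseline_end: date
    spike_start: date
    spike_end: date


def parse_request(payload: dict) -> InvestigationRequest:
    """Validate a payload with a 'service' name and an ISO-format 'date'."""
    service = str(payload.get("service", "")).strip()
    if not service:
        raise ValueError("payload needs a non-empty 'service'")
    return InvestigationRequest(service, date.fromisoformat(payload["date"]))


def build_windows(req: InvestigationRequest, baseline_days: int = 7) -> Windows:
    """Use the days just before the spike as the comparison baseline."""
    spike = req.spike_date
    return Windows(
        baseline_start=spike - timedelta(days=baseline_days),
        baseline_end=spike - timedelta(days=1),
        spike_start=spike,
        spike_end=spike,
    )
```

Computing the windows once and passing the same object to every data source keeps the billing, deploy, and metrics evidence aligned on identical dates.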

What it does

Launches an investigating agent that takes a flagged cost spike, gathers evidence from billing data, recent code deploys, and infrastructure metrics, then reasons about the most likely cause (a new deploy, a traffic surge, a misconfigured resource) and produces a written root-cause report attributing the spend to the owning team.
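
To make the reasoning step concrete, here is one way the three candidate causes could be scored from gathered evidence. This is a hand-written heuristic for illustration only: in the workflow the ranking is done by the agent's model, and the field names, weights (0.9, 0.5, 0.1), and scoring formula below are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    deploys_in_window: int       # deploys or merged PRs near the spike
    request_volume_ratio: float  # spike-window requests / baseline requests
    instance_count_ratio: float  # spike-window instances / baseline instances
    cost_ratio: float            # spike-window cost / baseline cost


@dataclass
class Hypothesis:
    cause: str
    score: float  # heuristic score in [0, 1]; higher means better supported


def _clamp(x: float) -> float:
    return max(0.0, min(1.0, x))


def rank_hypotheses(ev: Evidence) -> list[Hypothesis]:
    """Score each candidate cause and return them best-supported first."""
    cost_growth = max(ev.cost_ratio - 1.0, 0.0)
    traffic_growth = max(ev.request_volume_ratio - 1.0, 0.0)
    capacity_growth = max(ev.instance_count_ratio - 1.0, 0.0)

    if cost_growth == 0.0:
        # No cost increase to explain: every hypothesis scores zero.
        traffic = deploy = misconfig = 0.0
    else:
        # Share of the cost growth that traffic growth alone could explain.
        traffic = _clamp(traffic_growth / cost_growth)
        unexplained = 1.0 - traffic
        # A deploy in the window makes a code change the likelier culprit.
        deploy = unexplained * (0.9 if ev.deploys_in_window else 0.1)
        # Capacity that grew without traffic points at configuration;
        # discount it when a deploy could equally explain the change.
        misconfig = (
            unexplained
            * _clamp(capacity_growth / cost_growth)
            * (0.5 if ev.deploys_in_window else 0.9)
        )

    hypotheses = [
        Hypothesis("new deploy", deploy),
        Hypothesis("traffic surge", traffic),
        Hypothesis("misconfigured resource", misconfig),
    ]
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)
```

For example, a doubled bill with flat traffic and two deploys in the window ranks "new deploy" first, while a doubled bill with doubled traffic ranks "traffic surge" first.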

When to use it

Use it when raw cost alerts are not enough and you want a first-pass investigation done automatically before a human picks it up. Ideal for FinOps or platform teams who spend hours manually correlating spend with what changed.

How it works

  1. The agent is triggered by a spike event or a manual run with a service name and date.
  2. It queries BigQuery for the service's cost breakdown and the spike window.
  3. It pulls recent GitHub deploys and merged PRs for that service's repo around the spike time.
  4. It queries Datadog for request volume, instance counts, and resource saturation over the same window.
  5. It synthesizes the evidence into a ranked root-cause hypothesis and a report.
  6. It posts the summary plus a confidence level to the owning team's Slack channel.
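
As an illustration of what step 2 produces, the sketch below turns billing rows into a per-SKU comparison of spike-day cost against the baseline daily average. It assumes the rows have already been fetched from BigQuery as `(day, sku, cost)` tuples. That row shape and the function name are hypothetical, and the real billing export schema will differ.

```python
from collections import defaultdict


def cost_delta_by_sku(rows, spike_day, baseline_days):
    """Compare spike-day cost per SKU with its baseline daily average.

    rows: iterable of (day, sku, cost) tuples (assumed shape).
    Returns (sku, baseline_daily_avg, spike_cost, delta) tuples,
    largest increase first.
    """
    baseline_set = set(baseline_days)
    baseline_total = defaultdict(float)
    spike_total = defaultdict(float)

    for day, sku, cost in rows:
        if day == spike_day:
            spike_total[sku] += cost
        elif day in baseline_set:
            baseline_total[sku] += cost

    n_days = max(len(baseline_set), 1)  # avoid dividing by zero
    result = []
    for sku in set(baseline_total) | set(spike_total):
        avg = baseline_total[sku] / n_days
        spike = spike_total[sku]
        result.append((sku, avg, spike, spike - avg))

    # The SKUs with the biggest increase are the first places to look.
    return sorted(result, key=lambda r: r[3], reverse=True)


rows = [
    ("2024-03-13", "compute", 100.0),
    ("2024-03-14", "compute", 100.0),
    ("2024-03-15", "compute", 400.0),
    ("2024-03-13", "storage", 20.0),
    ("2024-03-14", "storage", 20.0),
    ("2024-03-15", "storage", 25.0),
]
top = cost_delta_by_sku(rows, "2024-03-15", ["2024-03-13", "2024-03-14"])
# top[0] is ("compute", 100.0, 400.0, 300.0)
```

Ranking by absolute delta rather than percentage keeps small, noisy SKUs from crowding out the line items that actually moved the bill.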

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect GitHub: repos, issues, pull requests, actions.
  3. Connect Datadog: metrics, traces, log search.
  4. Connect Slack: channels, DMs, threads, mentions.
  5. Set each agent's model: we leave models unset so you pick the tier, fast + cheap or top-quality.
  6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
