BigQuery Cost Anomaly Root-Cause Triage Agent

When a cost-anomaly webhook fires, an agent investigates the offending BigQuery query by inspecting its plan, partition usage, and recent edits.

Category: Data Ops
Engine: paperclip
Difficulty: advanced
Trigger: webhook
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Cost-anomaly webhook received (HTTP webhook)
  • Action: Inspect query plan, partition + cluster usage (BigQuery)
  • Action: Pull recent edit history for the query (BigQuery)
  • Logic: Reason over evidence to root-cause + rank fix
  • Action: Write root-cause analysis to Notion triage page (Notion)
  • Output: Notify query owner in Slack (Slack)

What it does

Goes beyond detection to diagnosis. On a cost-anomaly signal, an agent pulls the offending query's execution plan, checks whether the query scans unpartitioned data or reads tables that lack clustering, and reviews who changed it recently and how. It then writes a human-readable root-cause analysis: what regressed, the likely cause, and a concrete fix (add a partition filter, materialize a CTE, narrow a SELECT *). The write-up lands in a Notion triage page and the owner gets pinged in Slack.
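
To make the three example fixes concrete, here is a minimal sketch of how they could be flagged from query text alone. It is not the agent's actual logic: the `Finding` type, the `suggest_fixes` function, the weights, and the regex heuristics are all hypothetical, and a real check would also use the execution plan and table metadata.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    issue: str
    fix: str
    weight: int  # hypothetical priority: higher sorts first


def suggest_fixes(sql: str, partition_column: Optional[str] = None) -> List[Finding]:
    """Flag common cost regressions in a query's text, highest weight first."""
    lowered = sql.lower()
    findings: List[Finding] = []

    # Crude check: the partition column never appears after a WHERE keyword.
    if partition_column:
        col = re.escape(partition_column.lower())
        if not re.search(rf"\bwhere\b.*\b{col}\b", lowered, re.DOTALL):
            findings.append(Finding(
                f"no filter on partition column '{partition_column}'",
                "Add a partition filter", 3))

    # SELECT * reads every column of a columnar table.
    if re.search(r"\bselect\s+\*", lowered):
        findings.append(Finding(
            "SELECT * reads every column", "Narrow the SELECT *", 2))

    # A CTE referenced more than once may be evaluated more than once.
    for name in re.findall(r"\b(\w+)\s+as\s*\(", lowered):
        uses = len(re.findall(rf"\b(?:from|join)\s+{re.escape(name)}\b", lowered))
        if uses > 1:
            findings.append(Finding(
                f"CTE '{name}' is referenced {uses} times",
                "Materialize the CTE", 1))

    return sorted(findings, key=lambda f: f.weight, reverse=True)
```

Calling `suggest_fixes(query_text, "event_date")` returns the findings in ranked order, which maps onto the "ranked fix" part of the write-up. Text heuristics like these produce false positives, so they are best treated as hints for the reasoning step rather than as conclusions.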

When to use it

Use this when raw alerts create triage toil and your data team spends mornings reverse-engineering why a query got expensive. The agent does the first-pass investigation so humans start from a hypothesis.

How it works

  1. A cost-anomaly webhook triggers the agent.
  2. The agent queries BigQuery for the job's plan, bytes scanned, and partition/cluster usage.
  3. It pulls recent edit history to correlate the regression with a change.
  4. It reasons over the evidence to produce a root cause and ranked fix recommendation.
  5. The analysis is written to a Notion triage page and the owner is notified in Slack.
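
As an illustration of step 2, the sketch below looks up one job in BigQuery's `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view and compares its scan size with earlier runs. Several things here are assumptions rather than part of the workflow: the `region-us` qualifier, the 7-day lookback, the helper names `fetch_job_evidence` and `bytes_regression`, and the use of the `google-cloud-bigquery` client in place of the platform's BigQuery connector.

```python
from statistics import median
from typing import List, Optional

# Assumes the job ran in the US multi-region within the last 7 days.
# The creation_time filter also limits how much of the jobs view is scanned.
JOB_EVIDENCE_SQL = """
SELECT
  job_id,
  user_email,
  creation_time,
  total_bytes_processed,
  total_bytes_billed,
  total_slot_ms,
  referenced_tables,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_id = @job_id
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
"""


def fetch_job_evidence(job_id: str) -> List[dict]:
    """Run the lookup with a bound parameter (needs google-cloud-bigquery)."""
    from google.cloud import bigquery  # imported lazily; optional dependency

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("job_id", "STRING", job_id)
        ]
    )
    rows = client.query(JOB_EVIDENCE_SQL, job_config=config).result()
    return [dict(row.items()) for row in rows]


def bytes_regression(current_bytes: int, baseline_bytes: List[int]) -> Optional[float]:
    """Ratio of the current scan to the median of earlier runs, if computable."""
    if not baseline_bytes:
        return None
    base = median(baseline_bytes)
    if base == 0:
        return None
    return current_bytes / base
```

A ratio well above 1 alongside an unchanged query text points toward data growth or a lost partition filter upstream, while a ratio that jumps right after an edit supports the "recent change" hypothesis from step 3.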

Set it up

What you configure once, before turning it on.

  1. Connect HTTP webhook: trigger any URL on agent actions.
  2. Connect BigQuery: datasets, queries, schemas.
  3. Connect Notion: pages, databases, comments.
  4. Connect Slack: channels, DMs, threads, mentions.
  5. Set each agent's model: we leave models unset so you pick the tier (fast + cheap, or top-quality).
  6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
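
For the test run in the last step, it can help to preview the owner notification before enabling the trigger. The sketch below is an assumption-heavy illustration, not the platform's Slack connector: `build_slack_message` and `post_to_slack` are hypothetical helpers, and posting assumes a Slack incoming-webhook URL you supply yourself. The mention, bold, and link markup follow Slack's message formatting.

```python
import json
import urllib.request


def build_slack_message(owner_slack_id: str, job_id: str,
                        root_cause: str, fix: str, notion_url: str) -> dict:
    """Render the owner notification as a Slack incoming-webhook payload."""
    text = (
        f"<@{owner_slack_id}> cost anomaly on BigQuery job `{job_id}`\n"
        f"*Likely cause:* {root_cause}\n"
        f"*Suggested fix:* {fix}\n"
        f"<{notion_url}|Full analysis in Notion>"
    )
    return {"text": text}


def post_to_slack(webhook_url: str, message: dict) -> int:
    """POST the payload to an incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Printing the result of `build_slack_message(...)` during a dry run lets you confirm the owner mention and the Notion link resolve as expected without sending anything.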
