Agent-driven root-cause triage for late BigQuery partitions

When a partition lands late, an agent investigates the likely cause by querying load-job history and upstream source freshness.

Category: Data Ops
Engine: paperclip
Difficulty: advanced
Trigger: webhook
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Webhook delivers breach event (HTTP webhook)
  • Action: Fetch table load-job history and errors (Google BigQuery)
  • Action: Check upstream source table freshness (Google BigQuery)
  • Logic: Agent drafts ranked root-cause hypothesis
  • Action: File GitHub issue with diagnosis (GitHub)
  • Output: Open Slack triage thread linking the issue (Slack)

What it does

Instead of just flagging lateness, this workflow reasons about why. On a breach, an agent pulls the table's recent load-job errors and durations, checks whether upstream source tables were themselves late, and weighs the evidence to produce a ranked root-cause hypothesis (e.g. upstream delay vs failed load vs schema drift) with a concrete next step. It then opens a GitHub issue and starts a Slack triage thread.
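The agent's weighing is prompt-driven, so the snippet below is only a rough deterministic baseline for the kind of ranking described above, not the workflow's actual logic. The `Evidence` fields, the scoring weights, and the suggested next steps are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of what the two BigQuery steps returned."""
    failed_loads: int           # load jobs that ended with an error in the lookback window
    schema_errors: int          # subset of those failures whose error points at the schema
    late_upstream_tables: int   # upstream sources that missed their own freshness target
    total_upstream_tables: int


@dataclass
class Hypothesis:
    cause: str
    score: float    # 0.0 to 1.0, higher means better supported by the evidence
    next_step: str


def rank_hypotheses(ev: Evidence) -> list[Hypothesis]:
    """Score the three causes named in the text and return them best-first."""
    upstream_share = (
        ev.late_upstream_tables / ev.total_upstream_tables
        if ev.total_upstream_tables
        else 0.0
    )
    other_failures = max(ev.failed_loads - ev.schema_errors, 0)
    candidates = [
        Hypothesis("upstream delay", upstream_share,
                   "Check the jobs that produce the late upstream tables."),
        # Assumed weighting: three or more matching errors saturate the score.
        Hypothesis("schema drift", min(ev.schema_errors / 3, 1.0),
                   "Diff the source schema against the destination table."),
        Hypothesis("failed load", min(other_failures / 3, 1.0),
                   "Read the most recent load-job error and retry the load."),
    ]
    return sorted(candidates, key=lambda h: h.score, reverse=True)
```

For example, `rank_hypotheses(Evidence(1, 0, 2, 2))` puts "upstream delay" first, because both upstream tables are late and only one load failed. A baseline like this can also serve as a sanity check on what the agent returns.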

When to use it

Use it when the same handful of pipelines break repeatedly and your team wastes the first 20 minutes of every incident re-investigating. Let the agent do the first-pass diagnosis.

How it works

  1. A webhook trigger delivers the breach event with the affected table.
  2. A BigQuery action fetches recent load-job history, errors, and durations for the table.
  3. A BigQuery action checks freshness of the declared upstream source tables.
  4. An agent reasons over the evidence and drafts a ranked root-cause hypothesis with a recommended action.
  5. A GitHub action files an issue with the diagnosis attached.
  6. A Slack message opens a triage thread linking the issue.
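Steps 2 and 3 above come down to two queries. The sketch below shows one plausible way to write them against BigQuery's `INFORMATION_SCHEMA.JOBS` and `INFORMATION_SCHEMA.PARTITIONS` views. The function names, the `region-us` default, and the 7-day lookback are assumptions, and the template's own BigQuery actions may be configured differently.

```python
import re

# Conservative allow-list for names interpolated into the SQL text.
_IDENT = re.compile(r"^[A-Za-z0-9_\-]+$")


def _check(*parts: str) -> None:
    for part in parts:
        if not _IDENT.match(part):
            raise ValueError(f"unsafe identifier: {part!r}")


def load_job_history_sql(project: str, dataset: str, table: str,
                         region: str = "region-us", days: int = 7) -> str:
    """Recent load jobs for one table, with duration and any error."""
    _check(project, dataset, table, region)
    return f"""
SELECT
  job_id,
  creation_time,
  state,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_s,
  error_result.reason AS error_reason,
  error_result.message AS error_message
FROM `{project}`.`{region}`.INFORMATION_SCHEMA.JOBS
WHERE job_type = 'LOAD'
  AND destination_table.dataset_id = '{dataset}'
  AND destination_table.table_id = '{table}'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(days)} DAY)
ORDER BY creation_time DESC
""".strip()


def upstream_freshness_sql(project: str, dataset: str, tables: list) -> str:
    """Latest modification time per upstream table, in minutes of staleness."""
    if not tables:
        raise ValueError("at least one upstream table is required")
    _check(project, dataset, *tables)
    names = ", ".join(f"'{t}'" for t in tables)
    return f"""
SELECT
  table_name,
  MAX(last_modified_time) AS last_modified,
  TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(last_modified_time), MINUTE) AS minutes_stale
FROM `{project}`.`{dataset}`.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name IN ({names})
GROUP BY table_name
""".strip()
```

The functions only build SQL text, so they can be checked without credentials. The identifier check exists because table names arrive in a webhook payload and should not be pasted into a query unvalidated.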

Set it up

What you configure once, before turning it on.

  1. Connect HTTP webhook: Trigger any URL on agent actions.
  2. Connect BigQuery: Datasets, queries, schemas.
  3. Connect GitHub: Repos, issues, pull requests, actions.
  4. Connect Slack: Channels, DMs, threads, mentions.
  5. Set each agent's model: We leave models unset so you pick the tier: fast and cheap, or top quality.
  6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
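For the sample run in the last step, a hand-built breach event is usually enough. The sketch below prepares one as an HTTP POST. The payload field names and values are hypothetical, since the real schema is whatever your monitor sends, and the webhook URL is a placeholder for the one shown in your trigger settings.

```python
import json
import urllib.request


def build_sample_breach_request(webhook_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a sample breach event."""
    payload = {
        # Hypothetical field names; match them to your monitor's payload.
        "event": "partition_late",
        "table": "my-project.analytics.orders_daily",
        "partition_id": "20240101",
        "expected_by": "2024-01-02T06:00:00Z",
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen(req, timeout=10)` sends it. Building and sending are kept separate so the payload can be inspected first.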
