dbt Staleness RCA Agent: Investigate Stale Marts and Draft a Root Cause

When a stale-mart alert fires, an agent traces the dbt lineage, inspects recent run logs and source lag, and drafts a root-cause hypothesis.

Category: Data Ops
Engine: paperclip
Difficulty: advanced
Trigger: webhook
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Stale-mart alert via webhook with incident id (HTTP webhook)
  • Action: Trace upstream lineage and lags in Snowflake (Snowflake)
  • Logic: Find earliest broken node from run logs
  • Logic: Form ranked root-cause hypothesis
  • Output: Post RCA draft to the Linear incident (Linear)

What it does

Takes a stale-table alert and does the first pass of investigation a human would. The agent walks the model's upstream lineage in Snowflake, reads the latest dbt run logs, checks whether the root cause is a stalled source, a failed model, or a slow run, and writes a plain-language root-cause hypothesis with the evidence it found. It posts the writeup as a comment on the related Linear incident.
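The three-way check described above (stalled source, failed model, or slow run) can be sketched as a small classifier. This is a minimal illustration, not the workflow's actual logic: the `NodeEvidence` record, the `classify_cause` function, and both thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class NodeEvidence:
    """Evidence gathered for one lineage node (hypothetical shape)."""
    name: str
    kind: str                           # "source" or "model"
    status: Optional[str]               # last run status for models, None for sources
    load_lag: timedelta                 # time since the node's data last landed
    run_duration: Optional[timedelta]   # duration of the last run, if known


def classify_cause(
    node: NodeEvidence,
    max_source_lag: timedelta = timedelta(hours=6),       # assumed threshold
    max_run_duration: timedelta = timedelta(minutes=30),  # assumed threshold
) -> str:
    """Map one node's evidence to a candidate cause label."""
    if node.kind == "source" and node.load_lag > max_source_lag:
        return "stalled_source"
    if node.kind == "model" and node.status == "error":
        return "failed_model"
    if (
        node.kind == "model"
        and node.run_duration is not None
        and node.run_duration > max_run_duration
    ):
        return "slow_run"
    return "unknown"
```

In practice the thresholds would come from each source's freshness expectations rather than fixed defaults.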

When to use it

Use when freshness incidents land on-call without context and the engineer burns time reconstructing what broke. The agent hands them a head start: a likely cause and the lineage path to verify.

How it works

  1. A stale-mart alert arrives via HTTP webhook with the table and incident id.
  2. The agent queries Snowflake to trace the model's upstream lineage and load lags.
  3. It reviews recent dbt run logs to find the earliest broken node in the chain.
  4. It reasons over the evidence to form a ranked root-cause hypothesis.
  5. It posts the RCA draft with supporting evidence onto the Linear incident.
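Step 3 above, finding the earliest broken node, amounts to a walk up the lineage graph. The sketch below assumes the lineage has already been loaded into a plain dictionary mapping each node to its direct upstream parents, and that the run logs have been reduced to a set of broken node names. The function names and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def upstream_depths(parents: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Breadth-first walk upstream from `start`.

    Returns each reachable node (including `start`) with its minimum
    distance from `start`.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in depths:
                depths[parent] = depths[node] + 1
                queue.append(parent)
    return depths


def earliest_broken(
    parents: Dict[str, List[str]], broken: Set[str], start: str
) -> List[str]:
    """Return broken nodes upstream of `start` that have no broken ancestor.

    These are the candidates for where the failure began. The result is
    sorted furthest-upstream first, then by name for stable output.
    """
    depths = upstream_depths(parents, start)

    def has_broken_ancestor(node: str) -> bool:
        ancestors = upstream_depths(parents, node)
        return any(a in broken for a in ancestors if a != node)

    roots = [n for n in depths if n in broken and not has_broken_ancestor(n)]
    return sorted(roots, key=lambda n: (-depths[n], n))


# Hypothetical lineage: the mart depends on an intermediate model,
# which depends on two staging models, each fed by one source.
lineage = {
    "mart_orders": ["int_orders"],
    "int_orders": ["stg_orders", "stg_customers"],
    "stg_orders": ["src_orders"],
    "stg_customers": ["src_customers"],
}
broken_nodes = {"mart_orders", "int_orders", "stg_orders"}
candidates = earliest_broken(lineage, broken_nodes, "mart_orders")
```

With this example graph, `candidates` contains only `stg_orders`: the mart and the intermediate model are broken too, but each has a broken ancestor, so they are treated as downstream symptoms rather than the starting point.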

Set it up

What you configure once, before turning it on.

  1. Connect HTTP webhook: Trigger any URL on agent actions.
  2. Connect Snowflake: Warehouses, queries, shares.
  3. Connect Linear: Issues, projects, cycles, triage.
  4. Set each agent's model: We leave models unset so you pick the tier (fast and cheap, or top-quality).
  5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
