Datadog BigQuery anomaly to GitLab tuning issue with captured plan

Triggered by a Datadog anomaly monitor on BigQuery slot or runtime metrics, this workflow identifies the specific query behind the anomaly window.

Category: Engineering
Engine: sim
Difficulty: advanced
Trigger: event
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Datadog anomaly monitor fires (Datadog)
  • Action: Find heaviest jobs in anomaly window (Google BigQuery)
  • Logic: Confirm regression and map dataset to team
  • Action: Capture query plan and cost delta (Google BigQuery)
  • Output: Open assigned GitLab tuning issue (GitLab)

What it does

Instead of reading job history on a fixed cadence, this lets Datadog's anomaly detection decide when something is off, then does the forensics: it correlates the anomalous time window to the offending query, pulls its plan and cost delta, and files a GitLab tuning issue routed to the team that owns the dataset.
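
To make the correlation step concrete, here is a minimal sketch of how the anomaly window could be turned into a job-history query. It assumes job history is read from BigQuery's `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view; the function name, the default `region-us` qualifier, and the row limit are illustrative choices, not part of the workflow as shipped.

```python
from datetime import datetime, timezone


def build_heaviest_jobs_sql(window_start, window_end, region="region-us", limit=5):
    """Return SQL listing the heaviest query jobs created inside the window.

    Both bounds must be timezone-aware datetimes. The region qualifier and
    the limit are assumptions for illustration.
    """
    if window_start.tzinfo is None or window_end.tzinfo is None:
        raise ValueError("window bounds must be timezone-aware")
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")

    # Render both bounds as UTC timestamp literals.
    fmt = "%Y-%m-%d %H:%M:%S+00"
    start = window_start.astimezone(timezone.utc).strftime(fmt)
    end = window_end.astimezone(timezone.utc).strftime(fmt)

    return (
        "SELECT job_id, user_email, query, total_slot_ms, total_bytes_billed,\n"
        "       TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS runtime_ms\n"
        f"FROM `{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT\n"
        f"WHERE creation_time BETWEEN TIMESTAMP('{start}') AND TIMESTAMP('{end}')\n"
        "  AND job_type = 'QUERY'\n"
        "  AND state = 'DONE'\n"
        "ORDER BY total_slot_ms DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_heaviest_jobs_sql(
        datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
        datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc),
    ))
```

Ordering by `total_slot_ms` matches a slot-based monitor; for a runtime-based monitor, ordering by `runtime_ms` would be the natural swap.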

When to use it

Use when you already run BigQuery metrics through Datadog and would rather act on statistical anomalies than static thresholds. Good for catching unusual regressions that a fixed cost threshold would miss while still avoiding alert noise.
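
The difference between a statistical anomaly and a static threshold can be illustrated with a toy comparison. This is only a sketch: it uses a plain z-score over recent history, which is a stand-in and not the algorithm Datadog's anomaly monitors run, and the sample values and the 1000 slot-ms threshold are made up.

```python
from statistics import mean, stdev


def exceeds_static(latest, threshold):
    """Fixed-threshold check: fires only above an absolute limit."""
    return latest > threshold


def is_anomalous(history, latest, z_limit=3.0):
    """Toy anomaly check: fires when latest sits far above recent history."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest > mu
    return (latest - mu) / sigma > z_limit


if __name__ == "__main__":
    # A normally cheap query (slot-ms per run) that suddenly quadruples.
    history = [100, 110, 95, 105, 102]
    latest = 400
    print("static threshold fires:", exceeds_static(latest, 1000))
    print("anomaly check fires:", is_anomalous(history, latest))
```

In this example the run is still well under the fixed limit, so the static check stays silent, while the relative jump is large enough for the anomaly check to fire.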

How it works

  1. A Datadog anomaly monitor on slot-ms or runtime fires a webhook with the anomaly window.
  2. BigQuery job history is queried for the heaviest jobs inside that window to identify the offender.
  3. A logic step confirms the regression and maps the dataset to its owning team.
  4. The query plan and cost delta are captured into a tuning report.
  5. A GitLab issue is opened, labeled and assigned to the owning team as the output.
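
Steps 3 to 5 above can be sketched as plain functions. Everything named here is hypothetical: the `DATASET_OWNERS` map, the 50% regression cutoff, the `JobStats` fields, and the label scheme are illustration-only assumptions. The returned dictionary uses the `title`, `description`, and `labels` fields accepted by GitLab's create-issue endpoint, but the sketch does not call GitLab.

```python
from dataclasses import dataclass

# Hypothetical dataset-to-team ownership map; replace with your own source.
DATASET_OWNERS = {"analytics": "data-platform", "billing": "finance-eng"}


@dataclass
class JobStats:
    job_id: str
    dataset: str
    slot_ms: int           # slot-ms used by the offending run
    baseline_slot_ms: int  # typical slot-ms for the same query


def cost_delta_pct(job):
    """Percentage increase in slot-ms over the baseline."""
    return (job.slot_ms - job.baseline_slot_ms) / job.baseline_slot_ms * 100.0


def is_regression(job, min_increase_pct=50.0):
    """Confirm the regression; the 50% cutoff is an assumed default."""
    return job.baseline_slot_ms > 0 and cost_delta_pct(job) >= min_increase_pct


def build_issue(job, plan_summary):
    """Build a GitLab issue payload routed by a team label."""
    team = DATASET_OWNERS.get(job.dataset, "unowned")
    return {
        "title": f"Tune BigQuery job {job.job_id} ({job.dataset})",
        "description": (
            f"Slot-ms rose {cost_delta_pct(job):.0f}% over baseline "
            f"({job.baseline_slot_ms} -> {job.slot_ms}).\n\n"
            f"Query plan:\n{plan_summary}"
        ),
        "labels": f"query-tuning,team::{team}",
    }


if __name__ == "__main__":
    job = JobStats("job_1", "analytics", slot_ms=3000, baseline_slot_ms=1000)
    if is_regression(job):
        print(build_issue(job, "Stage S00: full scan of events table"))
```

Routing by label rather than by a hard-coded assignee is a design choice in this sketch; it keeps the ownership map as the only place that changes when a dataset moves between teams.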

Set it up

What you configure once, before turning it on.

  1. Connect Datadog: metrics, traces, log search.
  2. Connect BigQuery: datasets, queries, schemas.
  3. Connect GitLab: repos, MRs, pipelines, registry.
  4. Set each agent's model: models are left unset so you pick the tier, fast and cheap or top-quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
