ENGINEERING

Flaky-vs-Real Triage via Sentry Correlation

When a GitLab test fails, checks whether Sentry recorded a matching production error in the same code path.

CategoryEngineering
Enginesim
Difficultyadvanced
Triggerevent
Steps5
Setup~25 min

How it runs

The automated pipeline, trigger to output.

  • TriggerGitLab pipeline failed webhookGitLabGitLab
  • ActionQuery Sentry for matching prod errors in same pathSentrySentry
  • LogicBranch: correlated regression vs flaky-only
  • ActionOpen PagerDuty incident for real regressionsPagerDutyPagerDuty
  • OutputLabel uncorrelated failures flaky::review in GitLabGitLabGitLab

What it does

Distinguishes flaky failures from real regressions by cross-referencing each failing test against live Sentry issues touching the same file or module. A failure with a correlated production error is treated as a real bug and paged out; a failure with no production signal is labeled flaky for quarantine review.

When to use it

Use it when blanket-quarantining failing tests risks hiding real regressions. This adds a production-evidence check before anything gets silenced.

How it works

  1. 1A GitLab pipeline-failed webhook delivers the failed test list and changed files.
  2. 2For each failure, a Sentry query looks for unresolved issues whose stack frames touch the same source path within the recent release window.
  3. 3A logic branch splits failures into "correlated with prod error" versus "no production signal."
  4. 4Correlated failures trigger a PagerDuty incident with the Sentry issue and pipeline links attached.
  5. 5Uncorrelated failures get a GitLab `flaky::review` label and a comment noting no matching Sentry activity.

Set it up

What you configure once, before turning it on.

  1. 1
    Connect GitLabRepos, MRs, pipelines, registry.
  2. 2
    Connect SentryErrors, performance, releases.
  3. 3
    Connect PagerDutyIncidents, on-call, escalations.
  4. 4
    Set each agent's modelWe leave models unset so you pick the tier — fast + cheap, or top-quality.
  5. 5
    Tune it to your dataEdit the prompts, filters, and field mappings so it matches how your team works.
  6. 6
    Test, then turn it onRun once against a sample, confirm the output, then enable the trigger.

Run this workflow in your colony.

14-day trial. No DevOps. No Sales call. Provisioned in under a minute.