Flaky-Test Trend Report from CI Warehouse

On a schedule, queries historical CI test results in BigQuery and ranks the flakiest and most frequently quarantined tests for the quarter.

Category: Engineering
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Monthly schedule
  • Action: Query CI history for flake rates in BigQuery
  • Logic: Identify chronic repeat-quarantined tests
  • Action: File a Linear ticket per new chronic offender
  • Action: Publish ranked trend report to Confluence
  • Output: Share report link in Slack

What it does

Turns raw CI result history into a quarterly flakiness trend report. It queries a BigQuery table of every test run, computes flake rate and quarantine frequency per test over time, identifies tests that keep getting re-quarantined, and publishes a ranked report to Confluence. New chronic offenders get a Linear ticket.
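The per-test metrics described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the workflow's actual query: the row keys (`test_id`, `outcome`, `quarantined`), the `"flaky"` outcome value, and the definition of flake rate as flaky runs divided by total runs are all assumptions, since the page does not specify a schema or a formula.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlakeStats:
    """Per-test summary; field names are hypothetical, not a real schema."""
    test_id: str
    runs: int
    flaky_runs: int
    quarantine_count: int

    @property
    def flake_rate(self) -> float:
        # Assumed definition: flaky runs / total runs (0.0 when there are no runs).
        return self.flaky_runs / self.runs if self.runs else 0.0


def summarize(rows):
    """Aggregate raw run rows into one FlakeStats per test.

    Each row is a dict with assumed keys: "test_id", "outcome", and an
    optional boolean "quarantined" marking a run that triggered a quarantine.
    """
    runs, flaky, quarantines = Counter(), Counter(), Counter()
    for row in rows:
        test_id = row["test_id"]
        runs[test_id] += 1
        if row["outcome"] == "flaky":
            flaky[test_id] += 1
        if row.get("quarantined", False):
            quarantines[test_id] += 1
    return {
        tid: FlakeStats(tid, runs[tid], flaky[tid], quarantines[tid])
        for tid in runs
    }


def rank_by_flake_rate(stats):
    """Order tests from highest to lowest flake rate, ties broken by test ID."""
    return sorted(stats.values(), key=lambda s: (-s.flake_rate, s.test_id))
```

In practice the same aggregation would more likely run inside the warehouse query itself; the in-memory version is shown only because it is easy to check by hand against a small sample.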

When to use it

Use it when you have CI results landing in a data warehouse and engineering leadership wants visibility into where test reliability is trending. It surfaces the repeat offenders that point-in-time quarantine workflows keep parking but never fix.

How it works

  1. A monthly schedule trigger starts the report.
  2. It runs a BigQuery query over historical CI results to compute per-test flake rate and quarantine count.
  3. A branch identifies tests quarantined more than N times this quarter (chronic offenders).
  4. For each new chronic offender, it opens a Linear ticket labeled `flaky-chronic`.
  5. It renders the ranked trend tables and publishes them to a Confluence page.
  6. It posts the report link to the engineering leadership Slack channel.
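Steps 3 to 5 can be sketched as three small functions: apply the "more than N" threshold, drop offenders that already have a ticket, and render a ranked table. This is a minimal sketch under stated assumptions. The function names, the plain-text table layout, and the idea of tracking already-ticketed tests as a set of IDs are all hypothetical; the page does not say how the workflow decides that an offender is "new".

```python
def chronic_offenders(quarantine_counts, threshold):
    """Return test IDs quarantined more than `threshold` times, worst first.

    `quarantine_counts` maps test ID -> quarantine count for the quarter.
    The comparison is strict, matching "more than N times".
    """
    chronic = [t for t, n in quarantine_counts.items() if n > threshold]
    return sorted(chronic, key=lambda t: (-quarantine_counts[t], t))


def new_offenders(chronic, already_ticketed):
    """Keep only offenders without an existing ticket, preserving rank order.

    `already_ticketed` is an assumed set of test IDs that were ticketed
    in earlier runs.
    """
    return [t for t in chronic if t not in already_ticketed]


def render_ranked_table(quarantine_counts, chronic):
    """Render a plain-text ranked table (hypothetical layout)."""
    lines = ["Rank  Test  Quarantines"]
    for rank, test_id in enumerate(chronic, start=1):
        lines.append(f"{rank}  {test_id}  {quarantine_counts[test_id]}")
    return "\n".join(lines)
```

Filtering for new offenders before filing tickets keeps a monthly run from opening a duplicate `flaky-chronic` ticket for a test that was already flagged earlier in the same quarter.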

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect Linear: issues, projects, cycles, triage.
  3. Connect Confluence: spaces, pages, blueprints.
  4. Connect Slack: channels, DMs, threads, mentions.
  5. Set each agent's model: we leave models unset so you can pick the tier, either fast and cheap or top quality.
  6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
