
Weekly Flaky-Test Trend Report from BigQuery

On a weekly schedule, this workflow queries your CI history warehouse in BigQuery for the flakiest specs and ranks them by failure rate and by developer hours lost to retries.

Category: Engineering
Engine: sim
Difficulty: beginner
Trigger: schedule
Steps: 5
Setup: ~5 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Weekly Monday schedule
  • Action: Query 30-day flakiness aggregates in BigQuery
  • Logic: Filter above threshold and rank top 20
  • Action: Format leaderboard with week-over-week deltas
  • Output: Deliver leaderboard to the engineering-leads Slack channel
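A minimal sketch of the BigQuery aggregation behind the query step, written as a Python helper that only builds the SQL text so it can be reviewed before you paste it into the BigQuery step. It assumes a hypothetical table `ci_history.test_runs` with one row per test execution and columns `test_name`, `suite`, `status`, `duration_sec`, `retry_count`, and `started_at` (a TIMESTAMP). None of these names come from the workflow itself, so map them to your own schema.

```python
def flakiness_query(table: str = "ci_history.test_runs", lookback_days: int = 30) -> str:
    """Return BigQuery Standard SQL that aggregates per-test flakiness.

    Table and column names are hypothetical placeholders for your CI schema.
    Only pass a trusted table name: it is interpolated directly into the SQL.
    """
    if lookback_days <= 0:
        raise ValueError("lookback_days must be positive")
    return f"""
SELECT
  test_name,
  ANY_VALUE(suite) AS suite,
  COUNT(*) AS run_count,
  COUNTIF(status = 'failed') AS failure_count,
  SAFE_DIVIDE(COUNTIF(status = 'failed'), COUNT(*)) AS failure_rate,
  -- Retry estimate (assumption): each retry costs about one run duration.
  SUM(duration_sec * retry_count) / 60 AS retry_minutes
FROM `{table}`
WHERE started_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {lookback_days} DAY)
GROUP BY test_name
""".strip()


print(flakiness_query())
```

Treating every failure as a flakiness signal is a simplification: a test that fails on every run is broken rather than flaky, so you may want an extra condition such as requiring at least one pass in the window.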

What it does

It produces a weekly flakiness leaderboard from warehoused pipeline data: which tests fail most often, their pass/fail ratio, the suites they belong to, and an estimate of the retry minutes they burn. Leads get a data-backed view instead of anecdotes about "that one annoying test."
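The leaderboard metrics can be illustrated with a small in-memory version of the same arithmetic. This sketch assumes each run is a dict with the hypothetical fields `test_name`, `suite`, `status`, `duration_sec`, and `retry_count`, and it estimates retry minutes as run duration times retry count. That is one possible heuristic, not the workflow's documented formula.

```python
from dataclasses import dataclass


@dataclass
class SpecStats:
    """Per-test aggregate (hypothetical shape)."""
    test_name: str
    suite: str
    runs: int = 0
    failures: int = 0
    retry_minutes: float = 0.0

    @property
    def failure_rate(self) -> float:
        return self.failures / self.runs if self.runs else 0.0

    @property
    def pass_fail(self) -> str:
        """Passes/failures, e.g. '9/1'."""
        return f"{self.runs - self.failures}/{self.failures}"


def summarize(runs: list[dict]) -> dict[str, SpecStats]:
    """Aggregate raw run records into per-test stats."""
    stats: dict[str, SpecStats] = {}
    for run in runs:
        s = stats.setdefault(run["test_name"], SpecStats(run["test_name"], run["suite"]))
        s.runs += 1
        if run["status"] == "failed":
            s.failures += 1
        # Heuristic (assumption): each retry costs about one run's duration.
        s.retry_minutes += run["duration_sec"] * run.get("retry_count", 0) / 60
    return stats


sample = [
    {"test_name": "checkout_spec", "suite": "e2e", "status": "failed", "duration_sec": 120, "retry_count": 2},
    {"test_name": "checkout_spec", "suite": "e2e", "status": "passed", "duration_sec": 90, "retry_count": 0},
    {"test_name": "login_spec", "suite": "unit", "status": "passed", "duration_sec": 5},
]
checkout = summarize(sample)["checkout_spec"]
print(checkout.failure_rate, checkout.pass_fail, checkout.retry_minutes)  # 0.5 1/1 4.0
```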

When to use it

Use it when your CI events already land in BigQuery and you want a recurring management-facing signal to prioritize test-stability work in sprint planning.

How it works

  1. A Monday-morning schedule trigger fires.
  2. A BigQuery step runs an aggregation over the last 30 days of pipeline runs, computing per-test failure rate, run count, and retry-time estimate.
  3. A logic step keeps only tests above the flakiness threshold and sorts them into a top-20 ranking.
  4. The flow formats the ranking into a readable table with deltas versus the prior week.
  5. A Slack message delivers the leaderboard to the engineering-leads channel with a note on the worst regressions.
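Steps 3 and 4 can be sketched in plain Python, assuming the query step already returned one row per test. The 5% threshold, the retry-minutes tiebreaker, and the `Row`, `leaderboard`, and `format_table` names are illustrative assumptions. Deltas are shown in percentage points against last week's failure rate, and tests missing from last week's data are marked `new`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One per-test aggregate as returned by the query step (hypothetical shape)."""
    test_name: str
    suite: str
    failure_rate: float  # fraction of runs that failed, 0.0 to 1.0
    retry_minutes: float


def leaderboard(rows: list[Row], threshold: float = 0.05, top_n: int = 20) -> list[Row]:
    """Keep tests above the flakiness threshold and return the worst top_n."""
    flaky = [r for r in rows if r.failure_rate > threshold]
    # Highest failure rate first; retry minutes break ties.
    flaky.sort(key=lambda r: (r.failure_rate, r.retry_minutes), reverse=True)
    return flaky[:top_n]


def format_table(ranked: list[Row], previous: dict[str, float]) -> str:
    """Plain-text table with week-over-week failure-rate deltas in percentage points."""
    lines = [f"{'#':>2}  {'test':<24} {'suite':<8} {'fail%':>6} {'wow pp':>7} {'retry min':>9}"]
    for rank, r in enumerate(ranked, start=1):
        prev = previous.get(r.test_name)
        delta = "new" if prev is None else f"{(r.failure_rate - prev) * 100:+.1f}"
        lines.append(
            f"{rank:>2}  {r.test_name:<24} {r.suite:<8} "
            f"{r.failure_rate * 100:>5.1f}% {delta:>7} {r.retry_minutes:>9.0f}"
        )
    return "\n".join(lines)


rows = [
    Row("a_spec", "e2e", 0.30, 50.0),
    Row("b_spec", "unit", 0.02, 1.0),
    Row("c_spec", "api", 0.30, 80.0),
    Row("d_spec", "api", 0.10, 5.0),
]
print(format_table(leaderboard(rows, top_n=2), previous={"a_spec": 0.25}))
```

Sorting on a tuple keeps the ranking deterministic when two tests share a failure rate; swap the key order if your leads care more about time lost than about failure frequency.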

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect Slack: channels, DMs, threads, mentions.
  3. Set each agent's model: we leave models unset so you can pick the tier, either fast and cheap or top quality.
  4. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  5. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
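Before enabling the trigger, a dry run that renders the Slack message without posting it is a cheap way to confirm the output. The sketch below assumes hypothetical config fields (`lookback_days`, `channel`, `dry_run`) and a placeholder channel name, and builds a payload with the `channel` and `text` fields that Slack's `chat.postMessage` method accepts. The actual posting is left to whatever Slack connection you configured, passed in as a plain callable.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReportConfig:
    # Hypothetical settings; "#engineering-leads" is a placeholder channel name.
    lookback_days: int = 30
    channel: str = "#engineering-leads"
    dry_run: bool = True  # keep True until the sample output looks right


def worst_regressions(current: dict[str, float], previous: dict[str, float],
                      k: int = 3) -> list[tuple[str, float]]:
    """Tests whose failure rate rose most since last week (unseen tests start from 0)."""
    rising = [(name, rate - previous.get(name, 0.0)) for name, rate in current.items()
              if rate - previous.get(name, 0.0) > 0]
    return sorted(rising, key=lambda item: item[1], reverse=True)[:k]


def build_slack_payload(cfg: ReportConfig, table: str,
                        current: dict[str, float], previous: dict[str, float]) -> dict:
    """Assemble `channel` and `text`, the core arguments of Slack's chat.postMessage."""
    regressions = worst_regressions(current, previous)
    note = ", ".join(f"{name} (+{delta * 100:.1f} pp)" for name, delta in regressions)
    text = (f"Weekly flaky-test leaderboard (last {cfg.lookback_days} days)\n"
            f"{table}\n"
            f"Worst regressions: {note or 'none'}")
    return {"channel": cfg.channel, "text": text}


def deliver(cfg: ReportConfig, payload: dict,
            post: Optional[Callable[[dict], object]] = None) -> bool:
    """Print the payload on a dry run; otherwise hand it to the supplied poster."""
    if cfg.dry_run or post is None:
        print(json.dumps(payload, indent=2))
        return False
    post(payload)
    return True


# Dry run against sample data: prints the JSON payload instead of posting it.
cfg = ReportConfig()
payload = build_slack_payload(cfg, "1. checkout_spec  e2e  30.0%",
                              {"checkout_spec": 0.30}, {"checkout_spec": 0.20})
deliver(cfg, payload)
```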
