DEVOPS

Scan test history in BigQuery and open a PR that skips chronic flakes

Runs nightly over your test-results warehouse in BigQuery, flags tests whose pass rate dips below a threshold, and opens a GitHub PR that quarantines them.

Category: DevOps
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Nightly schedule fires
  • Action: Query rolling pass rate per test in BigQuery
  • Logic: Keep tests below pass-rate threshold with enough samples
  • Action: Open GitHub PR adding quarantine annotations
  • Output: Return pull request URL
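
The query and filter stages above can be sketched in Python roughly as follows. This is a minimal illustration, not the workflow's actual implementation: the table `my_project.ci.test_results`, its columns (`test_name`, `status`, `run_at`), the `FlakeStats` type, and the default thresholds are all assumptions to replace with your own schema and settings. The BigQuery call uses the `google-cloud-bigquery` client and is imported lazily so the filter can be tried without it.

```python
from dataclasses import dataclass

# Hypothetical table and column names; adjust to your warehouse schema.
PASS_RATE_SQL = """
SELECT
  test_name,
  COUNTIF(status = 'passed') / COUNT(*) AS pass_rate,
  COUNT(*) AS run_count
FROM `my_project.ci.test_results`
WHERE run_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL @window_days DAY)
GROUP BY test_name
"""


@dataclass(frozen=True)
class FlakeStats:
    test_name: str
    pass_rate: float  # fraction of runs that passed, 0.0 to 1.0
    run_count: int    # number of runs inside the rolling window


def fetch_stats(window_days: int = 14) -> list[FlakeStats]:
    """Run the aggregation in BigQuery (needs google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # lazy import: only needed for the live query

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("window_days", "INT64", window_days)
        ]
    )
    rows = client.query(PASS_RATE_SQL, job_config=job_config).result()
    return [FlakeStats(r["test_name"], r["pass_rate"], r["run_count"]) for r in rows]


def chronic_flakes(stats, max_pass_rate=0.9, min_runs=30):
    """Keep tests below the pass-rate threshold that also have enough samples."""
    flagged = [
        s for s in stats
        if s.run_count >= min_runs and s.pass_rate < max_pass_rate
    ]
    # Worst offenders first, so the PR description leads with them.
    return sorted(flagged, key=lambda s: s.pass_rate)
```

The `min_runs` guard is what keeps a test that failed once in three runs from being quarantined on thin evidence; the values 0.9 and 30 are placeholders, not recommendations from the template.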

What it does

This scheduled job queries a BigQuery table of historical test outcomes, computes each test's pass rate over a rolling window, and identifies chronic offenders that fail intermittently across many runs. For tests that cross the instability threshold it opens a GitHub pull request adding a skip/quarantine annotation, so the flake stops blocking unrelated merges while it waits for a real fix.
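
What the skip/quarantine annotation looks like depends on the test framework, which the description above does not specify. As one possibility, assuming a Python suite run with pytest, the edit could be a `@pytest.mark.skip` decorator inserted above the offending test function. The helper name `quarantine_test` is hypothetical, and the sketch assumes the test file already imports `pytest`.

```python
import re


def quarantine_test(source: str, test_name: str, reason: str) -> str:
    """Insert a pytest skip marker above `def <test_name>(` in the file text.

    Assumes the file already has `import pytest`. The check for an existing
    marker only looks at the line directly above the function definition.
    """
    lines = source.splitlines(keepends=True)
    pattern = re.compile(rf"^(\s*)(async\s+)?def {re.escape(test_name)}\(")
    out = []
    for i, line in enumerate(lines):
        match = pattern.match(line)
        if match:
            indent = match.group(1)  # keep the marker aligned with the def
            already = i > 0 and lines[i - 1].strip().startswith("@pytest.mark.skip")
            if not already:
                out.append(f"{indent}@pytest.mark.skip(reason={reason!r})\n")
        out.append(line)
    return "".join(out)
```

Running the helper twice on the same file leaves it unchanged the second time, which matters for a nightly job that may see the same flake again before the PR is merged.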

When to use it

Use it when you already export CI results to a warehouse and want data-driven quarantine decisions rather than reacting to single failures. Best for large suites where one bad test can stall the whole team.

How it works

  1. A nightly schedule fires the workflow.
  2. A BigQuery query aggregates pass rate and run count per test over the last N days.
  3. A filter keeps tests below the pass-rate threshold with enough samples to be statistically meaningful.
  4. The workflow generates the quarantine annotation edits for each offending test file.
  5. It opens a GitHub PR with the changes, listing each test's pass rate and run count in the description.
  6. The PR URL is returned as output for the on-call engineer to review.
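
The last two steps can be illustrated with a short Python sketch: one function renders the PR description with each test's pass rate and run count, and another creates the pull request through the GitHub REST API endpoint `POST /repos/{owner}/{repo}/pulls` and returns its URL. The function names, the Markdown table layout, and the token handling are assumptions for illustration; the sketch also assumes the branch with the annotation edits has already been pushed.

```python
import json
import urllib.request


def build_pr_body(flakes):
    """Render a Markdown table from (test_name, pass_rate, run_count) tuples."""
    lines = [
        "Quarantines tests whose rolling pass rate fell below the threshold.",
        "",
        "| Test | Pass rate | Runs |",
        "| --- | --- | --- |",
    ]
    for name, pass_rate, run_count in flakes:
        lines.append(f"| `{name}` | {pass_rate:.1%} | {run_count} |")
    return "\n".join(lines)


def open_pull_request(owner, repo, head, base, title, body, token):
    """Create the PR via the GitHub REST API and return its web URL.

    `head` is the already-pushed branch holding the quarantine edits;
    `token` needs permission to open pull requests on the repository.
    """
    payload = {"title": title, "head": head, "base": base, "body": body}
    request = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["html_url"]
```

Putting the numbers in the description gives the on-call reviewer the evidence for each quarantine without having to rerun the query.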

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect GitHub: repos, issues, pull requests, actions.
  3. Set each agent's model: we leave models unset so you pick the tier, fast and cheap or top-quality.
  4. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  5. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
