
Nightly Flaky Confirmation via Targeted Re-runs

Each night, this workflow pulls recently failed tests from the CI history in BigQuery and re-runs each one in isolation several times via a shell job to confirm which are genuinely flaky.

Category: Engineering
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Nightly schedule
  • Action: Query BigQuery for today's failed tests (Google BigQuery)
  • Action: Re-run each suspect in isolation N times (Shell)
  • Logic: Confirm flaky only on mixed outcomes
  • Action: File ClickUp item with pass ratio (ClickUp)
  • Output: Write classification back to BigQuery (Google BigQuery)
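
The query step could look roughly like the sketch below. It only builds a BigQuery Standard SQL string; the table name and the `test_id`, `status`, and `finished_at` columns are hypothetical and should be replaced with whatever your CI results table actually uses.

```python
def build_failed_tests_query(table: str, lookback_hours: int = 24) -> str:
    """Build a Standard SQL query for tests that failed recently.

    Assumption: the CI results table has `test_id`, `status`, and
    `finished_at` (TIMESTAMP) columns. These names are illustrative only.
    """
    if lookback_hours <= 0:
        raise ValueError("lookback_hours must be positive")
    return (
        "SELECT DISTINCT test_id\n"
        f"FROM `{table}`\n"
        "WHERE status = 'failed'\n"
        "  AND finished_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), "
        f"INTERVAL {lookback_hours} HOUR)"
    )


if __name__ == "__main__":
    # Hypothetical fully qualified table name: project.dataset.table
    print(build_failed_tests_query("my-project.ci.test_results"))
```

`SELECT DISTINCT` keeps a test that failed several times in the window from being re-run more than once per night.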

What it does

This workflow separates genuinely flaky tests from tests that fail for real reasons by re-running suspects in isolation. Overnight it reads the day's failed tests from the CI results table in BigQuery, executes each one repeatedly in a clean shell environment, and files a tech-debt item only for tests whose pass/fail outcome is inconsistent.
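
As a minimal sketch of the "inconsistent outcome" rule, the function below labels a test from a list of isolated re-run results. The label strings are hypothetical; the only rule taken from the description above is that a test counts as flaky when its results are mixed.

```python
from typing import Sequence


def classify(outcomes: Sequence[bool]) -> str:
    """Classify a test from its isolated re-run outcomes (True = pass).

    Label names are illustrative assumptions, not part of the workflow.
    """
    if not outcomes:
        raise ValueError("at least one re-run outcome is required")
    passes = sum(outcomes)
    if passes == 0:
        # Fails every time: likely a real failure, not flakiness.
        return "consistent_failure"
    if passes == len(outcomes):
        # Failed in CI but always passes alone: possibly order or
        # environment coupling rather than a flaky test.
        return "passes_in_isolation"
    # Mixed results across identical runs: confirmed flaky.
    return "flaky"


if __name__ == "__main__":
    print(classify([True, False, True]))
```

Only the `"flaky"` label would lead to a ClickUp item under this reading; the other two are left for a human or a different process to triage.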

When to use it

Use it when you want high-confidence flaky classification before quarantining tests, and you store CI results in a warehouse. Because each suspect is re-run in isolation, failures caused by test ordering or environment coupling are far less likely to be misclassified as flaky.

How it works

  1. A nightly schedule triggers the workflow.
  2. It queries BigQuery for tests that failed in the last 24 hours.
  3. A shell step re-runs each suspect test N times in isolation and records outcomes.
  4. A logic step marks a test flaky only if results are mixed across runs.
  5. Confirmed-flaky tests get a ClickUp tech-debt item with the observed pass ratio.
  6. It writes the classification back to BigQuery for trend tracking.
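
The re-run and pass-ratio steps above can be sketched as follows. This assumes each test can be invoked as a standalone command whose exit code is 0 on pass, and it treats "isolation" as simply a fresh process per run; the function names, the record fields, and the timeout are all hypothetical, and a real setup would likely add a clean working directory or container.

```python
import subprocess
import sys
from typing import Dict, List, Union


def rerun(command: List[str], runs: int, timeout_s: float = 300.0) -> List[bool]:
    """Run `command` in a fresh process `runs` times; True means exit code 0."""
    outcomes: List[bool] = []
    for _ in range(runs):
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
            outcomes.append(result.returncode == 0)
        except subprocess.TimeoutExpired:
            # Assumption: a hung run is counted as a failure.
            outcomes.append(False)
    return outcomes


def summarize(test_id: str, outcomes: List[bool]) -> Dict[str, Union[str, int, float, bool]]:
    """Build the row that would be filed and written back (fields are illustrative)."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    passes = sum(outcomes)
    return {
        "test_id": test_id,
        "runs": len(outcomes),
        "passes": passes,
        "pass_ratio": passes / len(outcomes),
        # Mixed outcomes only: neither all passes nor all failures.
        "flaky": 0 < passes < len(outcomes),
    }


if __name__ == "__main__":
    # Stand-in for a real test command: a Python process that exits 0.
    demo = rerun([sys.executable, "-c", "raise SystemExit(0)"], runs=3)
    print(summarize("example::always_passes", demo))
```

Choosing N is a trade-off: a test that fails only rarely needs many runs before a mixed outcome shows up, so a small N will under-report flakiness.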

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect Shell: run sandboxed commands inside the workspace.
  3. Connect ClickUp: docs, tasks, and chats in one workspace.
  4. Set each agent's model: models are left unset so you can pick the tier (fast and cheap, or top-quality).
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
