
Nightly Flaky Confirmation via Targeted Re-runs

Each night it pulls recently failed tests from the CI history in BigQuery and re-runs each one in isolation several times via a shell job, flagging as flaky only the tests whose outcomes are mixed.

Category: Engineering

Engine: sim

Difficulty: advanced

Trigger: schedule

Steps: 6

Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

Trigger: Nightly schedule
Action (BigQuery): Query for tests that failed in the last 24 hours
Action (Shell): Re-run each suspect in isolation N times
Logic: Confirm flaky only on mixed outcomes
Action (ClickUp): File a tech-debt item with the observed pass ratio
Output (BigQuery): Write the classification back to BigQuery
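The Logic node is the heart of the pipeline: a test counts as flaky only if its isolated re-runs include both passes and failures. A minimal Python sketch of that rule, assuming each re-run is reduced to a boolean (True = pass); the function name `classify` and the three label strings are hypothetical, not the workflow's actual field values:

```python
from typing import Sequence


def classify(outcomes: Sequence[bool]) -> tuple[str, float]:
    """Label a suspect test from its isolated re-run outcomes (True = pass).

    Returns a hypothetical label plus the observed pass ratio.
    """
    if not outcomes:
        raise ValueError("need at least one re-run outcome")
    passes = sum(outcomes)
    ratio = passes / len(outcomes)
    if 0 < passes < len(outcomes):
        # Mixed results: the only case that gets a ClickUp item.
        return "flaky", ratio
    if passes == 0:
        # Fails every time in isolation: most likely a real failure.
        return "consistent_fail", ratio
    # Passes every time in isolation: the CI failure did not reproduce.
    return "consistent_pass", ratio
```

Keeping the all-pass case separate from "flaky" matches the "mixed outcomes only" rule: a test that never fails in isolation is not confirmed flaky, even if it failed once in CI.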

What it does

It separates genuinely flaky tests from tests that fail for real reasons by re-running suspects in isolation. Overnight, it reads the day's failed tests from the CI results table in BigQuery, executes each one repeatedly in a clean shell environment, and files a tech-debt item only for tests whose pass/fail outcome varies across runs.
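The repeated execution step can be approximated with one fresh process per run and the exit code as the verdict. This is a sketch under assumptions: it only gives process-level isolation (sandboxing is left to the Shell connector), `rerun_in_isolation` is a hypothetical helper, and the command is whatever single-test invocation your runner supports; the example below uses a stand-in Python command rather than any real test runner flag.

```python
import subprocess
import sys
from typing import Sequence


def rerun_in_isolation(cmd: Sequence[str], runs: int, timeout_s: float = 600.0) -> list[bool]:
    """Run `cmd` `runs` times, each in a fresh process, and record pass/fail.

    A run passes when the process exits with status 0; a non-zero exit
    or a timeout counts as a failure.
    """
    outcomes: list[bool] = []
    for _ in range(runs):
        try:
            result = subprocess.run(list(cmd), capture_output=True, timeout=timeout_s)
            outcomes.append(result.returncode == 0)
        except subprocess.TimeoutExpired:
            # A hung run is recorded as a failure instead of aborting the batch.
            outcomes.append(False)
    return outcomes


if __name__ == "__main__":
    # Stand-in command; replace with your runner's single-test invocation.
    print(rerun_in_isolation([sys.executable, "-c", "pass"], runs=3))  # [True, True, True]
```

Treating a timeout as a failed run is a design choice: a test that sometimes hangs is exactly the kind of inconsistency the workflow is meant to surface.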

When to use it

Use it when you store CI results in a warehouse and want high-confidence flaky classification before quarantining a test. Re-running each suspect on its own keeps failures caused by test ordering or shared-environment coupling from being misreported as flakiness.
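How confident the classification is depends on N, which the page leaves for you to choose. As a rough guide, assuming each re-run fails independently with a fixed probability p (real flaky tests are often less well behaved), the chance that N runs show mixed outcomes is 1 - (1 - p)^N - p^N. A minimal Python sketch; `detection_probability` is a hypothetical helper:

```python
def detection_probability(p_fail: float, runs: int) -> float:
    """Probability that `runs` independent re-runs show mixed outcomes.

    Assumes every run fails independently with probability `p_fail`.
    Mixed outcomes need at least one pass and at least one failure.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if runs < 1:
        raise ValueError("runs must be >= 1")
    all_pass = (1.0 - p_fail) ** runs
    all_fail = p_fail ** runs
    return 1.0 - all_pass - all_fail


if __name__ == "__main__":
    # How detection improves with N for a test that fails 10% of the time.
    for n in (5, 10, 20, 30):
        print(n, round(detection_probability(0.1, n), 3))
```

Under that assumption, a test that fails 10% of the time shows mixed outcomes in 10 re-runs only about 65% of the time and in 30 re-runs about 96% of the time, so rarely failing tests need a larger N.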

How it works

1. A nightly schedule triggers the workflow.
2. It queries BigQuery for tests that failed in the last 24 hours.
3. A shell step re-runs each suspect test N times in isolation and records the outcomes.
4. A logic step marks a test flaky only if its results are mixed across runs.
5. Confirmed-flaky tests get a ClickUp tech-debt item with the observed pass ratio.
6. It writes the classification back to BigQuery for trend tracking.
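The six steps above can be sketched end to end with the integrations injected as plain callables, so the control flow is visible without any real BigQuery, Shell, or ClickUp client. Everything here is hypothetical: `run_nightly`, `Verdict`, and the four callables stand in for the connectors, and the label strings are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Verdict:
    test_id: str
    label: str  # "flaky", "consistent_fail", or "consistent_pass"
    pass_ratio: float


def run_nightly(
    fetch_failed: Callable[[], Iterable[str]],        # step 2 (BigQuery)
    rerun: Callable[[str, int], list[bool]],          # step 3 (Shell)
    file_ticket: Callable[[str, float], None],        # step 5 (ClickUp)
    write_back: Callable[[list[Verdict]], None],      # step 6 (BigQuery)
    runs: int = 10,
) -> list[Verdict]:
    if runs < 2:
        raise ValueError("runs must be >= 2 to observe mixed outcomes")
    verdicts: list[Verdict] = []
    # Deduplicate suspects: a test may have failed in several CI jobs.
    for test_id in sorted(set(fetch_failed())):
        outcomes = rerun(test_id, runs)
        passes = sum(outcomes)
        ratio = passes / len(outcomes) if outcomes else 0.0
        if 0 < passes < len(outcomes):
            # Step 4: mixed outcomes confirm flakiness; only these get a ticket.
            label = "flaky"
            file_ticket(test_id, ratio)
        elif passes == 0:
            label = "consistent_fail"
        else:
            label = "consistent_pass"
        verdicts.append(Verdict(test_id, label, ratio))
    # Every verdict is written back, not just the flaky ones, for trend tracking.
    write_back(verdicts)
    return verdicts
```

Injecting the integrations also makes the "test, then turn it on" step easy: pass a `file_ticket` that only logs, and nothing is filed until you swap in the real connector.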

Set it up

What you configure once, before turning it on.

1. Connect BigQuery: datasets, queries, schemas.
2. Connect Shell: run sandboxed commands inside the workspace.
3. Connect ClickUp: docs, tasks, and chats in one workspace.
4. Set each agent's model: we leave models unset so you pick the tier, either fast and cheap or top quality.
5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
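Tuning the workflow mostly comes down to a few values: where CI results live, how far back to look, and N. A minimal Python sketch of those settings plus the step 2 query, assuming a hypothetical results table with `test_id`, `status`, and `finished_at` columns; the class, table, and column names are all placeholders for your own schema. BigQuery does not accept table names as query parameters, so the table is interpolated while the lookback window is left as the named parameter `@lookback_hours`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlakyRerunConfig:
    """Hypothetical settings for the nightly re-run workflow."""
    results_table: str = "my_project.ci.test_results"     # hypothetical CI results table
    verdicts_table: str = "my_project.ci.flaky_verdicts"  # hypothetical write-back table
    lookback_hours: int = 24  # step 2 window
    reruns: int = 10          # N in step 3
    dry_run: bool = True      # while testing, skip ClickUp filing and write-back

    def __post_init__(self) -> None:
        if self.reruns < 2:
            raise ValueError("reruns must be >= 2 to observe mixed outcomes")
        if self.lookback_hours <= 0:
            raise ValueError("lookback_hours must be positive")

    def failed_tests_sql(self) -> str:
        # Table names cannot be bound as parameters, so they are interpolated;
        # the lookback window stays a named query parameter.
        return (
            "SELECT DISTINCT test_id\n"
            f"FROM `{self.results_table}`\n"
            "WHERE status = 'failed'\n"
            "  AND finished_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), "
            "INTERVAL @lookback_hours HOUR)"
        )
```

With the `google-cloud-bigquery` client, the parameter would typically be bound through `bigquery.QueryJobConfig(query_parameters=[bigquery.ScalarQueryParameter("lookback_hours", "INT64", cfg.lookback_hours)])`; inside the workflow, the BigQuery connector handles this for you.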

More Engineering workflows

Upgrade Impact Router to Module Code Owners

Maps a dependency-bump PR's affected modules to their CODEOWNERS, then DMs each owner on Slack with only the changelog slice that touches code they own.

Re-Voice IVR Prompts on Phone-Tree Config Merge

When a phone-tree config change merges in GitHub, regenerates the ElevenLabs audio for any prompt whose script changed in the diff and opens a follow-up PR adding the new audio…

Agent reviews model-license fit and suggests compliant swaps on the PR

When a PR adds a Hugging Face model, an agent reads the model card and license, judges fit against your commercial-use policy, and suggests compliant swaps on the PR.

Scan for deprecated endpoints and email consumers a weekly sunset countdown

On a weekly schedule, scans the OpenAPI spec for endpoints marked deprecated with a sunset date, and emails each consuming team a countdown of how many days remain before removal.

Publish a versioned API changelog to Confluence on each release tag

On a new semver release tag, gathers the contract changes since the last release and writes a clean, versioned changelog to Confluence.

Gate breaking API PRs behind downstream consumer acknowledgement

When a PR introduces a breaking contract change, comments the impact summary on the PR and applies a blocking label.


Run it inside a business

This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

Software

Agent Hive runs Agent Hive

The team that built Agent Hive, exactly as it runs today.

Marketing

Content Marketing Agency

SEO, blogs, social, and reporting on autopilot.

E-commerce

E-commerce Operator

Listings, support, inventory, and ads, running 24/7.


Run this workflow in your colony.

14-day trial. No DevOps. No sales call. Provisioned in under a minute.
