ENGINEERING
Nightly Flaky Confirmation via Targeted Re-runs
Each night, it pulls recently failed tests from the CI history in BigQuery, re-runs each one in isolation several times via a shell job, and files ClickUp tech-debt items only for tests that are confirmed flaky.
How it runs
The automated pipeline, trigger to output.
- Trigger: Nightly schedule
- Action (BigQuery): Query failed tests from the last 24 hours
- Action (Shell): Re-run each suspect in isolation N times
- Logic: Confirm flaky only on mixed outcomes
- Action (ClickUp): File an item with the observed pass ratio
- Output (BigQuery): Write the classification back
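The trigger-to-output flow can be sketched as one nightly function. This is a minimal Python sketch, not the workflow's implementation: the four stage callables (`fetch_failed`, `rerun`, `file_ticket`, `write_back`) are hypothetical stand-ins for the BigQuery, Shell, and ClickUp steps, and the default of 10 re-runs is an assumed value for N.

```python
from typing import Callable, Iterable, List


def run_nightly(
    fetch_failed: Callable[[], Iterable[str]],   # hypothetical: BigQuery query step
    rerun: Callable[[str, int], List[bool]],     # hypothetical: Shell re-run step (True = pass)
    file_ticket: Callable[[str, float], None],   # hypothetical: ClickUp filing step
    write_back: Callable[[List[dict]], None],    # hypothetical: BigQuery write-back step
    runs: int = 10,                              # assumed value for N
) -> List[dict]:
    """One nightly pass: query -> re-run -> classify -> file -> write back."""
    if runs < 2:
        raise ValueError("mixed outcomes need at least 2 re-runs")
    rows = []
    # De-duplicate: a test may have failed in several CI jobs within the window.
    for test_id in sorted(set(fetch_failed())):
        outcomes = rerun(test_id, runs)
        passes = sum(outcomes)
        pass_ratio = passes / len(outcomes)
        flaky = 0 < passes < len(outcomes)  # flaky only when results are mixed
        if flaky:
            file_ticket(test_id, pass_ratio)
        rows.append({"test_id": test_id, "flaky": flaky, "pass_ratio": pass_ratio})
    write_back(rows)
    return rows


# Usage with fake stages standing in for the real integrations.
fake_outcomes = {"test_a": [True, False, True], "test_b": [False, False, False]}
filed: List[tuple] = []
written: List[dict] = []
rows = run_nightly(
    fetch_failed=lambda: ["test_b", "test_a", "test_a"],
    rerun=lambda test_id, n: fake_outcomes[test_id],
    file_ticket=lambda test_id, ratio: filed.append((test_id, ratio)),
    write_back=written.extend,
    runs=3,
)
```

Injecting the stages keeps the classification logic testable without live BigQuery or ClickUp access.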
What it does
It separates genuinely flaky tests from tests that fail for real reasons by re-running suspects in isolation. Overnight, it reads the day's failed tests from the CI results table in BigQuery, executes each one repeatedly in a clean shell environment, and files a tech-debt item only for tests whose pass/fail outcome is inconsistent across runs.
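The mixed-outcome rule fits in a few lines. In this hedged Python sketch, only the `FLAKY` verdict comes from the description above; the other two verdict names, and labeling an all-pass result as "not reproduced" rather than flaky, are assumptions about how the non-flaky cases might be recorded.

```python
from enum import Enum
from typing import List


class Verdict(str, Enum):
    FLAKY = "flaky"                      # mixed pass/fail across isolated re-runs
    CONSISTENT_FAIL = "consistent_fail"  # hypothetical: failed every run, likely a real bug
    NOT_REPRODUCED = "not_reproduced"    # hypothetical: passed every run in isolation


def classify(outcomes: List[bool]) -> Verdict:
    """Classify one test from its isolated re-run outcomes (True = pass)."""
    if not outcomes:
        raise ValueError("need at least one re-run outcome")
    passes = sum(outcomes)
    if passes == len(outcomes):
        return Verdict.NOT_REPRODUCED
    if passes == 0:
        return Verdict.CONSISTENT_FAIL
    return Verdict.FLAKY
```

A test that failed in CI but passes every isolated re-run is not confirmed flaky under this rule; its CI failure may have come from ordering or environment coupling instead.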
When to use it
Use it when you want high-confidence flaky classification before quarantining tests and you already store CI results in a warehouse. Running each suspect in isolation reduces false positives caused by test ordering or environment coupling.
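Isolation is what lets an outcome be attributed to the test itself rather than to its neighbors. Below is a minimal Python sketch of one isolated re-run loop, assuming each suspect can be invoked as a standalone command (for example a single pytest node ID, which is an assumption about your test runner). Every run gets a fresh process and a stripped-down environment containing only `PATH`. The function name `rerun_isolated` and the 300-second timeout are hypothetical.

```python
import os
import subprocess
import sys
from typing import List


def rerun_isolated(cmd: List[str], runs: int, timeout: float = 300.0) -> List[bool]:
    """Run one test command `runs` times, each in a fresh process; True = passed."""
    # Pass only PATH so variables set by the CI job or a previous run do not leak in.
    clean_env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    outcomes = []
    for _ in range(runs):
        try:
            proc = subprocess.run(cmd, env=clean_env, capture_output=True, timeout=timeout)
            outcomes.append(proc.returncode == 0)  # exit code 0 counts as a pass
        except subprocess.TimeoutExpired:
            outcomes.append(False)  # a hung run is recorded as a failure
    return outcomes


if __name__ == "__main__":
    # Hypothetical suspect; in practice something like ["pytest", "tests/test_x.py::test_y"].
    flaky_cmd = [sys.executable, "-c",
                 "import random, sys; sys.exit(1 if random.random() < 0.3 else 0)"]
    print(rerun_isolated(flaky_cmd, 5))
```

Stripping the environment is a blunt instrument: if your tests legitimately need variables such as credentials or `HOME`, add them explicitly to `clean_env` rather than inheriting the parent environment.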
How it works
- 1. A nightly schedule triggers the workflow.
- 2. It queries BigQuery for tests that failed in the last 24 hours.
- 3. A shell step re-runs each suspect test N times in isolation and records the outcomes.
- 4. A logic step marks a test flaky only if its results are mixed across runs.
- 5. Confirmed-flaky tests get a ClickUp tech-debt item with the observed pass ratio.
- 6. It writes the classification back to BigQuery for trend tracking.
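Steps 5 and 6 amount to building two records per confirmed test. The Python sketch below assumes the ClickUp item is created through ClickUp's v2 create-task endpoint, whose request body accepts `name`, `description`, and `tags`; the tag names and the BigQuery column names (`test_id`, `classified_at`, `runs`, `passes`, `pass_ratio`, `is_flaky`) are hypothetical and should follow your own schema. Sending the request and inserting the row are left out.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def clickup_task_payload(test_id: str, passes: int, runs: int) -> dict:
    """Request body for a ClickUp tech-debt task describing one flaky test."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    return {
        "name": f"Flaky test: {test_id}",
        "description": (
            f"Re-ran {test_id} {runs} times in isolation: "
            f"{passes}/{runs} passed (pass ratio {passes / runs:.0%})."
        ),
        "tags": ["flaky-test", "tech-debt"],  # hypothetical tag names
    }


def classification_row(test_id: str, passes: int, runs: int,
                       now: Optional[datetime] = None) -> dict:
    """JSON-serializable row for the BigQuery classification table (hypothetical columns)."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    now = now or datetime.now(timezone.utc)
    return {
        "test_id": test_id,
        "classified_at": now.isoformat(),
        "runs": runs,
        "passes": passes,
        "pass_ratio": round(passes / runs, 4),
        "is_flaky": 0 < passes < runs,  # same mixed-outcome rule as step 4
    }


if __name__ == "__main__":
    print(json.dumps(clickup_task_payload("tests/test_api.py::test_login", 7, 10), indent=2))
    print(json.dumps(classification_row("tests/test_api.py::test_login", 7, 10)))
```

Storing `runs` and `passes` alongside the ratio keeps the trend data comparable if N changes between nights.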
Set it up
What you configure once, before turning it on.
- 1. Connect BigQuery: datasets, queries, schemas.
- 2. Connect Shell: run sandboxed commands inside the workspace.
- 3. Connect ClickUp: docs, tasks, and chats in one workspace.
- 4. Set each agent's model: we leave models unset so you can pick the tier, either fast and cheap or top quality.
- 5. Tune it to your data: edit the prompts, filters, and field mappings so they match how your team works.
- 6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
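Tuning filters and field mappings usually comes down to a small config plus the query it drives. This Python sketch assumes a results table with one row per test execution and columns named `test_id`, `status`, and `finished_at` (a TIMESTAMP); all of these names, the class name `FlakyConfig`, and the defaults (24-hour lookback, N = 10) are hypothetical. Table and column names are validated because BigQuery cannot bind identifiers as query parameters, while the status value is left as the named parameter `@failed_value`, to be bound with the client library's `ScalarQueryParameter`.

```python
import re
from dataclasses import dataclass

# project.dataset.table; project IDs may contain hyphens.
_TABLE_RE = re.compile(r"[A-Za-z0-9-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+")
_COLUMN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass(frozen=True)
class FlakyConfig:
    results_table: str                # hypothetical CI results table
    test_id_col: str = "test_id"      # hypothetical field mappings
    status_col: str = "status"
    finished_at_col: str = "finished_at"
    lookback_hours: int = 24
    reruns: int = 10                  # assumed default for N

    def __post_init__(self) -> None:
        if not _TABLE_RE.fullmatch(self.results_table):
            raise ValueError(f"invalid table id: {self.results_table!r}")
        for col in (self.test_id_col, self.status_col, self.finished_at_col):
            if not _COLUMN_RE.fullmatch(col):
                raise ValueError(f"invalid column name: {col!r}")
        if not isinstance(self.lookback_hours, int) or self.lookback_hours < 1:
            raise ValueError("lookback_hours must be a positive integer")
        if self.reruns < 2:
            raise ValueError("reruns must be >= 2 so outcomes can be mixed")

    def failed_tests_sql(self) -> str:
        """BigQuery standard SQL: distinct tests that failed within the lookback window."""
        return (
            f"SELECT DISTINCT {self.test_id_col} AS test_id\n"
            f"FROM `{self.results_table}`\n"
            f"WHERE {self.status_col} = @failed_value\n"
            f"  AND {self.finished_at_col} >= "
            f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {self.lookback_hours} HOUR)"
        )
```

Requiring at least two re-runs is deliberate: with a single run the results can never be mixed, so nothing could ever be confirmed flaky.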
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

