Nightly Flake Scorer: Rank Tests by Failure Rate and Auto-Quarantine the Worst
Runs nightly and queries the last 30 days of CI runs from Postgres to compute a per-test flake rate.
How it runs
The automated pipeline, trigger to output.
- Trigger: Nightly schedule fires after CI completes
- Action: Query 30-day pass/fail history per test (Postgres)
- Logic: Compute flake rate; bucket tests as stable, watchlist, or quarantine
- Action: Open a skip PR for each threshold breach (GitHub)
- Output: Write ranked scoreboard and watchlist to Postgres
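The query and compute stages can be sketched in plain Python. This is a minimal sketch, assuming each CI result arrives as a row with a test name, a pass/fail flag, and a finish timestamp; the `CiResult` type, its field names, and the `ci_results` table in the comment are hypothetical. In the workflow itself the aggregation runs in Postgres, so the comment shows one way the same grouping could be written there.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Roughly equivalent Postgres aggregation (hypothetical table and columns):
#   SELECT test_name,
#          count(*) FILTER (WHERE NOT passed) AS failures,
#          count(*) AS total
#   FROM ci_results
#   WHERE finished_at >= now() - interval '30 days'
#   GROUP BY test_name;


@dataclass(frozen=True)
class CiResult:
    """One CI result row (hypothetical shape; map it to your own schema)."""
    test_name: str
    passed: bool
    finished_at: datetime


def flake_scores(results, now, window_days=30):
    """Return (test_name, failures, total, failure_rate) tuples, worst first."""
    cutoff = now - timedelta(days=window_days)
    counts = defaultdict(lambda: [0, 0])  # test_name -> [failures, total]
    for r in results:
        if r.finished_at < cutoff:
            continue  # outside the rolling window
        tally = counts[r.test_name]
        tally[1] += 1
        if not r.passed:
            tally[0] += 1
    scores = [(name, f, t, f / t) for name, (f, t) in counts.items()]
    # Highest failure rate first; ties broken by name so the order is stable.
    scores.sort(key=lambda s: (-s[3], s[0]))
    return scores


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
results = [
    CiResult("test_login", False, now - timedelta(days=1)),
    CiResult("test_login", True, now - timedelta(days=2)),
    CiResult("test_cart", True, now - timedelta(days=3)),
    CiResult("test_cart", False, now - timedelta(days=45)),  # too old, ignored
]
for name, failures, total, rate in flake_scores(results, now):
    print(f"{name}: {failures}/{total} = {rate:.0%}")
# prints "test_login: 1/2 = 50%" then "test_cart: 0/1 = 0%"
```

Breaking rate ties by name keeps the ranking deterministic from one night to the next, which helps if later runs compare positions.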
What it does
This scheduled job computes a data-driven flake score for every test from your stored CI history, ranks tests by intermittent-failure rate, and acts on the worst offenders. Tests above the quarantine threshold get an automatic skip pull request; borderline tests go to a watchlist instead of being skipped prematurely.
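The three-way split can be sketched as a small classifier. The threshold values (2% for the watchlist, 10% for quarantine) and the minimum-run guard are hypothetical defaults, not values the workflow prescribes; the guard is one way to keep a test with very little history on the watchlist rather than skipping it prematurely.

```python
from enum import Enum


class Bucket(Enum):
    STABLE = "stable"
    WATCHLIST = "watchlist"
    QUARANTINE = "quarantine"


def bucket_for(failure_rate, total_runs,
               watchlist_at=0.02, quarantine_at=0.10, min_runs=20):
    """Classify one test's windowed failure rate (thresholds are hypothetical)."""
    if not 0 <= watchlist_at < quarantine_at <= 1:
        raise ValueError("need 0 <= watchlist_at < quarantine_at <= 1")
    if total_runs < min_runs:
        # Too little history to justify a skip: watch any failing test instead.
        return Bucket.WATCHLIST if failure_rate > 0 else Bucket.STABLE
    if failure_rate >= quarantine_at:
        return Bucket.QUARANTINE
    if failure_rate >= watchlist_at:
        return Bucket.WATCHLIST
    return Bucket.STABLE


print(bucket_for(0.15, 200))  # Bucket.QUARANTINE
print(bucket_for(0.03, 200))  # Bucket.WATCHLIST
print(bucket_for(0.50, 4))    # Bucket.WATCHLIST (not enough runs yet)
```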
When to use it
Use it when you already log CI results to a database and want statistics, not vibes, to decide what to quarantine. It catches slow-burn flakes that fail only a few percent of runs and never trip a single-failure alert.
How it works
1. A nightly schedule trigger fires after the day's CI runs complete.
2. A Postgres query aggregates pass/fail counts per test over a rolling 30-day window and computes each test's failure rate.
3. A logic step sorts tests into three buckets: stable, watchlist, and quarantine (threshold breached).
4. For each threshold breach, it opens a GitHub skip pull request with the computed flake rate in the description.
5. It writes the watchlist and the full ranked scoreboard back to Postgres for the next run's trend comparison.
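Storing the ranked scoreboard is what lets the next run spot trends. A minimal sketch of that comparison, assuming both the current and the previously stored scores are available as hypothetical `test_name -> failure_rate` dicts and using an arbitrary half-point tolerance so tiny fluctuations read as steady; writing the rows back to Postgres is not shown.

```python
from typing import Dict, List, Tuple


def ranked_scoreboard(
    today: Dict[str, float],
    previous: Dict[str, float],
    tolerance: float = 0.005,
) -> List[Tuple[int, str, float, str]]:
    """Rank today's failure rates worst-first and label each test's trend."""
    ordered = sorted(today.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = []
    for rank, (name, rate) in enumerate(ordered, start=1):
        prior = previous.get(name)
        if prior is None:
            trend = "new"  # no row in the last stored scoreboard
        elif rate > prior + tolerance:
            trend = "worsening"
        elif rate < prior - tolerance:
            trend = "improving"
        else:
            trend = "steady"
        rows.append((rank, name, rate, trend))
    return rows


today = {"test_a": 0.12, "test_b": 0.03, "test_c": 0.0, "test_d": 0.01}
previous = {"test_a": 0.05, "test_b": 0.03, "test_d": 0.08}
for row in ranked_scoreboard(today, previous):
    print(row)
```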
Set it up
What you configure once, before turning it on.
1. Connect Postgres: any Postgres URL (query, write, migrate).
2. Connect GitHub: repos, issues, pull requests, actions.
3. Set each agent's model: models are left unset so you can pick the tier, fast and cheap or top quality.
4. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
5. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
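The tuning step could keep its knobs in one validated config object. This is a sketch under assumptions: every field name below is hypothetical, only the 30-day window comes from the workflow description, and the `dry_run` flag is one way to do the "run once against a sample" check, logging the skips it would open instead of opening them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScorerConfig:
    """Hypothetical tuning knobs for the nightly flake scorer."""
    window_days: int = 30        # rolling history window (from the description)
    watchlist_at: float = 0.02   # hypothetical watchlist threshold
    quarantine_at: float = 0.10  # hypothetical quarantine threshold
    min_runs: int = 20           # hypothetical minimum history before skipping
    dry_run: bool = True         # log planned skips instead of opening them

    def __post_init__(self):
        if self.window_days <= 0:
            raise ValueError("window_days must be positive")
        if not 0 <= self.watchlist_at < self.quarantine_at <= 1:
            raise ValueError("need 0 <= watchlist_at < quarantine_at <= 1")
        if self.min_runs < 1:
            raise ValueError("min_runs must be at least 1")


# First run against a sample with dry_run on; flip it once the output looks right.
sample_cfg = ScorerConfig(quarantine_at=0.15)
live_cfg = replace(sample_cfg, dry_run=False)
```

Because `replace` re-runs `__post_init__`, switching from the sample configuration to the live one cannot silently produce invalid thresholds.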
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

