Nightly flaky-test scanner from GitHub Actions history

Each night, scans recent GitHub Actions runs to find tests that pass and fail nondeterministically on the same commit.

Category: Engineering
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Nightly schedule fires
  • Action: Fetch recent GitHub Actions runs and test artifacts (GitHub)
  • Logic: Score tests with mixed pass/fail on the same commit
  • Action: Create or update a Linear issue per flaky test (Linear)
  • Output: Post new-flake summary to Slack
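
As a rough illustration of the fetch step, the sketch below builds a request URL for the GitHub REST API's "list workflow runs for a repository" endpoint, limited to completed runs created in the last N days. The function name `recent_runs_url` and its parameters are hypothetical, and the sketch only constructs the URL; authentication, pagination, and artifact download are left out, and the workflow itself may fetch runs differently.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

GITHUB_API = "https://api.github.com"


def recent_runs_url(owner, repo, days, now=None, page=1):
    """Build the URL that lists completed workflow runs from the last `days` days.

    Hypothetical helper: `owner`, `repo`, and `days` are whatever you configure.
    """
    now = now or datetime.now(timezone.utc)
    since = (now - timedelta(days=days)).strftime("%Y-%m-%d")
    query = urlencode({
        "status": "completed",       # skip runs that are still in progress
        "created": f">={since}",     # GitHub date-range filter syntax
        "per_page": 100,             # maximum page size for this endpoint
        "page": page,
    })
    return f"{GITHUB_API}/repos/{owner}/{repo}/actions/runs?{query}"
```

Each entry in the response's `workflow_runs` array carries a `head_sha`, which is what lets later steps compare outcomes on the same commit.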

What it does

Finds the tests that are quietly eroding trust in your CI by failing intermittently. It pulls the last N days of GitHub Actions runs, groups results by test identity across commits, and flags any test that has both passes and failures on the same SHA. Each confirmed flake becomes a tracked Linear issue so it never gets lost in the noise.
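
The core check can be sketched in a few lines. This is a minimal illustration, assuming test results have already been flattened into `(test_id, sha, passed)` tuples; the tuple shape and the name `find_flaky` are assumptions made for the example, not the workflow's actual data model.

```python
from collections import defaultdict


def find_flaky(results):
    """Return {test_id: [sha, ...]} for tests with both a pass and a fail on one SHA.

    `results` is an iterable of (test_id, sha, passed) tuples (assumed shape).
    """
    outcomes = defaultdict(set)  # (test_id, sha) -> set of observed outcomes
    for test_id, sha, passed in results:
        outcomes[(test_id, sha)].add(bool(passed))

    flaky = defaultdict(list)
    for (test_id, sha), seen in outcomes.items():
        if seen == {True, False}:  # same commit, different outcomes
            flaky[test_id].append(sha)
    return dict(flaky)


sample = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),   # same SHA, different outcome: flaky
    ("test_checkout", "abc123", False),
    ("test_checkout", "def456", True), # differs across SHAs only: not flagged
]
print(find_flaky(sample))  # {'test_login': ['abc123']}
```

A test that fails on one commit and passes on the next is not flagged, because the code change itself may explain the difference.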

When to use it

Run it on any repo where 'just re-run the job' has become a habit. It is the right fit when you want a standing, deduplicated backlog of flaky tests instead of ad-hoc Slack complaints.

How it works

  1. A nightly schedule fires the workflow.
  2. It fetches recent workflow runs and their JUnit/test artifacts from GitHub.
  3. A logic step computes a per-test flake score: tests with mixed pass/fail on identical commits above a threshold are kept.
  4. For each flake it checks GitHub for an existing tracking label to avoid duplicates.
  5. It creates or updates a Linear issue with the failure rate, owning team, and links to the offending runs.
  6. It posts a one-line summary of new flakes to the team Slack channel.
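
Steps 2 and 3 can be sketched as follows, assuming each run contributes one JUnit XML report. The page does not define the flake score, so this example assumes one plausible definition: failures divided by executions, counted only on commits where the test both passed and failed. The names `parse_junit` and `flake_scores` and the default threshold of 0.1 are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def parse_junit(xml_text):
    """Yield (test_id, passed) for every non-skipped <testcase> in a JUnit report."""
    root = ET.fromstring(xml_text)
    for case in root.iter("testcase"):
        if case.find("skipped") is not None:
            continue  # skipped tests say nothing about flakiness
        test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        yield test_id, not failed


def flake_scores(runs, threshold=0.1):
    """Score tests from an iterable of (sha, junit_xml) pairs.

    Assumed score: failures / executions, restricted to SHAs with mixed outcomes.
    Only tests scoring at or above `threshold` are returned.
    """
    tallies = defaultdict(Counter)  # (test_id, sha) -> Counter of True/False outcomes
    for sha, xml_text in runs:
        for test_id, passed in parse_junit(xml_text):
            tallies[(test_id, sha)][passed] += 1

    fails, total = Counter(), Counter()
    for (test_id, _sha), counts in tallies.items():
        if counts[True] and counts[False]:  # mixed outcomes on this commit
            fails[test_id] += counts[False]
            total[test_id] += counts[True] + counts[False]

    scores = {t: fails[t] / total[t] for t in total}
    return {t: s for t, s in scores.items() if s >= threshold}
```

Raising the threshold trades recall for a quieter backlog: a test that failed once in fifty re-runs of the same commit scores 0.02 and would be dropped at the assumed default.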

Set it up

What you configure once, before turning it on.

  1. Connect GitHub: repos, issues, pull requests, actions.
  2. Connect Linear: issues, projects, cycles, triage.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model: We leave models unset so you pick the tier, either fast and cheap or top quality.
  5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.