ENGINEERING
Flaky-Test Quarantine Agent: CI Failure to Tracked Ticket + Skip MR
Watches GitHub Actions failures and uses an LLM to decide whether each failing test is genuinely flaky or a real regression, then quarantines only the flakes.
How it runs
The automated pipeline, trigger to output.
- Trigger (GitHub): A GitHub Actions run fails (workflow_run conclusion=failure)
- Action (GitHub): Fetch the failing job logs and test report
- Action (OpenAI): Classify each failure as flaky or a real regression
- Logic: Keep only flaky failures; drop regressions
- Action (Linear): Open a tracked flake ticket
- Output (GitHub): Open a draft skip/quarantine pull request
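The trigger condition can be sketched as a small webhook filter. This is a minimal sketch, assuming the agent receives raw GitHub webhook deliveries; the function name `failed_run_id` is hypothetical. It relies only on the `workflow_run` event's `action` field and the run's `conclusion` field. The returned run ID is what a later step would use to list the run's jobs and fetch their logs.

```python
from typing import Any, Optional


def failed_run_id(event_name: str, payload: dict[str, Any]) -> Optional[int]:
    """Return the workflow run ID if this delivery should start the agent, else None.

    Hypothetical helper: event_name is the value of the X-GitHub-Event header.
    """
    # Ignore every event type except workflow_run.
    if event_name != "workflow_run":
        return None
    # workflow_run is delivered for "requested", "in_progress", and "completed";
    # the conclusion is only final once the run has completed.
    if payload.get("action") != "completed":
        return None
    run = payload.get("workflow_run") or {}
    # Successful, cancelled, or skipped runs do not trigger quarantine.
    if run.get("conclusion") != "failure":
        return None
    return run.get("id")


sample = {"action": "completed", "workflow_run": {"id": 42, "conclusion": "failure"}}
print(failed_run_id("workflow_run", sample))  # prints 42
```

Filtering on `action` first keeps the agent from reacting to in-progress runs, whose `conclusion` is not yet set.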
What it does
When a CI run fails, this agent fetches the failing test logs, classifies each failure as flaky (intermittent, environment-sensitive) or a real regression, and quarantines only the flaky ones. For each confirmed flake, it files a tracked Linear ticket and opens a draft GitHub pull request that skips the test, so builds go green again without burying real bugs.
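The flaky-vs-regression split can be sketched as parsing the model's verdicts and keeping only explicit flakes. This is a minimal sketch, assuming the classification prompt asks the model for a JSON array of objects with `test`, `verdict`, and `evidence` keys; that schema and the names `Verdict`, `parse_verdicts`, and `flaky_only` are hypothetical. It fails closed: malformed output, unknown labels, or a "flaky" label with no cited evidence are all treated as not flaky, so an ambiguous failure stays red instead of being skipped.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    test: str
    verdict: str  # expected "flaky" or "regression" (hypothetical schema)
    evidence: str  # log excerpt the model cites


def parse_verdicts(raw: str) -> list[Verdict]:
    """Parse the model's JSON array; malformed output yields no verdicts."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    verdicts = []
    for item in items:
        # Skip entries that do not even name a test.
        if not isinstance(item, dict) or "test" not in item:
            continue
        verdicts.append(Verdict(
            test=str(item["test"]),
            verdict=str(item.get("verdict", "")).lower(),
            evidence=str(item.get("evidence", "")),
        ))
    return verdicts


def flaky_only(verdicts: list[Verdict]) -> list[Verdict]:
    """Keep explicit 'flaky' labels that cite evidence; drop everything else."""
    return [v for v in verdicts if v.verdict == "flaky" and v.evidence.strip()]


raw = ('[{"test": "test_login", "verdict": "flaky", "evidence": "timeout connecting to redis"},'
       ' {"test": "test_total", "verdict": "regression", "evidence": "expected 10, got 9"}]')
print([v.test for v in flaky_only(parse_verdicts(raw))])  # prints ['test_login']
```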
When to use it
Use it when intermittent failures are eroding trust in your CI signal and engineers are blindly re-running jobs. It separates genuine flakiness from regressions automatically, so you stop hand-triaging every red build.
How it works
- 1. A completed GitHub Actions `workflow_run` with `conclusion=failure` fires the trigger.
- 2. The agent pulls the failing job logs and the test report for that run via the GitHub API.
- 3. An OpenAI classification step labels each failing test as flaky or a regression, citing the log evidence.
- 4. A logic branch drops anything classified as a regression and keeps only the flaky tests.
- 5. For each flake, it creates a Linear issue with the failure history and reproduction notes.
- 6. It opens a draft GitHub pull request that adds a skip/quarantine annotation and links to the ticket.
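Steps 5 and 6 can be sketched as two request bodies: a Linear `issueCreate` GraphQL mutation (sent to `https://api.linear.app/graphql`) and a GitHub "create a pull request" payload (sent to `POST /repos/{owner}/{repo}/pulls`) with `draft` set to true. This is a minimal sketch, assuming the skip annotation (for pytest, something like `@pytest.mark.skip(reason=...)`) has already been committed to the quarantine branch; the helper names, title wording, and description layout are hypothetical, and sending the requests with authentication is left out.

```python
import json

# Linear's GraphQL mutation for creating an issue.
LINEAR_ISSUE_CREATE = """
mutation IssueCreate($input: IssueCreateInput!) {
  issueCreate(input: $input) {
    success
    issue { id identifier url }
  }
}
"""


def linear_issue_request(team_id: str, test_name: str, run_url: str, evidence: str) -> dict:
    """GraphQL request body for Linear's issueCreate mutation (hypothetical helper)."""
    description = (
        "Quarantined as flaky by the CI agent.\n\n"
        f"- Failing run: {run_url}\n"
        f"- Log evidence: {evidence}\n"
    )
    return {
        "query": LINEAR_ISSUE_CREATE,
        "variables": {"input": {
            "teamId": team_id,
            "title": f"Flaky test: {test_name}",
            "description": description,
        }},
    }


def draft_pr_request(test_name: str, branch: str, base: str, ticket_url: str) -> dict:
    """JSON body for GitHub's create-pull-request endpoint, opened as a draft."""
    return {
        "title": f"Quarantine flaky test {test_name}",
        "head": branch,  # branch that already carries the skip commit
        "base": base,
        "body": f"Skips `{test_name}` pending investigation.\n\nTracked in {ticket_url}",
        "draft": True,  # a draft PR still needs a human to mark it ready
    }


print(json.dumps(draft_pr_request("test_login", "quarantine/test_login", "main",
                                  "https://linear.app/example/issue/ENG-1"), indent=2))
```

Creating the Linear issue first means its returned `url` can be embedded in the pull request body, which gives the link between ticket and skip that step 6 describes.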
Set it up
What you configure once, before turning it on.
- 1. Connect GitHub: repos, issues, pull requests, actions.
- 2. Connect OpenAI: models, embeddings, files.
- 3. Connect Linear: issues, projects, cycles, triage.
- 4. Set each agent's model: models are left unset so you can pick the tier, fast and cheap or top quality.
- 5. Tune it to your data: edit the prompts, filters, and field mappings to match how your team works.
- 6. Test, then turn it on: run it once against a sample, confirm the output, then enable the trigger.
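The one-time setup can be sketched as a pre-flight check run before the trigger is enabled. This is a minimal sketch with hypothetical setting names (`classifier_model`, `linear_team_id`, `base_branch`, `dry_run`), not the product's actual configuration schema. The model defaults to unset, matching the setup step above, and a dry-run flag stays on until a sample run has been checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    # Hypothetical settings; names are illustrative only.
    classifier_model: Optional[str] = None  # left unset on purpose: pick your tier
    linear_team_id: Optional[str] = None    # team that receives flake tickets
    base_branch: str = "main"               # target branch for quarantine PRs
    dry_run: bool = True                    # report actions without filing or opening anything


def ready_to_enable(cfg: AgentConfig) -> list[str]:
    """Return the settings that still block turning the trigger on."""
    problems = []
    if not cfg.classifier_model:
        problems.append("classifier_model is unset")
    if not cfg.linear_team_id:
        problems.append("linear_team_id is unset")
    if cfg.dry_run:
        problems.append("dry_run is still on; confirm a sample run first")
    return problems


print(ready_to_enable(AgentConfig(classifier_model="your-model", linear_team_id="TEAM")))
# prints ['dry_run is still on; confirm a sample run first']
```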
More Engineering workflows
Upgrade Impact Router to Module Code Owners
Maps a dependency-bump PR's affected modules to their CODEOWNERS, then DMs each owner on Slack with only the changelog slice that touches code they own.
Re-Voice IVR Prompts on Phone-Tree Config Merge
When a phone-tree config change merges in GitHub, regenerates the ElevenLabs audio for any prompt whose script changed in the diff and opens a follow-up PR adding the new audio…
Agent reviews model-license fit and suggests compliant swaps on the PR
When a PR adds a Hugging Face model, an agent reads the model card and license and judges fit against your commercial-use policy.
Scan for deprecated endpoints and email consumers a weekly sunset countdown
On a weekly schedule, scans the OpenAPI spec for endpoints marked deprecated with a sunset date, and emails each consuming team a countdown of how many days remain before removal.
Publish a versioned API changelog to Confluence on each release tag
On a new semver release tag, gathers the contract changes since the last release and writes a clean.
Gate breaking API PRs behind downstream consumer acknowledgement
When a PR introduces a breaking contract change, comments the impact summary on the PR and applies a blocking label.
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.
