DEVOPS
Agent-Driven Flaky-Test Investigation and Draft Fix
When a test is quarantined, a CEO-driven agent reads the test source and recent diffs, diagnoses the likely cause of nondeterminism, and opens a draft fix PR.
How it runs
The automated pipeline, trigger to output.
- Trigger: Webhook fires on new quarantine (HTTP webhook)
- Action: Fetch test source and recent commits (GitHub)
- Logic: Agent diagnoses likely cause of nondeterminism (OpenAI)
- Action: Open a draft fix PR with the proposed change (GitHub)
- Output: File a GitLab issue with the diagnosis (GitLab)
What it does
Goes beyond skipping the test: an agent investigates why it's flaky. It pulls the test code and the commits that touched it, reasons about common flake causes like unawaited async, shared state, or time and order dependence, and proposes a concrete starting fix so the owner isn't starting from zero.
When to use it
Use it when quarantine alone leaves a backlog nobody has time to investigate. The agent does the first-pass diagnosis and drafts a candidate fix, leaving a human to review and merge.
How it works
- 1. A webhook fires when a test is added to quarantine.
- 2. The agent fetches the test source and the recent commits that modified it from GitHub.
- 3. It reasons over the code to identify the most likely source of nondeterminism and drafts a candidate change.
- 4. It opens a draft fix PR with the proposed change and an explanation.
- 5. It files a GitLab issue summarizing the diagnosis and linking the draft PR for the owner to review.
Set it up
What you configure once, before turning it on.
- 1. Connect GitHub: Repos, issues, pull requests, actions.
- 2. Connect GitLab: Repos, MRs, pipelines, registry.
- 3. Connect OpenAI: Models, embeddings, files.
- 4. Connect HTTP webhook: Trigger any URL on agent actions.
- 5. Set each agent's model: We leave models unset so you pick the tier, fast and cheap or top-quality.
- 6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
- 7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
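Before enabling the trigger, the setup checklist above could be expressed as a small preflight check. This is a sketch under stated assumptions: the config shape (`connections` and `agents` keys), the connection names, and the `preflight` function are all hypothetical and do not reflect the product's real configuration format.

```python
# Hypothetical names for the four connections listed in the setup steps.
REQUIRED_CONNECTIONS = ("github", "gitlab", "openai", "http_webhook")


def preflight(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the trigger can be enabled."""
    connections = config.get("connections", {})
    problems = [
        f"connection not configured: {name}"
        for name in REQUIRED_CONNECTIONS
        if name not in connections
    ]
    # Models ship unset, so every agent needs an explicit choice.
    for agent, settings in config.get("agents", {}).items():
        if not settings.get("model"):
            problems.append(f"agent '{agent}' has no model set")
    return problems


# Illustrative config with one missing connection and one unset model.
example = {
    "connections": {"github": {}, "gitlab": {}, "openai": {}},
    "agents": {"diagnoser": {"model": None}},
}
for problem in preflight(example):
    print(problem)
```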
More DevOps workflows
Hugging Face Spaces idle-runtime sweep with auto-pause
On a schedule, scans all Hugging Face Spaces for ones running idle past a threshold, pauses them to stop billing, and posts a Slack summary with the estimated monthly savings.
Slack-approved pause for idle Hugging Face Spaces
On a daily scan it finds idle paid Spaces and posts an interactive Slack approval; on approve it pauses the Space and logs the decision to a GitHub issue audit trail.
Generate a weekly de-flake report and assign Linear cleanup tickets
On a weekly schedule, aggregates the current quarantine manifest and recent flake history and builds a prioritized report.
Block costly Hugging Face Space hardware upgrades in PR review
When a pull request changes a Space's hardware config, it estimates the new monthly cost and posts a GitHub PR comment that flags upgrades crossing a budget ceiling.
Auto-release tests from quarantine once they prove stable
Triggered by a webhook from a nightly stability runner, checks whether quarantined tests have passed enough consecutive runs, then removes the stable ones from quarantine in GitHub.
Quarantine a test on demand from a PR comment command
Triggered when an engineer comments a quarantine command on a pull request, validates the test name, commits the quarantine change to that PR branch, and opens a tracking issue.
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

