ENGINEERING
Page On-Call When a Flaky Test Blocks the GitLab Merge Train
Monitors GitLab merge-train pipeline failures and distinguishes flaky failures from real ones via retry history.
How it runs
The automated pipeline, from trigger to output.
- Trigger: GitLab merge-train pipeline fails (GitLab)
- Action: Inspect job retry history to classify the failure (GitLab)
- Logic: Branch on intermittent and train blocked vs. exit
- Action: Page on-call owner via PagerDuty (PagerDuty)
- Action: Open post-incident tracking ticket (Linear)
- Output: Post escalation summary to merge-train Slack (Slack)
What it does
A flaky test that stalls a merge train blocks everyone behind it, so this workflow treats it as an incident. When a GitLab merge-train pipeline fails, it checks the failing job's retry history to decide if the failure is intermittent. If it is and the train is blocked, it pages the on-call owner via PagerDuty for immediate quarantine and opens a Linear ticket so the fix is tracked after the fire is out.
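The classify-then-branch decision described above can be sketched as follows. This is a minimal illustration, not the workflow's actual logic: `JobAttempt`, `is_intermittent`, and `should_page` are hypothetical names, and the rule "the same commit has both passed and failed" is an assumed heuristic for flakiness that you would tune to your own retry data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobAttempt:
    """One run of a CI job (hypothetical shape, not a GitLab API object)."""
    commit_sha: str
    status: str  # "success" or "failed"


def is_intermittent(attempts: list[JobAttempt]) -> bool:
    """Assumed heuristic: some commit has both a passing and a failing run."""
    statuses_by_commit: dict[str, set[str]] = {}
    for attempt in attempts:
        statuses_by_commit.setdefault(attempt.commit_sha, set()).add(attempt.status)
    return any(
        {"success", "failed"} <= statuses
        for statuses in statuses_by_commit.values()
    )


def should_page(attempts: list[JobAttempt], train_blocked: bool) -> bool:
    """Page only when the failure looks flaky AND the train is blocked."""
    return train_blocked and is_intermittent(attempts)
```

A job that fails twice on the same commit is treated as a real failure under this heuristic, so the flow would exit without paging.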
When to use it
Use this when you run GitLab merge trains and a single flaky test can hold up the whole queue. It escalates fast-moving, high-impact flakes to a human instead of letting them sit in a ticket backlog.
How it works
1. A GitLab webhook fires on a merge-train pipeline failure.
2. The flow inspects the failed job's retry attempts and history to classify flaky vs. real.
3. A branch checks whether the failure is intermittent and the train is currently blocked.
4. If so, PagerDuty pages the on-call owner with the pipeline and job details.
5. A Linear ticket is opened to track the post-incident fix.
6. The merge-train channel in Slack gets the escalation summary.
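For the paging step, one plausible shape for the page is a PagerDuty Events API v2 trigger event carrying the pipeline and job details. The sketch below only builds the event body; the function name `build_pagerduty_event`, the `dedup_key` scheme, the `source` label, and the choice of `critical` severity are all assumptions for illustration, not settings taken from this workflow.

```python
import json


def build_pagerduty_event(routing_key: str, project: str, pipeline_id: int,
                          job_name: str, pipeline_url: str) -> dict:
    """Build an Events API v2 trigger event for a flaky merge-train failure."""
    return {
        "routing_key": routing_key,  # integration key of the PagerDuty service
        "event_action": "trigger",
        # Assumed policy: repeated flakes of one job collapse into one incident.
        "dedup_key": f"merge-train-flake:{project}:{job_name}",
        "payload": {
            "summary": f"Flaky job '{job_name}' is blocking the merge train in {project}",
            "source": "gitlab-merge-train",  # hypothetical source label
            "severity": "critical",  # assumed; one of critical/error/warning/info
            "custom_details": {
                "pipeline_id": pipeline_id,
                "pipeline_url": pipeline_url,
                "job_name": job_name,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values; a real run would take these from the webhook payload.
    event = build_pagerduty_event(
        routing_key="YOUR_ROUTING_KEY",
        project="group/app",
        pipeline_id=42,
        job_name="rspec",
        pipeline_url="https://gitlab.example.com/group/app/-/pipelines/42",
    )
    print(json.dumps(event, indent=2))
```

An event like this would be sent as a JSON POST to `https://events.pagerduty.com/v2/enqueue`. Reusing the same `dedup_key` means a second failure of the same job updates the open incident rather than paging again.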
Set it up
What you configure once, before turning it on.
1. Connect GitLab: repos, MRs, pipelines, registry.
2. Connect PagerDuty: incidents, on-call, escalations.
3. Connect Linear: issues, projects, cycles, triage.
4. Connect Slack: channels, DMs, threads, mentions.
5. Set each agent's model: we leave models unset so you pick the tier, either fast and cheap or top-quality.
6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

