AI AGENTS
Slack Command Eval -> GitLab Swap Merge Request
Lets an engineer drop a Hugging Face model id in Slack, runs your fixed eval against the incumbent on demand, and, if the challenger wins, opens a GitLab merge request to swap it in.
How it runs
The automated pipeline, trigger to output.
- Trigger: Slack message with model id (Slack)
- Action: Validate id, fetch model card + license (Hugging Face)
- Logic: Invalid or incompatible? Reply and stop
- Action: Run fixed eval, post scorecard to thread (Shell)
- Logic: Challenger wins by margin?
- Output: Open GitLab swap merge request (GitLab)
What it does
Makes evaluating a candidate model a single Slack message. An engineer pastes a Hugging Face model id into a channel; the agent validates it, runs your frozen eval against the current incumbent, and replies in thread with the head-to-head scorecard. Only when the challenger wins by the configured margin does it open a GitLab merge request that swaps the model id in config.
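As a rough illustration of the first step, here is a minimal Python sketch of pulling a model id out of a Slack message. Everything in it is an assumption rather than the workflow's actual parser: the function name `extract_model_id`, the `namespace/name` id shape, and the handling of pasted huggingface.co links are hypothetical.

```python
import re
from typing import Optional

# Assumed id shape: "namespace/name" built from letters, digits, "_", "." and "-".
# The lookbehind keeps the pattern from matching inside unrelated URLs.
MODEL_ID = re.compile(r"(?<![\w./-])([A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*)")


def extract_model_id(text: str) -> Optional[str]:
    """Return the first candidate model id in a Slack message, or None."""
    # Drop a huggingface.co URL prefix so a pasted link and a bare id look the same.
    cleaned = re.sub(r"https?://huggingface\.co/", "", text)
    match = MODEL_ID.search(cleaned)
    if match is None:
        return None
    # A sentence-ending period is not part of the id.
    return match.group(1).rstrip(".")


if __name__ == "__main__":
    print(extract_model_id("can we try mistralai/Mistral-7B-v0.1 please"))
```

A pattern this loose also matches ordinary text such as "and/or", so it only proposes a candidate; the validation step against Hugging Face still decides whether the id is real.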
When to use it
Use it when candidates surface organically (a teammate spots a release) and you want a fast, low-ceremony path from "should we try this?" to a reviewable swap. It keeps humans driving while automating the tedious eval-and-MR steps.
How it works
- 1. A Slack message containing a model id triggers the workflow.
- 2. The agent validates the id and fetches the Hugging Face model card and license.
- 3. If the id is invalid or the license is incompatible, a branch replies in thread and stops.
- 4. The agent runs the fixed eval on the challenger and the incumbent and posts the scorecard back to the Slack thread.
- 5. A branch checks whether the challenger beats the incumbent by the configured margin.
- 6. On a win, the agent opens a GitLab merge request that edits the model config and links the scorecard.
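The two branches (steps 3 and 5) and the config edit in step 6 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the workflow's implementation: the license allow-list, the 0.02 margin, the single higher-is-better score, and all function names are hypothetical. Fetching the model card and calling the GitLab API are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy values; the source names no licenses, metric, or margin.
ALLOWED_LICENSES = {"apache-2.0", "mit"}
WIN_MARGIN = 0.02


@dataclass
class Decision:
    open_merge_request: bool
    reason: str


def license_gate(license_id: Optional[str]) -> Optional[str]:
    """Step 3: return a rejection message, or None when the candidate may proceed."""
    if license_id is None:
        return "No license declared on the model card; stopping."
    if license_id.lower() not in ALLOWED_LICENSES:
        return f"License '{license_id}' is not on the allow-list; stopping."
    return None


def decide(challenger: float, incumbent: float, margin: float = WIN_MARGIN) -> Decision:
    """Step 5: the challenger must beat the incumbent by at least `margin`.

    Assumes one aggregate score where higher is better.
    """
    delta = challenger - incumbent
    if delta >= margin:
        return Decision(True, f"challenger wins by {delta:+.3f} (margin {margin})")
    return Decision(False, f"delta {delta:+.3f} is below margin {margin}")


def swap_model_id(config_text: str, incumbent_id: str, challenger_id: str) -> str:
    """Step 6: produce the config edit the merge request would carry."""
    # Refuse to guess when the incumbent id is missing or appears more than once.
    if config_text.count(incumbent_id) != 1:
        raise ValueError("expected exactly one occurrence of the incumbent id")
    return config_text.replace(incumbent_id, challenger_id)
```

Requiring a margin rather than any positive difference keeps eval noise from triggering a swap, and the merge request leaves the final call to a human reviewer.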
Set it up
What you configure once, before turning it on.
- 1. Connect Slack: Channels, DMs, threads, mentions.
- 2. Connect Hugging Face: Models, datasets, spaces (the open-source hub).
- 3. Connect Shell: Run sandboxed commands inside the workspace.
- 4. Connect GitLab: Repos, MRs, pipelines, registry.
- 5. Set each agent's model: We leave models unset so you pick the tier: fast and cheap, or top-quality.
- 6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
- 7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.