Slack Command Eval -> GitLab Swap Merge Request

Lets an engineer drop a Hugging Face model id in Slack, runs your fixed eval against the incumbent on demand, and, if the challenger wins, opens a GitLab merge request to swap it in.

Category: AI Agents
Engine: paperclip
Difficulty: intermediate
Trigger: chat
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Slack message with model id (Slack)
  • Action: Validate id, fetch model card + license (Hugging Face)
  • Logic: Invalid or incompatible? Reply and stop
  • Action: Run fixed eval, post scorecard to thread (Shell)
  • Logic: Challenger wins by margin?
  • Output: Open GitLab swap merge request (GitLab)
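
As a rough illustration of the trigger and the first logic gate, the sketch below pulls a candidate model id out of a Slack message and checks a license tag against an allowlist. It is not the workflow's actual implementation: `extract_model_id`, `license_is_compatible`, `ALLOWED_LICENSES`, and the id pattern are hypothetical names chosen for this example, and the pattern only approximates Hugging Face's `org/name` naming rules.

```python
import re
from typing import Optional

# Hypothetical allowlist; replace with the licenses your team accepts.
ALLOWED_LICENSES = {"apache-2.0", "mit", "bsd-3-clause"}

# Rough approximation of an "org/name" repo id (an assumption for this
# sketch, not the hub's official validation rule).
MODEL_ID_RE = re.compile(
    r"(?<![\w./-])([A-Za-z0-9][\w.-]*/[A-Za-z0-9][\w.-]*)(?![\w/-])"
)


def extract_model_id(message_text: str) -> Optional[str]:
    """Return the first org/name token in the message, or None."""
    match = MODEL_ID_RE.search(message_text)
    if match is None:
        return None
    # Drop sentence punctuation that the pattern may have swallowed.
    return match.group(1).rstrip(".")


def license_is_compatible(license_tag: Optional[str]) -> bool:
    """True only when the license tag is present and on the allowlist."""
    return license_tag is not None and license_tag.lower() in ALLOWED_LICENSES
```

A pattern like this will also match ordinary phrases such as "and/or", so it only shortlists a candidate; the id still has to be confirmed against the hub before any eval runs.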

What it does

Makes evaluating a candidate model a single Slack message. An engineer pastes a Hugging Face model id into a channel; the agent validates it, runs your frozen eval against the current incumbent, and replies in thread with the head-to-head scorecard. Only on a clear win does it open a GitLab merge request that swaps the model id in config.
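
The "clear win" rule and the scorecard reply might look something like the following. This is a minimal sketch under stated assumptions: a single aggregate score where higher is better, and a hypothetical `margin` default of 0.01. `EvalResult`, `challenger_wins`, and `format_scorecard` are illustrative names, not part of the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    model_id: str
    score: float  # assumption: one aggregate metric, higher is better


def challenger_wins(challenger: EvalResult, incumbent: EvalResult,
                    margin: float = 0.01) -> bool:
    """The challenger must beat the incumbent by at least `margin`."""
    return challenger.score - incumbent.score >= margin


def format_scorecard(challenger: EvalResult, incumbent: EvalResult,
                     margin: float = 0.01) -> str:
    """Build a plain-text head-to-head summary for the Slack thread."""
    delta = challenger.score - incumbent.score
    if challenger_wins(challenger, incumbent, margin):
        verdict = "challenger wins"
    else:
        verdict = "incumbent stays"
    lines = [
        f"Incumbent:  {incumbent.model_id}  {incumbent.score:.3f}",
        f"Challenger: {challenger.model_id}  {challenger.score:.3f}",
        f"Delta: {delta:+.3f} (required margin {margin:.3f})",
        f"Verdict: {verdict}",
    ]
    return "\n".join(lines)
```

Requiring a margin rather than any positive delta keeps eval noise from triggering a swap; the right value depends on how much run-to-run variance your eval shows.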

When to use it

Use it when candidates surface organically (a teammate spots a release) and you want a fast, low-ceremony path from "should we try this?" to a reviewable swap. It keeps humans driving while automating the tedious eval-and-MR steps.

How it works

  1. A Slack message containing a model id triggers the workflow.
  2. The agent validates the id and fetches the Hugging Face model card and license.
  3. A branch stops early on an invalid id or incompatible license, replying in thread.
  4. It runs the fixed eval on the challenger and the incumbent and posts the scorecard back to the Slack thread.
  5. A branch checks whether the challenger beats the incumbent by the configured margin.
  6. On a win it opens a GitLab merge request editing the model config and linking the scorecard.
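
The last step can be sketched as two small pure functions: one that swaps the model id in the config text, and one that builds the request body for a merge request. The field names (`source_branch`, `target_branch`, `title`, `description`, `remove_source_branch`) are those accepted by GitLab's create-merge-request endpoint; everything else is an assumption for illustration, including the branch naming scheme, the `scorecard_url` parameter, and the helper names.

```python
import re


def swap_model_id(config_text: str, incumbent: str, challenger: str) -> str:
    """Replace the incumbent id with the challenger id in the config text."""
    if incumbent not in config_text:
        raise ValueError(f"incumbent id {incumbent!r} not found in config")
    return config_text.replace(incumbent, challenger)


def build_merge_request_payload(incumbent: str, challenger: str,
                                scorecard_url: str,
                                target_branch: str = "main") -> dict:
    """Build the JSON body for a merge request that proposes the swap."""
    # Hypothetical branch naming: lowercase id with non-alphanumerics dashed.
    slug = re.sub(r"[^a-z0-9]+", "-", challenger.lower()).strip("-")
    return {
        "source_branch": f"swap-model/{slug}",
        "target_branch": target_branch,
        "title": f"Swap model: {incumbent} -> {challenger}",
        "description": (
            "Challenger beat the incumbent on the fixed eval.\n\n"
            f"Scorecard: {scorecard_url}"
        ),
        "remove_source_branch": True,
    }
```

Keeping both functions free of network calls makes them easy to test; the actual commit and API request would be handled by the GitLab connection.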

Set it up

What you configure once, before turning it on.

  1. Connect Slack: channels, DMs, threads, mentions.
  2. Connect Hugging Face: models, datasets, spaces (the open-source hub).
  3. Connect Shell: run sandboxed commands inside the workspace.
  4. Connect GitLab: repos, MRs, pipelines, registry.
  5. Set each agent's model: we leave models unset so you pick the tier, fast and cheap or top-quality.
  6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
