New Release Eval -> Snowflake Scorecard History

Each time a new Hugging Face release appears in the tracked model family, this workflow runs your fixed eval against the incumbent and writes a structured scorecard row to Snowflake.

Category: AI Agents

Engine: sim

Difficulty: advanced

Trigger: schedule

Steps: 6

Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

Trigger: Schedule polls for new releases
Action (Hugging Face): List new Hugging Face revisions
Logic: Keep revisions not yet in Snowflake
Action (Shell): Run fixed eval, normalize scorecard
Action (Snowflake): Write scorecard row to Snowflake
Output (Slack): Slack note when swap threshold is crossed
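
The Logic step is essentially a set difference. A minimal sketch in Python, assuming each listed release is identified by a `(model_id, revision)` pair and that the keys already stored in the Snowflake table have been fetched into a set beforehand; `Release` and `keep_unrecorded` are hypothetical names, not the workflow's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """A model revision seen on the hub (hypothetical shape)."""
    model_id: str  # e.g. "org/model-name"
    revision: str  # revision identifier of the release


def keep_unrecorded(listed: list[Release], recorded: set[tuple[str, str]]) -> list[Release]:
    """Return releases whose (model_id, revision) key is not yet recorded.

    `recorded` stands in for the keys already in the scorecard table
    (assumed to be loaded from Snowflake before this runs). Duplicates
    within `listed` are dropped too, keeping first-seen order.
    """
    seen = set(recorded)
    fresh = []
    for rel in listed:
        key = (rel.model_id, rel.revision)
        if key not in seen:
            seen.add(key)
            fresh.append(rel)
    return fresh


# Usage: one release is already recorded, one is new and listed twice.
listed = [Release("acme/m-7b", "a1"), Release("acme/m-7b", "b2"), Release("acme/m-7b", "b2")]
print(keep_unrecorded(listed, {("acme/m-7b", "a1")}))
# [Release(model_id='acme/m-7b', revision='b2')]
```

Deduplicating against the table, rather than against "releases since the last poll", keeps the step idempotent: a missed or repeated schedule run does not skip or double-write a revision.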

What it does

Builds the long-term record behind your model decisions. Every time a new model appears in the family you watch, the workflow benchmarks it against the current incumbent on a frozen eval and appends a fully structured scorecard to a Snowflake table — model id, revision, every metric, cost, and the swap verdict — so you can audit and trend model quality over time.
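
A minimal sketch of what one scorecard row might look like, assuming a flat schema with one column per metric. The `Scorecard` class, the `metric_` column prefix, the `cost_usd` and `latency_ms` units, and the `insert_sql` helper are illustrative assumptions, not the workflow's actual table definition.

```python
from dataclasses import dataclass, field


@dataclass
class Scorecard:
    """One evaluation result (illustrative fields, not a fixed schema)."""
    model_id: str
    revision: str
    incumbent_id: str
    metrics: dict[str, float] = field(default_factory=dict)  # metric name -> score
    cost_usd: float = 0.0          # assumed unit: USD for the eval run
    latency_ms: float = 0.0        # assumed unit: milliseconds
    swap_recommended: bool = False

    def to_row(self) -> dict[str, object]:
        """Flatten into one dict: fixed columns first, then one 'metric_' column per metric."""
        row: dict[str, object] = {
            "model_id": self.model_id,
            "revision": self.revision,
            "incumbent_id": self.incumbent_id,
            "cost_usd": self.cost_usd,
            "latency_ms": self.latency_ms,
            "swap_recommended": self.swap_recommended,
        }
        for name, value in sorted(self.metrics.items()):
            row[f"metric_{name.lower()}"] = value
        return row


def insert_sql(table: str, row: dict[str, object]) -> str:
    """Build an INSERT with pyformat placeholders (%(name)s) for each column.

    Only values are parameterized; `table` and the column names are
    interpolated directly, so they must come from trusted configuration.
    """
    cols = ", ".join(row)
    params = ", ".join(f"%({c})s" for c in row)
    return f"INSERT INTO {table} ({cols}) VALUES ({params})"


# Usage: build the row and the statement without touching a warehouse.
card = Scorecard("acme/m-7b", "b2", "acme/m-6b", {"accuracy": 0.84}, cost_usd=1.5)
row = card.to_row()
print(insert_sql("SCORECARDS", row))
```

Because the eval is frozen, the metric set should stay stable, which is what makes one column per metric workable. The `%(name)s` placeholders assume pyformat binding, the default in snowflake-connector-python, so the same `row` dict could be passed as the parameters to `cursor.execute`. If the metric set does change over time, a single `VARIANT` column holding the metrics as JSON is a common alternative.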

When to use it

Use it when you need a defensible, queryable history of model evaluations for dashboards, audits, or trend analysis, rather than one-off swap decisions. Pairs well with a BI layer reading from the same table.

How it works

1. A schedule polls Hugging Face for new releases in the tracked org or collection.
2. A filter keeps only genuinely new revisions not yet recorded in Snowflake.
3. The agent runs the fixed eval on both the new model and the incumbent.
4. It normalizes the results into a flat scorecard with metrics, cost, latency, and a swap-recommended flag.
5. It writes the row to the Snowflake scorecard table for history and BI.
6. When a challenger crosses the swap threshold, it posts a short Slack note linking the new row.
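
Steps 4 and 6 hinge on the swap-recommended flag. One plausible rule, sketched in Python under stated assumptions: the challenger must beat the incumbent on a single higher-is-better primary score by a fixed margin, without its cost rising past a fixed ratio. `swap_recommended`, `min_gain`, and `max_cost_ratio` are hypothetical, and the default thresholds are placeholders to tune, not values from this workflow.

```python
def swap_recommended(
    challenger_score: float,
    incumbent_score: float,
    challenger_cost: float,
    incumbent_cost: float,
    min_gain: float = 0.02,        # hypothetical: absolute gain required on the primary score
    max_cost_ratio: float = 1.25,  # hypothetical: challenger may cost at most 25% more
) -> bool:
    """Return True when the challenger crosses the (assumed) swap threshold.

    Assumes the primary score is higher-is-better and both costs share a unit.
    A zero-cost incumbent only admits a zero-cost challenger.
    """
    gain = challenger_score - incumbent_score
    cost_ok = challenger_cost <= incumbent_cost * max_cost_ratio
    return gain >= min_gain and cost_ok


# Usage: +0.04 on the primary score at 20% higher cost passes both checks.
print(swap_recommended(0.85, 0.81, 1.2, 1.0))  # True
```

Keeping the rule as a pure function of the scorecard values means the same verdict can be recomputed later from the Snowflake history, which suits the audit use case.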

Set it up

What you configure once, before turning it on.

1. Connect Hugging Face: models, datasets, and spaces from the open-source hub.
2. Connect Shell: run sandboxed commands inside the workspace.
3. Connect Snowflake: warehouses, queries, and shares.
4. Connect Slack: channels, DMs, threads, and mentions.
5. Set each agent's model: we leave models unset so you pick the tier, fast and cheap or top quality.
6. Tune it to your data: edit the prompts, filters, and field mappings so they match how your team works.
7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.

More AI Agents workflows

Custom Metrics Cardinality Spike Pager

A webhook from a Datadog monitor fires when custom-metric cardinality jumps; an agent pinpoints the offending metric and tag and estimates the added cost.

Sentry-to-Confluence Runbook Updater

When a Sentry issue is resolved, the agent finds the matching Confluence runbook page and proposes an inline update with the verified fix.

Stale Doc-PR Chaser for Runbook Gaps

On a daily schedule the agent finds runbook doc PRs that were opened from resolved incidents but never reviewed, and summarizes what each one fixes.

Resolved Incident to Public Troubleshooting Doc

For customer-facing errors resolved in Sentry, the agent drafts a sanitized troubleshooting entry and opens a PR to your ReadMe documentation.

On-Call Runbook Gap Closer: Resolved Sentry Issues to Doc PRs

An agent reads each newly resolved Sentry issue, compares the actual fix against your existing runbook, and opens a GitHub PR adding the missing remediation steps.

Weekly On-Call Doc-Gap Digest

Each week the agent reviews every Sentry issue resolved in the last 7 days and ranks the ones whose runbook coverage is missing or thin.

Run it inside a business

This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

Media

YouTube Studio

Scripts, edits, thumbnails, and scheduling — every week.

Finance

Research & Trading Desk

Governance-first research, execution, and risk — every trade on the audit trail.

Software

Agent Hive runs Agent Hive

The team that built Agent Hive, exactly as it runs today.

Run this workflow in your colony.

14-day trial. No DevOps. No sales call. Provisioned in under a minute.