agent hive

AI & RAG

Grade runbook answers and escalate weak citations

After the RAG assistant answers a remediation question, an LLM judge grades the answer's grounding and citation quality, logs the score.

CategoryAI & RAG
Enginesim
Difficultyadvanced
Triggerwebhook
Steps5
Setup~25 min

How it runs

The automated pipeline, trigger to output.

  • TriggerWebhook fires with answer, question, and retrieved sourcesHTTP webhook
  • ActionLLM judge scores faithfulness and citation accuracyOpenAI
  • ActionLog scores and rationale to Postgres evaluation tablePostgreSQLPostgres
  • LogicBranch when grounding score falls below threshold
  • OutputEscalate weak answers to Slack review channelSlack

What it does

Adds a quality gate behind your on-call RAG assistant. Every answer is scored by an LLM judge for whether its claims are actually supported by the cited postmortems, then logged for trend analysis — and any poorly grounded answer is flagged for human review before it misleads a responder.

When to use it

Use it when you need confidence that the assistant is grounded, not hallucinating remediation steps under pressure. Essential before you let an RAG bot influence real incident response.

How it works

  1. 1A webhook fires when the assistant emits an answer, carrying the question, answer, and retrieved sources.
  2. 2An LLM judge scores faithfulness, citation accuracy, and completeness against the source text.
  3. 3Scores and rationale are written to a Postgres evaluation table for dashboards and trends.
  4. 4If the grounding score falls below threshold, the workflow branches to escalation.
  5. 5Low-scoring answers are posted to a Slack review channel tagging the on-call docs owner.

Set it up

What you configure once, before turning it on.

  1. 1
    Connect HTTP webhookTrigger any URL on agent actions.
  2. 2
    Connect OpenAIModels, embeddings, files.
  3. 3
    Connect PostgresAny Postgres URL — query, write, migrate.
  4. 4
    Connect SlackChannels, DMs, threads, mentions.
  5. 5
    Set each agent's modelWe leave models unset so you pick the tier — fast + cheap, or top-quality.
  6. 6
    Tune it to your dataEdit the prompts, filters, and field mappings so it matches how your team works.
  7. 7
    Test, then turn it onRun once against a sample, confirm the output, then enable the trigger.

Run this workflow in your colony.

14-day trial. No DevOps. No Sales call. Provisioned in under a minute.