
Auto-rollback a feature flag when Honeycomb error rate spikes

Polls Honeycomb for the error rate on a newly rolled-out flag cohort and, if it breaches your threshold versus the control group, disables the flag via GitHub and pages…

Category: DevOps
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Schedule fires every 2 minutes during rollout window
  • Action: Query Honeycomb for flag-on vs control error rate (Honeycomb)
  • Logic: Branch on whether flag-on error rate exceeds control by delta
  • Action: Disable flag config via GitHub commit / revert PR (GitHub)
  • Action: Raise high-urgency PagerDuty incident (PagerDuty)
  • Output: Post rollback summary to Slack release channel (Slack)

What it does

Watches the live error rate for traffic exposed to a feature flag and disables the flag automatically at the first poll where it regresses against the control cohort by more than the configured delta, then pages on-call so a human knows as soon as the rollback fires.

When to use it

Run this during a progressive rollout (1% to 100%) of a risky backend change behind a flag. It gives you an automated safety net: a regression is detected within one polling interval (two minutes here), so a bad rollout is pulled after minutes of elevated errors instead of waiting for someone to notice.

How it works

  1. A schedule fires every two minutes during the rollout window.
  2. A Honeycomb query returns the error rate for the flag-on cohort and the flag-off control cohort over the last interval.
  3. A logic branch compares the two: if the flag-on error rate exceeds control by more than the configured delta, proceed; otherwise stop.
  4. On breach, a GitHub action commits the flag config back to the disabled state (or opens a revert PR) to kill the rollout.
  5. PagerDuty raises a high-urgency incident with the offending flag, cohort sizes, and the Honeycomb query link.
  6. A Slack message posts the rollback summary to the release channel.
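
The branch in step 3 can be sketched as a small pure function. This is only an illustration, not the workflow's actual logic: the names `CohortStats` and `should_rollback`, the 0.02 default delta, and the `min_requests` guard are all assumptions, and the delta is treated here as an absolute difference in error rate.

```python
from dataclasses import dataclass


@dataclass
class CohortStats:
    """Request and error counts for one cohort over the last polling interval."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Treat an empty cohort as 0.0 rather than dividing by zero.
        return self.errors / self.requests if self.requests else 0.0


def should_rollback(flag_on: CohortStats, control: CohortStats,
                    delta: float = 0.02, min_requests: int = 100) -> bool:
    """Return True when the flag-on error rate exceeds control by more than delta."""
    # Assumption: skip the comparison when either cohort is too small to trust,
    # which matters early in a rollout when the flag-on cohort is near 1%.
    if flag_on.requests < min_requests or control.requests < min_requests:
        return False
    return flag_on.error_rate - control.error_rate > delta
```

For example, `should_rollback(CohortStats(1000, 50), CohortStats(1000, 10))` returns `True`, since 5% against 1% is a gap of four points. The minimum-sample guard is a design choice worth tuning: without it, a handful of failed requests in a tiny cohort could trigger a rollback on noise.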

Set it up

What you configure once, before turning it on.

  1. Connect Honeycomb: Distributed traces and queries.
  2. Connect GitHub: Repos, issues, pull requests, actions.
  3. Connect PagerDuty: Incidents, on-call, escalations.
  4. Connect Slack: Channels, DMs, threads, mentions.
  5. Set each agent's model: We leave models unset so you pick the tier, either fast and cheap or top-quality.
  6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
