
Flaky Terraform Apply Auto-Retry with PagerDuty Escalation

Watches for failed Terraform apply runs, automatically retries transient infra failures with backoff, and escalates to PagerDuty only when retries are exhausted on a real error.

Category: DevOps
Engine: sim
Difficulty: advanced
Trigger: webhook
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

Trigger: Webhook: terraform apply failed (HTTP webhook)
Logic: Classify error: transient vs real
Action: Retry apply with backoff (Shell)
Logic: Check retry result and budget
Output: Open PagerDuty incident on real failure (PagerDuty)

What it does

When a Terraform apply fails, this workflow inspects the error, distinguishes transient flakiness (rate limits, eventual-consistency races, lock contention) from genuine config errors, retries the transient ones with exponential backoff, and pages on-call via PagerDuty only when the failure is real or retries run out.
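The classification step can be pictured as a pattern match over the error log. The sketch below is only an illustration of that idea, not the workflow's actual logic step: the pattern list and the `classify_error` name are assumptions, and the patterns would need tuning to the messages your providers really emit.

```python
import re

# Hypothetical transient-pattern list (an assumption for illustration).
# Patterns are matched against the lowercased log.
TRANSIENT_PATTERNS = [
    r"\b429\b",                          # HTTP 429 from a provider API
    r"rate limit|throttl",               # rate limiting / throttling wording
    r"error acquiring the state lock",   # Terraform state lock contention
    r"not (yet )?ready",                 # dependency or resource not ready yet
]


def classify_error(log: str) -> str:
    """Return "transient" if any known flaky pattern appears, else "real"."""
    lowered = log.lower()
    for pattern in TRANSIENT_PATTERNS:
        if re.search(pattern, lowered):
            return "transient"
    return "real"


if __name__ == "__main__":
    print(classify_error("Error: Error acquiring the state lock"))
    print(classify_error('Error: Unsupported argument "instance_typo"'))
```

Defaulting to "real" when nothing matches is a deliberate choice in this sketch: an unrecognized error pages a human rather than being retried silently.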

When to use it

Use it when your apply pipeline fails intermittently on provider rate limits or resource-not-yet-ready errors, and you want self-healing retries instead of a human re-running the job at 3am — while still guaranteeing a page for failures that actually need eyes.

How it works

1. A webhook trigger receives the apply-failed event from your CI pipeline with the error log attached.
2. A logic step classifies the error against a transient-pattern list (429s, lock timeouts, dependency-not-ready).
3. If transient and retry budget remains, a shell action re-runs `terraform apply` after a backoff delay.
4. A logic step checks the retry result: success ends the run cleanly.
5. On a non-transient error or an exhausted retry budget, an output step opens a PagerDuty incident with the failing resource, error class, and run link.
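The steps above can be sketched as a small retry loop. This is a rough illustration under stated assumptions, not the workflow's implementation: the function names, the budget of 3 retries, and the 30-second base delay with a 10-minute cap are all hypothetical, and the classifier is passed in as a plain function returning "transient" or "real".

```python
import subprocess
import time
from typing import Callable, Tuple


def backoff_delay(attempt: int, base: float = 30.0, cap: float = 600.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, never above cap."""
    return min(cap, base * (2 ** attempt))


def run_apply() -> Tuple[bool, str]:
    """Re-run terraform apply non-interactively; return (succeeded, log)."""
    proc = subprocess.run(
        ["terraform", "apply", "-auto-approve", "-input=false"],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr


def retry_apply(
    first_log: str,
    classify: Callable[[str], str],
    apply: Callable[[], Tuple[bool, str]] = run_apply,
    max_retries: int = 3,  # hypothetical retry budget
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Return ("resolved", log) on success, else ("escalate", last_log)."""
    log = first_log
    for attempt in range(max_retries):
        # A non-transient error skips any further retries.
        if classify(log) != "transient":
            return "escalate", log
        sleep(backoff_delay(attempt))
        ok, log = apply()
        if ok:
            return "resolved", log
    # Retry budget exhausted while the error still looked transient.
    return "escalate", log
```

Each failed retry is classified again before the next attempt, so a run that starts with a rate limit and then surfaces a genuine config error escalates immediately instead of burning the rest of the budget. Passing `apply` and `sleep` as parameters keeps the loop testable without a real Terraform run.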

Set it up

What you configure once, before turning it on.

1. Connect HTTP webhook: Trigger any URL on agent actions.
2. Connect Shell: Run sandboxed commands inside the workspace.
3. Connect PagerDuty: Incidents, on-call, escalations.
4. Set each agent's model: We leave models unset so you pick the tier: fast and cheap, or top-quality.
5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
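When confirming the output of a sample run, it helps to know roughly what the incident should carry. The sketch below builds a trigger event in the shape of PagerDuty's Events API v2, with the failing resource, error class, and run link. Whether the PagerDuty connector uses this API internally is an assumption, and the function names, `source` value, and example inputs are hypothetical.

```python
import json
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_event(routing_key: str, resource: str, error_class: str, run_url: str) -> dict:
    """Build an Events API v2 trigger event for a failed apply."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": f"terraform apply failed on {resource} ({error_class})",
            "source": "terraform-apply-pipeline",  # hypothetical source label
            "severity": "error",
            "custom_details": {"resource": resource, "error_class": error_class},
        },
        "links": [{"href": run_url, "text": "CI run"}],
    }


def send_event(event: dict) -> int:
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values; printing instead of sending keeps the test harmless.
    sample = build_event(
        "YOUR_ROUTING_KEY", "aws_instance.web", "real", "https://ci.example.com/runs/1"
    )
    print(json.dumps(sample, indent=2))
```

Printing the event during the sample run, rather than calling `send_event`, lets you check the field mappings without paging anyone.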


