BigQuery Scheduled-Query Retry with Backoff Self-Heal

When a BigQuery scheduled query fails, this workflow inspects the error, retries transient failures with exponential backoff, and opens an incident card if the failure is permanent or the retries run out.

Category: Data Ops
Engine: sim
Difficulty: intermediate
Trigger: event
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Scheduled query run finishes FAILED (Google BigQuery)
  • Logic: Classify error: transient vs permanent
  • Action: Re-run transfer config with backoff (Google BigQuery)
  • Logic: Retry succeeded or budget exhausted?
  • Action: Post recovery note on success (Discord)
  • Output: Open Trello incident card on failure (Trello)

What it does

Catches a failed BigQuery scheduled-query run, classifies the failure, and automatically re-runs transient failures up to three times with growing wait intervals. Permanent failures (bad SQL, missing table, permission denied) skip retries and go straight to an incident card so engineers do not waste a retry budget on something that cannot self-heal.
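The classification step can be sketched roughly as below. This is a minimal illustration, not the workflow's actual logic: the reason strings are assumed to be BigQuery-style error reasons, the mapping of "bad SQL, missing table, permission denied" to `invalidQuery`, `notFound`, and `accessDenied` is a guess, and `classify_error` is a hypothetical helper. How the reason is extracted from the failed run is left out.

```python
# Assumed reason strings; adjust to whatever your failed runs actually report.
TRANSIENT_REASONS = {"rateLimitExceeded", "backendError", "resourcesExceeded"}
PERMANENT_REASONS = {"invalidQuery", "notFound", "accessDenied"}


def classify_error(reason: str) -> str:
    """Return 'transient' or 'permanent' for an error reason string."""
    if reason in TRANSIENT_REASONS:
        return "transient"
    # Known-permanent and unknown reasons both skip retries, so anything
    # unrecognized reaches a human instead of burning the retry budget.
    return "permanent"


if __name__ == "__main__":
    for reason in ("backendError", "accessDenied", "somethingNew"):
        print(reason, "->", classify_error(reason))
```

Treating unknown reasons as permanent is a conservative choice in this sketch; a team that sees many unlisted but harmless errors might prefer the opposite default.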

When to use it

Use it for production scheduled queries that occasionally hit quota limits, slot contention, or brief source-table locks — failures that almost always succeed on a second attempt. It removes the 2 a.m. pager noise for blips while still surfacing the real breakages.

How it works

  1. A BigQuery transfer-run completion event fires when a scheduled query ends in FAILED state.
  2. A logic branch reads the error class: transient (rate-limit, backendError, resourcesExceeded) vs permanent.
  3. For transient errors, the workflow triggers a manual re-run of the transfer config, then waits and re-checks status, escalating the backoff on each loop up to three tries.
  4. If a retry succeeds, it posts a short recovery note to Discord and stops.
  5. If retries are exhausted or the error was permanent, it opens a Trello incident card with the query name, error text, and run history.
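The retry loop in the steps above might look something like the following sketch. The 60-second base delay and doubling factor are assumptions (the workflow only says the wait grows on each loop), and `trigger_rerun` and `run_succeeded` are hypothetical callables standing in for the manual re-run of the transfer config and the status check.

```python
import time
from typing import Callable, List, Tuple


def backoff_delays(max_tries: int = 3, base_seconds: float = 60.0,
                   factor: float = 2.0) -> List[float]:
    """Wait times for each try: base, base*factor, base*factor^2, ..."""
    return [base_seconds * factor ** i for i in range(max_tries)]


def retry_with_backoff(trigger_rerun: Callable[[], None],
                       run_succeeded: Callable[[], bool],
                       max_tries: int = 3,
                       base_seconds: float = 60.0,
                       factor: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Tuple[bool, int]:
    """Re-run, wait, re-check; return (succeeded, tries_used)."""
    delays = backoff_delays(max_tries, base_seconds, factor)
    for attempt, delay in enumerate(delays, start=1):
        trigger_rerun()      # hypothetical: start a manual re-run
        sleep(delay)         # wait longer on each loop
        if run_succeeded():  # hypothetical: re-check the run status
            return True, attempt
    return False, max_tries  # budget exhausted


if __name__ == "__main__":
    # Dry run with no real waiting: the second try succeeds.
    outcomes = iter([False, True])
    ok, tries = retry_with_backoff(lambda: None, lambda: next(outcomes),
                                   sleep=lambda seconds: None)
    print("recovered" if ok else "open incident card", "after", tries, "tries")
```

A `True` result corresponds to the Discord recovery note, and `False` to the Trello incident card. Passing `sleep` as a parameter keeps the loop easy to test without real waits.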

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery. Datasets, queries, schemas.
  2. Connect Discord. Community channels + voice + bots.
  3. Connect Trello. Kanban boards for everything.
  4. Set each agent's model. We leave models unset so you pick the tier: fast + cheap, or top-quality.
  5. Tune it to your data. Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on. Run once against a sample, confirm the output, then enable the trigger.
