DATA OPS
Daily BigQuery PII drift scan with auto-flagged masking candidates
Scans BigQuery tables every morning, uses an LLM to classify each column's sensitivity, and posts any columns that newly cross into PII territory to Slack for masking review.
How it runs
The automated pipeline, trigger to output.
- Trigger: Daily schedule fires the PII scan
- Action (BigQuery): Query column metadata and sampled values from BigQuery
- Action (OpenAI): Classify each column's sensitivity tier with an LLM
- Logic: Diff against baseline, keep newly-sensitive columns
- Action (BigQuery): Write updated classifications back to BigQuery baseline
- Output (Slack): Post masking candidates to Slack governance channel
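The BigQuery read in the pipeline above could be sketched as two small SQL builders, one for column metadata and one for sampled values. This is a minimal sketch, not the workflow's actual queries: the function names, the `sample_size` parameter, and the project, dataset, and table names in the usage note are all assumptions. `INFORMATION_SCHEMA.COLUMNS` is BigQuery's per-dataset metadata view.

```python
def column_metadata_sql(project: str, dataset: str) -> str:
    """Build a query listing every column in a dataset.

    Reads BigQuery's per-dataset INFORMATION_SCHEMA.COLUMNS view.
    """
    return (
        "SELECT table_name, column_name, data_type\n"
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS`\n"
        "ORDER BY table_name, ordinal_position"
    )


def column_sample_sql(
    project: str, dataset: str, table: str, column: str, sample_size: int = 20
) -> str:
    """Build a query pulling a few non-null values from one column.

    Assumption: values are cast to STRING so the classifier sees text.
    This cast works for scalar types, not for STRUCT or ARRAY columns.
    """
    return (
        f"SELECT CAST(`{column}` AS STRING) AS sample_value\n"
        f"FROM `{project}.{dataset}.{table}`\n"
        f"WHERE `{column}` IS NOT NULL\n"
        f"LIMIT {int(sample_size)}"
    )
```

For example, `column_sample_sql("my-project", "crm", "contacts", "notes", sample_size=5)` yields a query over the hypothetical table `my-project.crm.contacts`. `LIMIT` without `ORDER BY` returns arbitrary rows rather than a random sample, which is usually enough for a drift signal but can miss sensitive values that appear rarely.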
What it does
Every morning this workflow samples your BigQuery tables, asks an LLM to classify each column as public, internal, or PII, and compares the verdict against the last stored classification. Columns that newly became sensitive (a free-text notes field that now contains emails, a synced column that started carrying SSNs) are flagged as masking candidates and pushed to Slack so a data steward can act before the data spreads.
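The comparison described above can be sketched as a dictionary diff. This is an illustration under stated assumptions, not the workflow's own logic step: the tier ordering, the `(table, column)` key shape, and the choice to flag brand-new columns that have no baseline entry are all assumptions.

```python
# Assumed ordering of the three tiers named in the text, least to most sensitive.
TIER_RANK = {"public": 0, "internal": 1, "PII": 2}


def newly_sensitive(baseline: dict, today: dict, threshold: str = "PII") -> list:
    """Return keys that reach `threshold` today but did not in the baseline.

    Both arguments map (table, column) -> tier. A column missing from the
    baseline counts as not previously sensitive (assumption).
    """
    cutoff = TIER_RANK[threshold]
    flagged = []
    for key, tier in today.items():
        previous = baseline.get(key)
        was_sensitive = previous is not None and TIER_RANK[previous] >= cutoff
        if TIER_RANK[tier] >= cutoff and not was_sensitive:
            flagged.append(key)
    return sorted(flagged)


# Hypothetical data: a notes field drifts to PII and a new synced column appears.
baseline = {
    ("crm.contacts", "notes"): "internal",
    ("crm.contacts", "email"): "PII",
}
today = {
    ("crm.contacts", "notes"): "PII",
    ("crm.contacts", "email"): "PII",
    ("crm.contacts", "country"): "public",
    ("crm.sync", "tax_id"): "PII",
}
candidates = newly_sensitive(baseline, today)
# candidates == [("crm.contacts", "notes"), ("crm.sync", "tax_id")]
```

The `email` column is already PII in the baseline, so it is not flagged again. Only transitions into the sensitive tier produce a masking candidate.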
When to use it
Use it when upstream teams add or repurpose columns faster than your governance reviews can keep up, and you need an early-warning signal the moment a column drifts into regulated territory rather than discovering it in an audit.
How it works
- 1. A daily schedule fires the scan.
- 2. The workflow queries BigQuery for column metadata plus a small sampled set of values per column.
- 3. An OpenAI call classifies each column's sensitivity tier and gives a one-line rationale.
- 4. A logic step diffs today's classification against the stored baseline and keeps only newly-sensitive columns.
- 5. The current classification is written back to BigQuery as the new baseline.
- 6. Flagged columns are posted to a Slack governance channel with table, column, tier, and rationale.
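The final step above, posting table, column, tier, and rationale to Slack, might look like the formatter below. It is a sketch only: the function name, the candidate dictionary keys, and the message wording are assumptions. It relies only on the fact that Slack message bodies carry a `text` field.

```python
from typing import Optional


def build_slack_payload(candidates: list) -> Optional[dict]:
    """Format flagged columns as a Slack message body.

    Each candidate is a dict with table, column, tier, and rationale keys
    (assumed shape). Returns None when there is nothing to post.
    """
    if not candidates:
        return None  # no newly sensitive columns, so skip the post
    lines = [f"*{len(candidates)} new masking candidate(s)*"]
    for c in candidates:
        lines.append(
            f"- `{c['table']}.{c['column']}` is now {c['tier']}: {c['rationale']}"
        )
    return {"text": "\n".join(lines)}


# Hypothetical flagged column, matching the notes-field example in the text.
payload = build_slack_payload(
    [
        {
            "table": "crm.contacts",
            "column": "notes",
            "tier": "PII",
            "rationale": "Sampled values contain email addresses.",
        }
    ]
)
```

The returned dictionary can be serialized as JSON for a Slack incoming webhook, or its `text` value can be passed to a `chat.postMessage` call. Returning `None` on an empty list keeps the channel quiet on days with no drift.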
Set it up
What you configure once, before turning it on.
- 1. Connect BigQuery: datasets, queries, schemas.
- 2. Connect OpenAI: models, embeddings, files.
- 3. Connect Slack: channels, DMs, threads, mentions.
- 4. Set each agent's model: models are left unset so you pick the tier, either fast and cheap or top-quality.
- 5. Tune it to your data: edit the prompts, filters, and field mappings so the workflow matches how your team works.
- 6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
More Data Ops workflows
Weekly BigQuery Cost Trend Sheet and Exec Digest
Compiles week-over-week BigQuery scheduled-query cost by owner and dataset into a Google Sheet with trend columns.
Daily BigQuery Scheduled-Query Cost Attribution to Owners
Each morning, totals the prior day's on-demand bytes-billed per scheduled query, maps each query to its owner from a label, and posts a per-owner cost leaderboard to Slack.
BigQuery Per-Team Budget Breach Alert to PagerDuty
Tracks month-to-date BigQuery scheduled-query spend per team and, when a team crosses its monthly budget, pages the team's on-call in PagerDuty and snapshots the spend breakdown…
dbt source freshness watcher with severity-routed alerts
Checks Snowflake loaded-at timestamps against each dbt source's freshness SLA, then routes warnings to Slack and hard breaches to a PagerDuty incident so stale data never…
dbt orphan model detector with Linear cleanup tickets
Scans your dbt manifest for models that no other model, exposure, or BI tool consumes.
Raw Sensor Telemetry Archive to BigQuery
Captures every incoming building sensor reading via webhook, normalizes the payload into a consistent schema.
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

