DATA OPS

Daily BigQuery PII drift scan with auto-flagged masking candidates

Scans BigQuery tables every morning, uses an LLM to classify each column's sensitivity, and posts any columns that newly cross into PII territory to Slack for masking review.

Category: Data Ops

Engine: sim

Difficulty: intermediate

Trigger: schedule

Steps: 6

Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

Trigger: Daily schedule fires the PII scan
Action: Query column metadata and sampled values from BigQuery (BigQuery)
Action: Classify each column's sensitivity tier with an LLM (OpenAI)
Logic: Diff against baseline, keep newly-sensitive columns
Action: Write updated classifications back to BigQuery baseline (BigQuery)
Output: Post masking candidates to Slack governance channel (Slack)
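
As a rough sketch of the first action in the pipeline above, the two queries might be built like this. It assumes column metadata is read from the dataset-level `INFORMATION_SCHEMA.COLUMNS` view and that values are sampled with `TABLESAMPLE SYSTEM`. The function names, the default sample sizes, and the identifier check are hypothetical; this page does not show the queries the workflow actually runs.

```python
import re

# Conservative allow-list for identifiers, since they are interpolated into SQL.
_IDENT = re.compile(r"^[A-Za-z0-9_-]+$")


def _check(*names):
    """Reject identifiers that contain anything other than letters, digits, _ or -."""
    for name in names:
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")


def metadata_query(project, dataset):
    """Build a query that lists every column in one dataset."""
    _check(project, dataset)
    return (
        "SELECT table_name, column_name, data_type "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS` "
        "ORDER BY table_name, ordinal_position"
    )


def sample_query(project, dataset, table, column, percent=1, limit=20):
    """Build a query that pulls a small sample of non-null values for one column."""
    _check(project, dataset, table, column)
    return (
        # TO_JSON_STRING renders any column type as text for the classifier.
        f"SELECT TO_JSON_STRING(`{column}`) AS sample_value "
        f"FROM `{project}.{dataset}.{table}` "
        f"TABLESAMPLE SYSTEM ({int(percent)} PERCENT) "
        f"WHERE `{column}` IS NOT NULL "
        f"LIMIT {int(limit)}"
    )
```

Keeping the sample small (a low percentage plus a `LIMIT`) bounds both the bytes scanned and the amount of raw data that leaves BigQuery for classification.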

What it does

Every morning this workflow samples your BigQuery tables, asks an LLM to classify each column as public, internal, or PII, and compares the verdict against the last stored classification. Columns that newly became sensitive (a free-text notes field that now contains emails, a synced column that started carrying SSNs) are flagged as masking candidates and pushed to Slack so a data steward can act before the data spreads.
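
The comparison against the last stored classification can be sketched as below. This is a minimal illustration, assuming the three tiers named above are ordered public < internal < PII and that a column missing from the baseline counts as public. The `TIER_RANK` mapping, the `newly_sensitive` function, and the example table names are hypothetical.

```python
# Assumed ordering: a higher number means a more sensitive tier.
TIER_RANK = {"public": 0, "internal": 1, "pii": 2}


def newly_sensitive(baseline, today, threshold="pii"):
    """Return columns that reached `threshold` today but were below it before.

    Both arguments map (table, column) -> tier. Columns absent from the
    baseline are treated as "public", so a brand-new PII column is flagged.
    """
    cutoff = TIER_RANK[threshold]
    flagged = []
    for key, tier in sorted(today.items()):
        previous = baseline.get(key, "public")
        if TIER_RANK[tier] >= cutoff and TIER_RANK[previous] < cutoff:
            flagged.append({
                "table": key[0],
                "column": key[1],
                "previous": previous,
                "current": tier,
            })
    return flagged


baseline = {
    ("crm.contacts", "notes"): "internal",
    ("crm.contacts", "email"): "pii",
}
today = {
    ("crm.contacts", "notes"): "pii",   # drifted: free text now holds emails
    ("crm.contacts", "email"): "pii",   # already PII, so not re-flagged
    ("crm.contacts", "ssn"): "pii",     # new column, no baseline entry
}
flagged = newly_sensitive(baseline, today)
```

Here `flagged` holds the `notes` and `ssn` columns, while `email` is skipped because it was already classified as PII in the baseline.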

When to use it

Use it when upstream teams add or repurpose columns faster than your governance reviews can keep up, and you need an early-warning signal the moment a column drifts into regulated territory rather than discovering it in an audit.

How it works

1. A daily schedule fires the scan.
2. The workflow queries BigQuery for column metadata plus a small sampled set of values per column.
3. An OpenAI call classifies each column's sensitivity tier and gives a one-line rationale.
4. A logic step diffs today's classification against the stored baseline and keeps only newly-sensitive columns.
5. The current classification is written back to BigQuery as the new baseline.
6. Flagged columns are posted to a Slack governance channel with table, column, tier, and rationale.
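
The classification step (step 3) might look roughly like the sketch below. The prompt wording, the JSON reply shape, and all function names are assumptions made for illustration, not the workflow's actual prompt. The model call is passed in as a plain function so the sketch runs without any API client.

```python
import json

ALLOWED_TIERS = {"public", "internal", "pii"}


def build_prompt(table, column, data_type, samples):
    """Assemble a classification prompt for one column (wording is illustrative)."""
    lines = [
        "Classify the sensitivity of this database column.",
        'Reply with JSON only: {"tier": "public|internal|pii", "rationale": "<one line>"}',
        f"Table: {table}",
        f"Column: {column}",
        f"Type: {data_type}",
        "Sampled values:",
    ]
    lines += [f"- {value}" for value in samples]
    return "\n".join(lines)


def parse_verdict(raw):
    """Validate the model's reply; raise ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    tier = str(data.get("tier", "")).lower()
    if tier not in ALLOWED_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return {"tier": tier, "rationale": str(data.get("rationale", "")).strip()}


def classify(column_info, call_model):
    """Classify one column. `call_model` takes a prompt string and returns reply text."""
    return parse_verdict(call_model(build_prompt(**column_info)))


# Stub standing in for the real LLM call, so the sketch can be run offline.
def fake_model(prompt):
    return '{"tier": "PII", "rationale": "Values look like email addresses."}'


verdict = classify(
    {"table": "crm.contacts", "column": "notes",
     "data_type": "STRING", "samples": ["reach me at a@example.com"]},
    fake_model,
)
```

Rejecting any reply outside the three known tiers keeps a malformed model answer from silently entering the baseline that the diff in step 4 depends on.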

Set it up

What you configure once, before turning it on.

1. Connect BigQuery. Datasets, queries, schemas.
2. Connect OpenAI. Models, embeddings, files.
3. Connect Slack. Channels, DMs, threads, mentions.
4. Set each agent's model. We leave models unset so you pick the tier: fast + cheap, or top-quality.
5. Tune it to your data. Edit the prompts, filters, and field mappings so it matches how your team works.
6. Test, then turn it on. Run once against a sample, confirm the output, then enable the trigger.
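
For the test run in the last step, one option is to render the Slack message locally and inspect it before enabling the trigger. The sketch below only builds and prints the text. The message layout and the `format_slack_message` name are hypothetical; the fields (table, column, tier, rationale) are the ones this workflow reports.

```python
def format_slack_message(flagged):
    """Render flagged columns as a plain-text message body (layout is illustrative)."""
    if not flagged:
        return "PII drift scan: no new masking candidates today."
    lines = [f"PII drift scan: {len(flagged)} new masking candidate(s)"]
    for item in flagged:
        lines.append(
            f"- {item['table']}.{item['column']} -> {item['tier']}: {item['rationale']}"
        )
    return "\n".join(lines)


sample = [{
    "table": "crm.contacts",
    "column": "notes",
    "tier": "pii",
    "rationale": "Sampled values contain email addresses.",
}]
# Dry run: print instead of posting, so the output can be checked by eye.
print(format_slack_message(sample))
```

Once the printed text looks right, the same string can be handed to whichever Slack connection the workflow uses.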

More Data Ops workflows

Weekly BigQuery Cost Trend Sheet and Exec Digest

Compiles week-over-week BigQuery scheduled-query cost by owner and dataset into a Google Sheet with trend columns.

Daily BigQuery Scheduled-Query Cost Attribution to Owners

Each morning, totals the prior day's on-demand bytes-billed per scheduled query, maps each query to its owner from a label, and posts a per-owner cost leaderboard to Slack.

BigQuery Per-Team Budget Breach Alert to PagerDuty

Tracks month-to-date BigQuery scheduled-query spend per team and, when a team crosses its monthly budget, pages the team's on-call in PagerDuty and snapshots the spend breakdown…

dbt source freshness watcher with severity-routed alerts

Checks Snowflake loaded-at timestamps against each dbt source's freshness SLA, then routes warnings to Slack and hard breaches to a PagerDuty incident so stale data never…

dbt orphan model detector with Linear cleanup tickets

Scans your dbt manifest for models that no other model, exposure, or BI tool consumes.

Raw Sensor Telemetry Archive to BigQuery

Captures every incoming building sensor reading via webhook, normalizes the payload into a consistent schema.


Run it inside a business

This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

Software

AI Tools Startup

Ship an AI tool, distribute on every channel, watch the unit economics.

Software

Agent Hive runs Agent Hive

The team that built Agent Hive, exactly as it runs today.

Marketing

Content Marketing Agency

SEO, blogs, social, and reporting on autopilot.

