Daily BigQuery PII drift scan with auto-flagged masking candidates

Scans BigQuery tables every morning, uses an LLM to classify each column's sensitivity, and posts any columns that newly cross into PII territory to Slack for masking review.

Category: Data Ops
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily schedule fires the PII scan
  • Action: Query column metadata and sampled values from BigQuery
  • Action: Classify each column's sensitivity tier with an LLM (OpenAI)
  • Logic: Diff against the baseline and keep newly sensitive columns
  • Action: Write updated classifications back to the BigQuery baseline
  • Output: Post masking candidates to a Slack governance channel

What it does

Every morning this workflow samples your BigQuery tables, asks an LLM to classify each column as public, internal, or PII, and compares the verdict against the last stored classification. Columns that newly became sensitive (a free-text notes field that now contains emails, a synced column that started carrying SSNs) are flagged as masking candidates and pushed to Slack so a data steward can act before the data spreads.
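The core comparison, flagging a column only when its tier rises into PII relative to the stored baseline, fits in a few lines. A minimal sketch under stated assumptions: tiers are ordered public < internal < PII, columns are keyed by `(table, column)`, and a column with no baseline entry is treated as public so that brand-new PII columns are flagged too. The names `Tier` and `newly_sensitive` are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Tier(IntEnum):
    """Sensitivity tiers, ordered so that a larger value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    PII = 2


ColumnKey = Tuple[str, str]  # (table, column)


def newly_sensitive(baseline: Dict[ColumnKey, Tier],
                    today: Dict[ColumnKey, Tier],
                    threshold: Tier = Tier.PII) -> List[ColumnKey]:
    """Columns at or above `threshold` today that were below it in the baseline."""
    flagged = []
    for key, tier in sorted(today.items()):
        # Assumption: a column missing from the baseline counts as PUBLIC.
        prior = baseline.get(key, Tier.PUBLIC)
        if tier >= threshold and prior < threshold:
            flagged.append(key)
    return flagged
```

Columns already at PII in the baseline are not re-flagged, which keeps the daily message limited to genuine drift. Lowering `threshold` to `Tier.INTERNAL` would also catch columns moving from public to internal.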

When to use it

Use it when upstream teams add or repurpose columns faster than your governance reviews can keep up, and you need an early-warning signal the moment a column drifts into regulated territory rather than discovering it in an audit.

How it works

  1. A daily schedule fires the scan.
  2. The workflow queries BigQuery for column metadata plus a small sample of values per column.
  3. An OpenAI call classifies each column's sensitivity tier and returns a one-line rationale.
  4. A logic step diffs today's classification against the stored baseline and keeps only newly sensitive columns.
  5. The current classification is written back to BigQuery as the new baseline.
  6. Flagged columns are posted to a Slack governance channel with table, column, tier, and rationale.

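Step 6 can be sketched as building a Slack message from the flagged columns and sending it to a Slack incoming webhook, which accepts a JSON body with a `text` field. This is a hedged example, not the workflow's Slack action: the `Flag` record, the message wording, and the webhook URL you pass in are assumptions. The payload builder returns `None` when nothing was flagged so the caller can skip posting.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Flag:
    """One masking candidate (hypothetical record shape)."""
    table: str
    column: str
    tier: str
    rationale: str


def slack_payload(flags: List[Flag]) -> Optional[dict]:
    """Build an incoming-webhook body; None means there is nothing to post."""
    if not flags:
        return None
    lines = [f"PII drift scan: {len(flags)} masking candidate(s)"]
    for f in flags:
        lines.append(f"• `{f.table}.{f.column}` ({f.tier}): {f.rationale}")
    return {"text": "\n".join(lines)}


def post_to_webhook(url: str, payload: dict) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping payload construction separate from the HTTP call makes the message easy to check during the one-off test run before the trigger is enabled.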
Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect OpenAI: models, embeddings, files.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model. Models are left unset so you can pick the tier: fast and cheap, or top quality.
  5. Tune it to your data. Edit the prompts, filters, and field mappings to match how your team works.
  6. Test, then turn it on. Run once against a sample, confirm the output, then enable the trigger.

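When tuning the classifier prompt, it helps to pin the model to the three tiers and reject anything else before it reaches the baseline. The sketch below is one way to do that, assuming the model is asked for a JSON reply; `build_prompt`, `parse_reply`, the prompt wording, and the 80-character truncation of sample values are all hypothetical. If your model supports it, OpenAI's JSON mode (`response_format={"type": "json_object"}` in Chat Completions) makes the reply easier to parse.

```python
import json
from typing import List, Tuple

TIERS = ("public", "internal", "pii")

# Hypothetical prompt; doubled braces render as literal { } after .format().
PROMPT_TEMPLATE = """Classify the sensitivity of one BigQuery column.
Table: {table}
Column: {column} ({data_type})
Sample values:
{samples}

Reply with JSON only: {{"tier": one of {tiers}, "rationale": "<one line>"}}"""


def build_prompt(table: str, column: str, data_type: str,
                 samples: List[str], max_chars: int = 80) -> str:
    """Fill the template, truncating long sample values to limit tokens."""
    shown = "\n".join(f"- {s[:max_chars]}" for s in samples)
    return PROMPT_TEMPLATE.format(
        table=table, column=column, data_type=data_type,
        samples=shown or "- (no non-null values sampled)",
        tiers=", ".join(TIERS),
    )


def parse_reply(text: str) -> Tuple[str, str]:
    """Return (tier, rationale); raise ValueError on anything unexpected."""
    data = json.loads(text)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    tier = str(data.get("tier", "")).strip().lower()
    if tier not in TIERS:
        raise ValueError(f"unexpected tier: {tier!r}")
    lines = str(data.get("rationale", "")).strip().splitlines()
    return tier, lines[0] if lines else ""
```

Raising on an unknown tier, rather than defaulting to one, keeps a malformed reply from silently overwriting a column's baseline classification.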