Daily BigQuery New-Column PII Scanner

Each morning, finds columns added to your BigQuery warehouse in the last 24 hours and classifies each for likely PII.

Category: Data Ops
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily 7am schedule
  • Action: Query INFORMATION_SCHEMA for columns added in last 24h (Google BigQuery)
  • Action: Pull de-duplicated value sample per new column (Google BigQuery)
  • Action: Classify each column for PII type and confidence (OpenAI)
  • Logic: Keep only medium+ confidence hits
  • Output: Post triage list to governance Slack channel (Slack)

What it does

Scans BigQuery's INFORMATION_SCHEMA for columns created in the last 24 hours, runs each unclassified column through a PII classifier that looks at the column name, data type, and a small value sample, and delivers a triage list ranked by confidence to the data governance Slack channel.
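
As a rough illustration of the classifier input described above, the sketch below builds one payload per column from its name, type, and a de-duplicated value sample. The function names (`dedupe_sample`, `build_classifier_input`), the payload keys, and the sample and truncation limits are all hypothetical choices for this example, not part of the workflow's actual configuration.

```python
from typing import Any, Iterable


def dedupe_sample(values: Iterable[Any], max_samples: int = 5, max_len: int = 64) -> list[str]:
    """Return up to max_samples distinct, non-null values in first-seen order."""
    seen: set[str] = set()
    sample: list[str] = []
    for value in values:
        if value is None:
            continue  # nulls carry no signal for classification
        text = str(value)[:max_len]  # truncate long values to limit what leaves the warehouse
        if text in seen:
            continue
        seen.add(text)
        sample.append(text)
        if len(sample) >= max_samples:
            break
    return sample


def build_classifier_input(column_path: str, data_type: str, values: Iterable[Any]) -> dict:
    """Bundle the three signals the classifier sees: name, type, and sample."""
    return {
        "column": column_path,
        "type": data_type,
        "sample": dedupe_sample(values),
    }


payload = build_classifier_input(
    "shop.users.contact",  # hypothetical dataset.table.column path
    "STRING",
    ["a@example.com", None, "a@example.com", "b@example.com"],
)
print(payload)
```

Keeping the sample small and truncated is a design choice that limits how much raw data is sent to the model; tune both limits to your own data.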

When to use it

Use it when product teams ship schema changes faster than governance can review them, and sensitive fields (emails, SSNs, phone numbers) quietly land in the warehouse without a sensitivity label or masking policy.

How it works

  1. A daily schedule fires the scan at 7am.
  2. Query INFORMATION_SCHEMA.COLUMNS for columns with a creation timestamp inside the last 24 hours, excluding ones already carrying a policy tag.
  3. For each new column, pull a small de-duplicated value sample.
  4. An LLM classifier scores each column for PII type and confidence using name, type, and sample.
  5. A filter keeps only medium-and-above confidence hits.
  6. Post a formatted triage list to Slack with column path, PII type, confidence, and a suggested action.
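
The last two steps (the confidence filter and the Slack triage list) can be sketched in plain Python. This is a minimal sketch under stated assumptions: the `Finding` type, the three confidence labels (`low`, `medium`, `high`), and the message layout are hypothetical, and the classifier output is assumed to have already been parsed into `Finding` objects. Sending the text to Slack is left out.

```python
from dataclasses import dataclass

# Assumed confidence labels; the workflow only states that "medium+" hits are kept.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Finding:
    column_path: str       # e.g. project.dataset.table.column
    pii_type: str          # e.g. "email", "phone number"
    confidence: str        # one of the CONFIDENCE_RANK keys
    suggested_action: str  # e.g. "add policy tag"


def keep_medium_and_above(findings: list[Finding]) -> list[Finding]:
    """Drop low-confidence (and unrecognized) labels, then rank highest confidence first."""
    threshold = CONFIDENCE_RANK["medium"]
    kept = [f for f in findings if CONFIDENCE_RANK.get(f.confidence, -1) >= threshold]
    return sorted(kept, key=lambda f: -CONFIDENCE_RANK[f.confidence])


def format_triage_message(findings: list[Finding]) -> str:
    """Render one line per finding: column path, PII type, confidence, suggested action."""
    if not findings:
        return "No likely-PII columns found in the last 24 hours."
    lines = [f"New columns flagged for PII review ({len(findings)}):"]
    for f in findings:
        lines.append(
            f"- {f.column_path} | {f.pii_type} | {f.confidence} | {f.suggested_action}"
        )
    return "\n".join(lines)


findings = [
    Finding("proj.shop.users.notes", "free text", "low", "none"),
    Finding("proj.shop.users.phone", "phone number", "medium", "add policy tag"),
    Finding("proj.shop.users.email", "email", "high", "add policy tag"),
]
print(format_triage_message(keep_medium_and_above(findings)))
```

Treating unrecognized confidence labels as below the threshold is one possible choice; a stricter setup might surface them for manual review instead.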

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: datasets, queries, schemas.
  2. Connect OpenAI: models, embeddings, files.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model: we leave models unset so you pick the tier, fast and cheap or top quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.