
BigQuery PII Column Drift Scanner with Linear Governance Review

Each day, this workflow snapshots your BigQuery schema, detects columns whose names or sampled values newly match PII patterns, and opens a Linear issue for governance review.

Category: Data Ops
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily schedule fires the drift scan
  • Action: Read column metadata and row samples from BigQuery (BigQuery)
  • Logic: Diff against prior snapshot to isolate new columns (Postgres)
  • Logic: Classify new columns against PII detector rules
  • Action: Open Linear issue with evidence and assign reviewer (Linear)
  • Output: Write fresh schema snapshot back to Postgres (Postgres)
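
The diff step is the part of the pipeline that keeps repeat findings out of the review queue, so it may help to see it in miniature. The sketch below assumes each snapshot can be reduced to a set of (dataset, table, column) keys; the `ColumnRef` type and the `new_columns` function are hypothetical names for illustration, not part of the workflow itself.

```python
from typing import NamedTuple


class ColumnRef(NamedTuple):
    """Identifies one column. Field names are illustrative assumptions."""
    dataset: str
    table: str
    column: str


def new_columns(current: set[ColumnRef], previous: set[ColumnRef]) -> list[ColumnRef]:
    """Return columns present in the current scan but absent from the prior snapshot."""
    # Set difference ignores dropped columns; only additions count as drift here.
    return sorted(current - previous)


# Yesterday's snapshot (would be loaded from Postgres in the real pipeline).
previous = {
    ColumnRef("sales", "orders", "order_id"),
    ColumnRef("sales", "orders", "total"),
}
# Today's scan adds one column.
current = previous | {ColumnRef("sales", "orders", "customer_email")}

drift = new_columns(current, previous)
print(drift)
```

Because the comparison is a plain set difference, a column that was dropped and never re-added produces no finding, and an unchanged schema yields an empty list.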

What it does

This workflow watches your BigQuery datasets for the appearance of new columns that look like personal data — emails, phone numbers, SSNs, national IDs, addresses — and routes each finding into a Linear governance review instead of letting it slip into production unnoticed.
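
To make "looks like personal data" concrete, here is a minimal sketch of a rule-based check on a column name plus a handful of sampled values. The name hints, the regexes, the `classify` function, and the `min_ratio` threshold are all assumptions chosen for illustration; the detector rules the workflow actually ships with may differ, and simple patterns like these can overlap or miss real-world formats.

```python
import re

# Hypothetical name hints: substrings that suggest a sensitive column.
NAME_HINTS = re.compile(r"(email|phone|ssn|national_id|address)", re.IGNORECASE)

# Hypothetical value patterns, deliberately simple.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\+?\d[\d\s().-]{7,}\d$"),
}


def classify(column_name, samples, min_ratio=0.5):
    """Return the list of reasons a column looks sensitive (empty if none)."""
    reasons = []
    if NAME_HINTS.search(column_name):
        reasons.append("name_hint")
    # Ignore NULLs so sparse columns are judged on the values they do have.
    values = [str(v).strip() for v in samples if v is not None]
    for label, pattern in VALUE_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if values and hits / len(values) >= min_ratio:
            reasons.append(label)
    return reasons


print(classify("contact", ["a@example.com", "b@example.org", None]))
print(classify("customer_email", []))
print(classify("order_total", ["19.99", "5.00"]))
```

Checking both the name and the values matters because either signal alone is weak: a column called `contact` gives nothing away by name, while a freshly added `customer_email` column may still be empty when the scan runs.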

When to use it

Run it when engineers ship schema changes faster than your governance team can audit them, and you need a paper trail showing every sensitive field was reviewed before it spread downstream.

How it works

  1. A daily schedule fires the scan.
  2. The workflow pulls `INFORMATION_SCHEMA.COLUMNS` from BigQuery and samples a few rows per new column.
  3. It compares the current column set against the prior snapshot stored in Postgres to isolate only columns that appeared since the last run.
  4. A classifier matches each new column name and value sample against PII regex and detector rules.
  5. If any column scores as sensitive, a Linear issue is created with the table, column, sample evidence, and a governance reviewer assigned.
  6. The fresh snapshot is written back to Postgres so the next run only flags genuinely new drift.
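
Step 2 comes down to two queries: one against `INFORMATION_SCHEMA.COLUMNS` for the column list, and one small sample per new column. The sketch below only builds the SQL text and leaves execution to whichever BigQuery client you use. The function names, the selected fields, the default sample size, and the identifier checks are assumptions for illustration (the checks are stricter than BigQuery's own naming rules), not the workflow's actual queries.

```python
import re

# Conservative allow-lists; stricter than what BigQuery itself permits.
_PROJECT = re.compile(r"[a-z][a-z0-9-]*")
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _checked(value: str, pattern: re.Pattern) -> str:
    """Reject anything that is not a plain identifier before it reaches SQL."""
    if not pattern.fullmatch(value):
        raise ValueError(f"unsafe identifier: {value!r}")
    return value


def columns_query(project: str, dataset: str) -> str:
    """Build the metadata query listing every column in one dataset."""
    project = _checked(project, _PROJECT)
    dataset = _checked(dataset, _IDENT)
    return (
        "SELECT table_name, column_name, data_type\n"
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.COLUMNS\n"
        "ORDER BY table_name, ordinal_position"
    )


def sample_query(project: str, dataset: str, table: str, column: str, limit: int = 5) -> str:
    """Build a query that fetches a few non-NULL values from one column."""
    target = ".".join([
        _checked(project, _PROJECT),
        _checked(dataset, _IDENT),
        _checked(table, _IDENT),
    ])
    col = _checked(column, _IDENT)
    return f"SELECT `{col}` FROM `{target}` WHERE `{col}` IS NOT NULL LIMIT {int(limit)}"


print(columns_query("my-project", "sales"))
print(sample_query("my-project", "sales", "orders", "customer_email", limit=3))
```

Validating identifiers before interpolating them is a deliberate choice here: table and column names come from the scanned schema itself, so they should be treated as untrusted input.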

Set it up

What you configure once, before turning it on.

  1. Connect BigQuery: Datasets, queries, schemas.
  2. Connect Postgres: Any Postgres URL (query, write, migrate).
  3. Connect Linear: Issues, projects, cycles, triage.
  4. Set each agent's model: We leave models unset so you pick the tier, fast + cheap or top-quality.
  5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
