Daily Cross-Warehouse PII Inventory Snapshot to R2

Builds a unified daily inventory of every classified PII column across both Snowflake and BigQuery, and writes the versioned snapshot to R2.

Category: Data Ops
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 7
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily scheduled snapshot
  • Action: Query Snowflake column catalog (Snowflake)
  • Action: Query BigQuery column catalog (Google BigQuery)
  • Action: Classify and merge into unified inventory (OpenAI)
  • Action: Write versioned snapshot to R2 (Cloudflare R2)
  • Logic: Diff against prior snapshot for delta
  • Output: Post PII delta summary to Slack (Slack)
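
As a rough sketch of the two catalog-query steps above: the template does not say which SQL it runs, so this assumes both warehouses are read through their standard INFORMATION_SCHEMA.COLUMNS views. The `ColumnRef` type, the `normalize_row` helper, and the `region-us` qualifier are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Assumed queries; the template does not state the SQL it actually uses.
# In Snowflake this view is scoped to the current database.
SNOWFLAKE_SQL = """
SELECT table_catalog, table_schema, table_name, column_name, data_type
FROM information_schema.columns
"""

# `region-us` is a placeholder; use the region your datasets live in.
BIGQUERY_SQL = """
SELECT table_catalog, table_schema, table_name, column_name, data_type
FROM `region-us`.INFORMATION_SCHEMA.COLUMNS
"""


@dataclass(frozen=True)
class ColumnRef:
    """One column, in a shape shared by both warehouses (hypothetical)."""
    warehouse: str
    database: str
    schema: str
    table: str
    column: str
    data_type: str

    @property
    def key(self) -> str:
        # Lowercased, fully qualified name used to match columns across runs.
        parts = [self.warehouse, self.database, self.schema, self.table, self.column]
        return ".".join(parts).lower()


def normalize_row(warehouse: str, row: dict) -> ColumnRef:
    """Map one catalog row to a ColumnRef, ignoring the case of the row's keys."""
    r = {k.lower(): v for k, v in row.items()}
    return ColumnRef(
        warehouse=warehouse,
        database=r["table_catalog"],
        schema=r["table_schema"],
        table=r["table_name"],
        column=r["column_name"],
        data_type=r["data_type"],
    )
```

Lowercasing the row keys and the final key is a design choice: Snowflake typically returns uppercase identifiers and BigQuery lowercase ones, so without it the same logical column could appear under two spellings.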

What it does

Once a day it pulls the full column catalog from both Snowflake and BigQuery, classifies columns for PII, merges them into one canonical inventory, and stores a timestamped snapshot in R2. It then compares the result against the most recent prior snapshot and reports the net change (columns added, removed, or reclassified) to Slack.
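
The merge and snapshot naming could look something like the sketch below. The record shape (`key`, `pii_class`), the `pii-inventory` prefix, and the object-key layout are all assumptions made for illustration; the template does not define them.

```python
import json
from datetime import datetime, timezone


def merge_inventories(*sources):
    """Merge per-warehouse lists of classified columns into one dict.

    Each row is assumed to look like {"key": "...", "pii_class": "..."}.
    If two rows share a key, the later source wins.
    """
    merged = {}
    for rows in sources:
        for row in rows:
            merged[row["key"]] = {"pii_class": row["pii_class"]}
    return merged


def snapshot_key(run_at, prefix="pii-inventory"):
    """Build a dated, timestamped object key (hypothetical layout)."""
    return f"{prefix}/{run_at:%Y/%m/%d}/inventory-{run_at:%Y%m%dT%H%M%SZ}.json"


def serialize(inventory):
    """Serialize with sorted keys so identical inventories yield identical bytes."""
    return json.dumps(inventory, sort_keys=True, indent=2).encode("utf-8")


# Example run with made-up rows.
snowflake_rows = [{"key": "snowflake.prod.crm.users.email", "pii_class": "email"}]
bigquery_rows = [{"key": "bigquery.proj.sales.leads.phone", "pii_class": "phone"}]
inventory = merge_inventories(snowflake_rows, bigquery_rows)
key = snapshot_key(datetime(2024, 5, 1, 6, 0, 0, tzinfo=timezone.utc))
body = serialize(inventory)
```

Because R2 is S3-compatible, `body` could then be uploaded under `key` with any S3 client; that call is left out here so the sketch stays free of credentials and network access.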

When to use it

Use it when you need an auditable, point-in-time record of where PII lives across multiple warehouses for compliance reviews (SOC 2, GDPR data mapping) and want a single daily signal on how your sensitive-data footprint is drifting.

How it works

  1. A daily scheduled trigger kicks off the snapshot.
  2. Query Snowflake and BigQuery column catalogs in parallel.
  3. An OpenAI step classifies each column and a logic step merges both sources into one normalized inventory.
  4. Write the inventory as a versioned, timestamped object to R2.
  5. Diff today's inventory against the most recent prior snapshot in R2 to compute added, removed, and reclassified columns.
  6. Post the day-over-day PII delta summary to Slack.
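
The diff and summary steps can be sketched as plain set operations. This assumes each snapshot is a dict mapping a fully qualified column key to a record with a `pii_class` field; that shape, the function names, and the message wording are illustrative, not the template's actual format.

```python
def diff_inventories(prev, curr):
    """Compare two snapshots keyed by column and return the three deltas."""
    added = sorted(curr.keys() - prev.keys())
    removed = sorted(prev.keys() - curr.keys())
    # A column counts as reclassified when it exists in both snapshots
    # but its PII class differs.
    reclassified = sorted(
        k for k in curr.keys() & prev.keys()
        if curr[k]["pii_class"] != prev[k]["pii_class"]
    )
    return {"added": added, "removed": removed, "reclassified": reclassified}


def summarize(delta):
    """Render a one-line summary suitable for a chat message."""
    return "PII inventory delta: +{} added, -{} removed, {} reclassified".format(
        len(delta["added"]), len(delta["removed"]), len(delta["reclassified"])
    )


# Example with made-up snapshots.
prev = {
    "snowflake.prod.crm.users.email": {"pii_class": "email"},
    "snowflake.prod.crm.users.phone": {"pii_class": "none"},
    "bigquery.proj.sales.leads.ssn": {"pii_class": "national_id"},
}
curr = {
    "snowflake.prod.crm.users.email": {"pii_class": "email"},
    "snowflake.prod.crm.users.phone": {"pii_class": "phone"},
    "bigquery.proj.sales.leads.dob": {"pii_class": "date_of_birth"},
}
delta = diff_inventories(prev, curr)
message = summarize(delta)
```

On the very first run there is no prior snapshot; passing an empty dict as `prev` reports every column as added, which is one reasonable way to handle that case.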

Set it up

What you configure once, before turning it on.

  1. Connect Snowflake: Warehouses, queries, shares.
  2. Connect BigQuery: Datasets, queries, schemas.
  3. Connect OpenAI: Models, embeddings, files.
  4. Connect Cloudflare R2: Object storage, S3-compatible.
  5. Connect Slack: Channels, DMs, threads, mentions.
  6. Set each agent's model: We leave models unset so you pick the tier: fast + cheap, or top-quality.
  7. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  8. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
