DATA OPS

Hybrid regex + LLM PII classifier for Snowflake columns

Runs cheap regex screening over sampled Snowflake column values and escalates ambiguous hits to an LLM for category and confidence.

Category: Data Ops
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 7
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Scheduled classifier run
  • Action: Sample candidate Snowflake columns (Snowflake)
  • Logic: Regex screen; isolate ambiguous samples
  • Action: LLM adjudicates gray-area samples (OpenAI)
  • Logic: Grade by confidence: high/medium/low
  • Action: Quarantine high-confidence tables (Snowflake)
  • Output: File graded Linear ticket (Linear)
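The sampling step can be sketched as two queries: one that finds candidate columns, and one that pulls a fixed-size sample from each. This is a minimal sketch under stated assumptions. It treats "candidate" as a text column in a base table altered in the last day, and it uses a 200-row sample. `CANDIDATE_COLUMNS_SQL`, `quote_ident`, and `build_sample_query` are hypothetical names, not part of the workflow.

```python
# Hypothetical candidate filter: text columns in tables changed in the last day.
# Snowflake's INFORMATION_SCHEMA reports VARCHAR columns with DATA_TYPE 'TEXT'.
CANDIDATE_COLUMNS_SQL = """
SELECT c.TABLE_SCHEMA, c.TABLE_NAME, c.COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS c
JOIN INFORMATION_SCHEMA.TABLES t
  ON t.TABLE_SCHEMA = c.TABLE_SCHEMA AND t.TABLE_NAME = c.TABLE_NAME
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND t.LAST_ALTERED >= DATEADD(day, -1, CURRENT_TIMESTAMP())
  AND c.DATA_TYPE = 'TEXT'
"""


def quote_ident(name: str) -> str:
    """Quote a Snowflake identifier; embedded double quotes are doubled.
    Quoted names are case-sensitive, so pass them as stored (usually uppercase)."""
    return '"' + name.replace('"', '""') + '"'


def build_sample_query(database: str, schema: str, table: str,
                       column: str, rows: int = 200) -> str:
    """SELECT that samples `rows` rows, then drops NULLs (so it may return fewer)."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    fq_table = ".".join(quote_ident(p) for p in (database, schema, table))
    col = quote_ident(column)
    # SAMPLE (n ROWS) is Snowflake's fixed-size row sampling clause.
    return (
        f"SELECT {col} FROM {fq_table} SAMPLE ({rows} ROWS) "
        f"WHERE {col} IS NOT NULL"
    )
```

Because the NULL filter runs after sampling, a sparse column can yield far fewer than `rows` values. If that matters, you could raise the sample size for such columns.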

What it does

Samples values from new or recently changed Snowflake columns and screens them with fast regex rules for structured PII such as card numbers, SSNs, and email addresses. When a column only partially matches, or looks like it holds free-text names, its ambiguous samples are escalated to an LLM, which returns a PII category, a confidence score, and a one-line rationale. High-confidence findings quarantine the table by revoking SELECT. Medium-confidence findings open a Linear ticket without locking the table. Low-confidence findings are dropped.
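The regex screen has three outcomes for each sample. A value that is entirely PII-shaped is a clear hit. A value that contains a PII-shaped token inside longer text, fails a checksum, or looks like a name is ambiguous and goes to the LLM. Anything else is a clear miss. The sketch below shows this routing under stated assumptions. The patterns, the Luhn check on card numbers, the name heuristic, and the names `PATTERNS`, `screen`, and `ScreenResult` are all hypothetical choices, not the workflow's actual rules.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set: tune patterns and add rules for your own data.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),  # 13-19 digits
}
# Crude name heuristic (assumption): two or more capitalized words.
NAME_LIKE = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)+$")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to reject card-shaped numbers that cannot be cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class ScreenResult:
    hits: dict[str, int]  # rule name -> samples that fully matched it
    hit_rate: float       # fraction of samples that were clear hits
    ambiguous: list[str]  # samples to escalate to the LLM


def screen(samples: list[str]) -> ScreenResult:
    hits = {name: 0 for name in PATTERNS}
    ambiguous = []
    for raw in samples:
        value = raw.strip()
        clear, partial = None, False
        for name, pattern in PATTERNS.items():
            m = pattern.search(value)
            if m is None:
                continue
            if name == "card" and not luhn_ok(re.sub(r"\D", "", m.group())):
                partial = True  # card-shaped but fails the checksum
            elif m.group() == value:
                clear = name  # whole value is PII-shaped: clear hit
                break
            else:
                partial = True  # PII-shaped token inside longer text
        if clear:
            hits[clear] += 1
        elif partial or NAME_LIKE.match(value):
            ambiguous.append(value)
        # anything else is a clear miss and is not escalated
    rate = sum(hits.values()) / len(samples) if samples else 0.0
    return ScreenResult(hits, rate, ambiguous)
```

Only `ambiguous` reaches the LLM. Clean hits and clean misses never cost a model call.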

When to use it

Use it when pure regex produces too many false positives on messy columns (free-text notes, mixed addresses) and you want an LLM to adjudicate only the gray-area cases, rather than paying to classify every value.
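One way to keep LLM spend limited to gray-area cases is to send only the ambiguous samples, cap how many go out per column, and require a small JSON reply. This is a minimal sketch. It assumes the model replies with JSON containing `category`, `confidence`, and `rationale` keys. `CATEGORIES`, `build_prompt`, `parse_verdict`, and the 20-sample and 200-character caps are hypothetical. The OpenAI call itself is omitted.

```python
import json
from dataclasses import dataclass

# Hypothetical PII taxonomy; replace with your own categories.
CATEGORIES = {"name", "email", "phone", "address", "ssn", "card", "other", "none"}


@dataclass
class Verdict:
    category: str
    confidence: float
    rationale: str


def build_prompt(column: str, samples: list[str], max_samples: int = 20) -> str:
    """Prompt carrying only ambiguous samples, capped to bound token cost."""
    lines = "\n".join(f"- {s[:200]}" for s in samples[:max_samples])
    schema = (
        '{"category": <one of ' + "|".join(sorted(CATEGORIES)) + '>, '
        '"confidence": <0..1>, "rationale": "<one sentence>"}'
    )
    return (
        f"Column: {column}\n"
        f"Ambiguous sample values:\n{lines}\n"
        f"Reply with JSON only, shaped like: {schema}"
    )


def parse_verdict(reply: str) -> Verdict:
    """Validate the model's JSON reply; raises on malformed JSON or missing keys."""
    data = json.loads(reply)
    category = str(data["category"]).lower()
    if category not in CATEGORIES:
        category = "other"  # unknown labels are folded into "other"
    confidence = min(max(float(data["confidence"]), 0.0), 1.0)  # clamp to [0, 1]
    rationale = " ".join(str(data.get("rationale", "")).split())  # force one line
    return Verdict(category, confidence, rationale)
```

Clamping the confidence and folding unknown categories into `other` means that a sloppy reply cannot push a table into the quarantine band by itself.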

How it works

  1. A schedule starts the scan.
  2. Sample values from candidate Snowflake columns.
  3. Run regex screening; route clear hits and clear misses straight to grading without calling the LLM.
  4. Send only ambiguous samples to the LLM for category and confidence.
  5. Branch on the combined confidence: high quarantines, medium tickets, low drops.
  6. Revoke SELECT on high-confidence tables and file a graded Linear ticket.
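Steps 5 and 6 can be sketched as a grading function followed by an action planner. The planner only returns what it would do, which also makes it easy to dry-run. Several parts of this sketch are assumptions: the 0.85 and 0.5 cut-offs, taking the maximum of the regex hit rate and the LLM confidence as the "combined" score, the role being revoked, and the shape of the ticket dict. The ticket is a plain dict, not Linear's API payload. The `REVOKE SELECT ON TABLE ... FROM ROLE ...` statement follows Snowflake's standard grammar.

```python
from typing import Optional

# Hypothetical cut-offs: the workflow names the bands but not their values.
HIGH, MEDIUM = 0.85, 0.5


def grade(regex_hit_rate: float, llm_confidence: Optional[float]) -> str:
    """Combine both signals into a band. Taking the max is an assumption."""
    llm = llm_confidence if llm_confidence is not None else 0.0
    score = max(regex_hit_rate, llm)
    if score >= HIGH:
        return "high"
    if score >= MEDIUM:
        return "medium"
    return "low"


def quote_ident(name: str) -> str:
    return '"' + name.replace('"', '""') + '"'


def plan_actions(table: tuple[str, str, str], band: str, role: str,
                 category: str, rationale: str) -> list[dict]:
    """Return the actions for one graded table; nothing is executed here."""
    ticket = {  # hypothetical ticket fields, to be mapped onto Linear
        "kind": "ticket",
        "title": f"[PII {band}] {category} in {'.'.join(table)}",
        "description": rationale,
        "table_locked": band == "high",
    }
    if band == "high":
        fq = ".".join(quote_ident(part) for part in table)
        revoke = f"REVOKE SELECT ON TABLE {fq} FROM ROLE {quote_ident(role)}"
        return [{"kind": "sql", "statement": revoke}, ticket]
    if band == "medium":
        return [ticket]  # ticket only; the table stays readable
    return []  # low confidence: dropped
```

Revoking from a single role only removes that role's direct grant. In practice, you would list every role that can read the table before treating it as quarantined.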

Set it up

What you configure once, before turning it on.

  1. Connect Snowflake: warehouses, queries, shares.
  2. Connect OpenAI: models, embeddings, files.
  3. Connect Linear: issues, projects, cycles, triage.
  4. Set each agent's model: models are left unset so you can pick the tier, either fast and cheap or top quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so they match how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
