DATA OPS
Nightly PII sweep of an R2 bucket with a digest report
On a nightly schedule, scans all objects in a Cloudflare R2 bucket for exposed PII, tags offenders, and posts a ranked digest of the worst exposures to Slack plus a full…
How it runs
The automated pipeline, trigger to output.
- Trigger: Nightly schedule fires
- Action: List and fetch all objects in R2 bucket (Cloudflare R2)
- Action: Scan each object for PII type and severity (OpenAI)
- Action: Write row-level findings to Postgres audit table (Postgres)
- Logic: Rank by severity, keep top offenders
- Output: Post ranked exposure digest to Slack (Slack)
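To make the hand-off between the scan and audit steps concrete, here is a minimal sketch of turning a model reply into row-level findings. It assumes the prompt asks the model to answer with a JSON array of objects carrying `pii_type` and `severity` fields; that reply shape, the `Finding` record, and the four-level severity scale are illustrative assumptions, not something the workflow defines.

```python
import json
from dataclasses import dataclass

# Assumed severity scale; the workflow itself does not specify one.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    """One row-level finding (hypothetical record shape)."""
    object_key: str
    pii_type: str
    severity: str


def parse_scan_reply(object_key: str, reply_text: str) -> list[Finding]:
    """Turn a model reply, assumed to be a JSON array, into Finding rows."""
    try:
        items = json.loads(reply_text)
    except json.JSONDecodeError:
        # Sketch only: a real flow would likely log or retry here
        # instead of silently recording nothing.
        return []
    if not isinstance(items, list):
        return []

    findings = []
    for item in items:
        if not isinstance(item, dict):
            continue
        pii_type = str(item.get("pii_type", "")).strip()
        severity = str(item.get("severity", "")).strip().lower()
        # Keep only entries that name a PII type and use a known severity.
        if pii_type and severity in SEVERITY:
            findings.append(Finding(object_key, pii_type, severity))
    return findings
```

Validating the reply before it reaches the audit table keeps malformed model output from polluting the historical record.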
What it does
Walks an entire R2 bucket every night, samples and scans each object for PII, and produces a prioritized digest so the team starts each morning knowing which datasets are leaking and where. It does not move files itself; it surfaces and records risk for human triage.
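The walk-and-sample step could be sketched as follows. This assumes an S3-compatible client such as a boto3 S3 client pointed at the bucket's R2 endpoint; the `iter_samples` name and the 64 KiB sample size are illustrative choices, not settings taken from the workflow.

```python
from typing import Any, Iterator

SAMPLE_BYTES = 64 * 1024  # assumed sample size; tune to your data


def iter_samples(
    client: Any, bucket: str, sample_bytes: int = SAMPLE_BYTES
) -> Iterator[tuple[str, bytes]]:
    """Yield (key, leading bytes) for every non-empty object in the bucket.

    `client` is expected to behave like a boto3 S3 client, for example one
    created with boto3.client("s3", endpoint_url=<your R2 S3 endpoint>, ...).
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # A page for an empty bucket has no "Contents" key.
        for obj in page.get("Contents", []):
            if obj["Size"] == 0:
                continue  # a ranged read of an empty object is rejected
            resp = client.get_object(
                Bucket=bucket,
                Key=obj["Key"],
                Range=f"bytes=0-{sample_bytes - 1}",
            )
            yield obj["Key"], resp["Body"].read()
```

Reading only a leading byte range keeps the nightly cost bounded on large objects, at the price of missing PII that appears deeper in a file.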
When to use it
Use this for buckets too large or sensitive to gate at write time, where a daily inventory of exposure is the right cadence. Ideal for analytics dumps, log archives, and shared export buckets.
How it works
1. A nightly schedule kicks off the sweep.
2. The flow lists all objects in the R2 bucket and iterates over them.
3. Each object is fetched and scanned by OpenAI for PII type and severity.
4. Findings are written row-by-row to a Postgres audit table for historical tracking.
5. A logic step ranks objects by severity and keeps the top offenders.
6. A formatted digest of the highest-risk objects posts to Slack for morning triage.
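The ranking and digest steps can be sketched in plain Python. The severity scale, the tie-breaking rule (more distinct PII types first, then key name), and the digest wording are assumptions made for illustration; the workflow's own logic step may rank differently.

```python
from collections import defaultdict

# Assumed severity scale; adjust to whatever your scan prompt returns.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def rank_objects(findings, top_n=10):
    """Rank objects by their worst finding.

    `findings` is an iterable of (object_key, pii_type, severity) tuples.
    Returns up to `top_n` tuples of (object_key, worst_score, pii_types).
    """
    worst = defaultdict(int)
    types = defaultdict(set)
    for key, pii_type, severity in findings:
        worst[key] = max(worst[key], SEVERITY[severity])
        types[key].add(pii_type)
    # Highest severity first, then more PII types, then key for stability.
    ranked = sorted(worst, key=lambda k: (-worst[k], -len(types[k]), k))
    return [(k, worst[k], sorted(types[k])) for k in ranked[:top_n]]


def format_digest(ranked, bucket):
    """Render the ranked objects as a plain-text digest."""
    names = {score: name for name, score in SEVERITY.items()}
    lines = [f"PII sweep for bucket {bucket}: top {len(ranked)} objects by severity"]
    for i, (key, score, pii_types) in enumerate(ranked, start=1):
        lines.append(f"{i}. {key} [{names[score]}] {', '.join(pii_types)}")
    return "\n".join(lines)
```

If the digest is delivered through a Slack incoming webhook, the returned string would go in the `text` field of the JSON payload.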
Set it up
What you configure once, before turning it on.
1. Connect Cloudflare R2: Object storage, S3-compatible.
2. Connect OpenAI: Models, embeddings, files.
3. Connect Postgres: Any Postgres URL for query, write, migrate.
4. Connect Slack: Channels, DMs, threads, mentions.
5. Set each agent's model: We leave models unset so you pick the tier, fast + cheap or top-quality.
6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
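As an illustration of what a field mapping into the Postgres audit table might look like, the sketch below maps findings to insert rows. The `pii_audit` table, its column names, and the dictionary keys are hypothetical; the write helper assumes a DB-API connection from a driver that uses `%s` placeholders, such as psycopg.

```python
from datetime import datetime, timezone

# Hypothetical table and column names; adjust to your own schema.
INSERT_SQL = (
    "INSERT INTO pii_audit (run_at, bucket, object_key, pii_type, severity) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def to_audit_rows(bucket, findings, run_at=None):
    """Map finding dicts to tuples in the column order of INSERT_SQL."""
    run_at = run_at or datetime.now(timezone.utc)
    return [
        (run_at, bucket, f["object_key"], f["pii_type"], f["severity"])
        for f in findings
    ]


def write_findings(conn, rows):
    """Insert all rows in one transaction using a DB-API connection."""
    with conn.cursor() as cur:
        cur.executemany(INSERT_SQL, rows)
    conn.commit()
```

For the test run against a sample, calling `to_audit_rows` alone and inspecting the tuples is a way to confirm the mapping before anything is written.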