DOCUMENT OPS

Auto-redact PII in S3 documents before publishing to a public bucket

Scans documents uploaded to a staging S3 bucket, generates a redacted copy with sensitive spans masked, and promotes the clean version to the public bucket automatically.

Category: Document Ops
Engine: sim
Difficulty: advanced
Trigger: event
Steps: 6
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: New object in staging S3 bucket (AWS S3)
  • Action: Fetch object and extract text (AWS S3)
  • Action: Detect sensitive spans to mask (OpenAI)
  • Logic: Confirm redacted output has no residual findings
  • Action: Write redacted copy to public S3 bucket (AWS S3)
  • Output: Post published-artifact summary to Slack (Slack)
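The trigger step can be sketched in Python. This assumes the workflow receives the standard Amazon S3 event notification JSON, which is the shape S3 delivers to Lambda, SNS, or SQS. The template's own trigger payload may differ, and the function name `staged_objects` is hypothetical. The sketch keeps only ObjectCreated records and URL-decodes each object key, because S3 encodes keys in notifications.

```python
from urllib.parse import unquote_plus


def staged_objects(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) for each ObjectCreated record in an S3 event notification.

    Assumes the standard S3 notification shape; staged_objects is a hypothetical name.
    """
    objects = []
    for record in event.get("Records", []):
        # Ignore deletes, restores, and any other event types.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Keys arrive URL-encoded (spaces as '+'), so decode before fetching.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects
```

The `unquote_plus` decode matters. A key such as `q1 summary.pdf` arrives as `q1+summary.pdf`, so passing the raw value to the fetch step would point at a different object.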

What it does

Takes documents from a private staging S3 bucket, detects the PII and secret spans they contain, and produces a copy with those spans masked. Only the redacted version is published to a public-facing bucket, so the original sensitive file never leaves staging.

When to use it

Use it when you publish reports, datasets, or transcripts that may contain personal data and you want machine-applied redaction instead of manual blacking-out. It suits data-ops teams that ship public artifacts regularly. Every upload to the staging bucket starts a run, whether uploads arrive on a schedule or ad hoc.

How it works

  1. A new object in the staging S3 bucket fires the trigger.
  2. The object is fetched and its text extracted.
  3. An OpenAI pass returns the exact sensitive spans to mask, each with its category and character offsets.
  4. A redacted rendering is built by replacing each span with a category tag such as [REDACTED:SSN].
  5. A logic check confirms that no high-confidence findings remain in the redacted output.
  6. The clean redacted file is written to the public S3 bucket, and a Slack note links to the published artifact along with a redaction summary.
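Step 3 depends on the model's character offsets being exact. Below is a minimal Python guard. It assumes a hypothetical reply format in which the model returns JSON with a `spans` list, and each entry carries `category`, `start`, `end` (end-exclusive) and the quoted `text`. The sketch rejects any span whose offsets fall outside the document or do not reproduce the quoted text. The reply shape and the name `parse_model_spans` are assumptions, not the template's actual contract.

```python
import json


def parse_model_spans(raw: str, text: str) -> list[dict]:
    """Parse the model's JSON reply and verify every span against the source text.

    Hypothetical reply shape (an assumption, not the template's contract):
    {"spans": [{"category": "SSN", "start": 4, "end": 15, "text": "123-45-6789"}]}
    """
    payload = json.loads(raw)
    spans = payload.get("spans")
    if not isinstance(spans, list):
        raise ValueError("reply has no 'spans' list")
    for span in spans:
        start, end = span["start"], span["end"]
        # Offsets must lie inside the document and describe a non-empty range.
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {span}")
        # The offsets must point at exactly the text the model quoted.
        if text[start:end] != span["text"]:
            raise ValueError(f"offsets do not match quoted text: {span}")
    return spans
```

Raising here stops the run before anything is redacted against wrong offsets.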
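Step 4 can be sketched as a pure string transformation. This assumes spans arrive as dicts with `category`, `start` and `end` (end-exclusive) keys. That shape and the name `apply_redactions` are hypothetical. Overlapping spans are merged so each region gets a single tag. An out-of-range span raises an error instead of being skipped.

```python
def apply_redactions(text: str, spans: list[dict]) -> str:
    """Replace each {"category", "start", "end"} span with [REDACTED:<category>].

    Overlapping spans are merged, and the earlier span's category wins.
    Any out-of-range span raises ValueError so the run fails closed.
    """
    ordered = sorted(spans, key=lambda s: (s["start"], s["end"]))
    merged: list[dict] = []
    for span in ordered:
        if not (0 <= span["start"] < span["end"] <= len(text)):
            raise ValueError(f"span out of range: {span}")
        if merged and span["start"] < merged[-1]["end"]:
            # Overlap: extend the previous span instead of emitting two tags.
            merged[-1]["end"] = max(merged[-1]["end"], span["end"])
        else:
            # Copy so the caller's span dicts are never mutated.
            merged.append({"category": span["category"], "start": span["start"], "end": span["end"]})
    pieces = []
    cursor = 0
    for span in merged:
        pieces.append(text[cursor:span["start"]])  # untouched text before the span
        pieces.append(f"[REDACTED:{span['category']}]")
        cursor = span["end"]
    pieces.append(text[cursor:])  # untouched tail
    return "".join(pieces)
```

In this sketch a bad span deliberately fails closed. Silently dropping it would publish the very text it was meant to cover.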
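The template does not say how step 5 decides that no residual findings remain. One plausible approach, swapped in here for illustration, is a deterministic regex pass over the redacted text. The patterns below (US-style SSNs, email addresses, AWS access key IDs) are a hypothetical starting point. The names `residual_findings` and `safe_to_publish` are illustrative. A real deployment might instead re-run the model on the redacted output.

```python
import re

# Hypothetical high-confidence patterns; tune these for your own data.
# None of them match the [REDACTED:<category>] tags themselves.
RESIDUAL_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "AWS_ACCESS_KEY": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
}


def residual_findings(redacted: str) -> list[tuple[str, str]]:
    """Return (category, matched_text) for every pattern hit left in the output."""
    findings = []
    for category, pattern in RESIDUAL_PATTERNS.items():
        for match in pattern.finditer(redacted):
            findings.append((category, match.group(0)))
    return findings


def safe_to_publish(redacted: str) -> bool:
    """Gate for the logic step: publish only when no residual findings remain."""
    return not residual_findings(redacted)
```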

Set it up

What you configure once, before turning it on.

  1. Connect AWS S3: buckets, objects, signed URLs.
  2. Connect OpenAI: models, embeddings, files.
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model: models are left unset so you can pick the tier, either fast and cheap or top quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so they match how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
