Validate S3 Partner CSV Feeds and Quarantine Bad Rows

Watches an S3 prefix for new partner CSV drops, validates every row against a contract schema, loads clean rows into Postgres, and quarantines the rows that fail.

Category: Data Ops
Engine: sim
Difficulty: intermediate
Trigger: event
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: New CSV lands in S3 incoming/ prefix (AWS S3)
  • Logic: Parse rows and validate each against contract schema
  • Logic: Split valid rows from invalid rows
  • Action: Insert valid rows into Postgres staging table (PostgreSQL)
  • Action: Write rejected rows + defect report to S3 quarantine/ (AWS S3)
  • Output: Post run summary to Slack (Slack)

What it does

When a partner uploads a CSV to a watched S3 prefix, this workflow checks each row against your contract schema (required columns, types, value ranges) and splits the file in two: clean rows go into your Postgres staging table, and any row that fails a rule is written to quarantine with the exact reason it failed.
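To make the three kinds of contract rule concrete, here is a minimal Python sketch of a per-row check. The `ColumnRule` class, the `CONTRACT` mapping, and the column names (`order_id`, `amount`, `note`) are hypothetical and only illustrate the idea; they are not the workflow's actual configuration format.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ColumnRule:
    """One column of a hypothetical contract: type, required flag, range."""
    cast: Callable[[str], object]      # converts the raw CSV string
    required: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None


# Assumed example contract; replace with your partner's real columns.
CONTRACT = {
    "order_id": ColumnRule(cast=int),
    "amount": ColumnRule(cast=float, min_value=0.0, max_value=100000.0),
    "note": ColumnRule(cast=str, required=False),
}


def validate_row(row: dict, contract: dict) -> list:
    """Return a list of failure reasons; an empty list means the row is clean."""
    reasons = []
    for name, rule in contract.items():
        raw = (row.get(name) or "").strip()
        if raw == "":
            if rule.required:
                reasons.append(f"{name}: missing required value")
            continue
        try:
            value = rule.cast(raw)
        except ValueError:
            reasons.append(f"{name}: expected {rule.cast.__name__}, got {raw!r}")
            continue
        if rule.min_value is not None and value < rule.min_value:
            reasons.append(f"{name}: {value} below minimum {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            reasons.append(f"{name}: {value} above maximum {rule.max_value}")
    return reasons
```

Returning every reason, rather than stopping at the first failure, is what lets the defect report tell a partner everything that is wrong with a row in one pass.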

When to use it

Use it when partners deliver flat-file feeds to an S3 landing bucket and you need a deterministic gate before the data touches your warehouse. It stops one malformed row from poisoning a batch load and gives partners a precise list of what to fix.

How it works

  1. A new object lands under the `incoming/` prefix in S3 and fires the trigger.
  2. The CSV is parsed and each row is checked against the schema contract: missing fields, wrong types, and out-of-range values are flagged.
  3. The flow separates valid rows from invalid rows.
  4. Valid rows are inserted into the Postgres staging table.
  5. Invalid rows, each with an added defect column giving the failure reason, are written to the `quarantine/` S3 prefix as a defect report.
  6. A run summary (rows accepted, rows quarantined, top failure reasons) is posted to Slack.
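The parse, split, report, and summary steps above can be sketched in plain Python using only the standard library. This is an illustration under stated assumptions, not the workflow's implementation: `split_feed`, `demo_validate`, the `source_row` and `defect` column names, and the sample data are all hypothetical, and the S3, Postgres, and Slack calls are deliberately left out.

```python
import csv
import io
from collections import Counter
from typing import Callable


def split_feed(csv_text: str, validate: Callable[[dict], list]):
    """Split a CSV into valid rows, a quarantine CSV, and a run summary.

    `validate` is any function that returns a list of failure reasons
    for a row (empty list = valid).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    valid, rejected = [], []
    reasons_seen = Counter()

    for row_number, row in enumerate(reader, start=1):
        reasons = validate(row)
        if reasons:
            # Keep the original values and add the per-row defect column.
            rejected.append(
                {**row, "source_row": row_number, "defect": "; ".join(reasons)}
            )
            reasons_seen.update(reasons)
        else:
            valid.append(row)

    # Build the defect report that would be written under quarantine/.
    out = io.StringIO()
    fields = list(reader.fieldnames or []) + ["source_row", "defect"]
    writer = csv.DictWriter(
        out, fieldnames=fields, lineterminator="\n", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rejected)

    summary = {
        "accepted": len(valid),
        "quarantined": len(rejected),
        "top_reasons": reasons_seen.most_common(3),
    }
    return valid, out.getvalue(), summary


def demo_validate(row: dict) -> list:
    """Toy rule for the demo: `amount` must be present."""
    if (row.get("amount") or "").strip():
        return []
    return ["amount: missing required value"]


csv_text = "order_id,amount\n1,9.99\n2,\n3,4.50\n"
valid, report, summary = split_feed(csv_text, demo_validate)
print(summary)
print(report)
```

In a real run, `valid` would feed the insert into the staging table, `report` would be uploaded under `quarantine/`, and `summary` would be formatted into the Slack message.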

Set it up

What you configure once, before turning it on.

  1. Connect AWS S3: buckets, objects, signed URLs.
  2. Connect Postgres: any Postgres URL (query, write, migrate).
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model: models are left unset so you can pick the tier (fast and cheap, or top quality).
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
