
Nightly Dropbox PDF dedupe by content hash

Runs each night, hashes every PDF across the document folders, finds true duplicates (same content, different names), and moves the redundant copies to a quarantine folder…

Category: Document Ops
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Nightly schedule
  • Action: List PDFs and compute content hashes (Dropbox)
  • Logic: Group by hash, keep oldest as canonical (Postgres)
  • Action: Move duplicate copies to quarantine folder (Dropbox)
  • Output: Post dedupe summary to Slack (Slack)

What it does

Scans all PDFs under your Dropbox document tree on a schedule, computes a content hash for each, and detects files whose bytes are identical even when their names differ. It keeps the oldest copy as canonical and moves every duplicate into a `_duplicates/` quarantine folder, then posts a summary of what it reclaimed to Slack.
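
To make "identical bytes, different names" concrete, here is a minimal sketch of a content hash. It assumes a plain SHA-256 over the file bytes; the page does not say which hash the workflow actually uses, and the function name `content_hash` and the sample byte strings are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large PDFs are never loaded whole


def content_hash(stream: BinaryIO) -> str:
    """Return the hex SHA-256 digest of every byte readable from the stream.

    Assumption: SHA-256 stands in for whatever hash the workflow really uses.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Hypothetical files: two uploads of the same scan under different names,
# plus one unrelated document. The file name never enters the hash.
hash_a = content_hash(io.BytesIO(b"%PDF-1.7 example bytes"))
hash_b = content_hash(io.BytesIO(b"%PDF-1.7 example bytes"))
hash_other = content_hash(io.BytesIO(b"%PDF-1.7 different bytes"))

print(hash_a == hash_b)      # same bytes, so the digests match
print(hash_a == hash_other)  # different bytes, so they do not
```

Because only the bytes are hashed, a renamed copy is still detected, while two files that merely look alike (for example, the same page scanned twice) produce different hashes and are left alone.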

When to use it

Use it when the same document gets re-scanned, re-emailed, or re-uploaded under different names and your folders are bloating with copies. Because duplicates are quarantined rather than deleted, nothing is lost if a file is flagged by mistake, which makes the workflow safe to run unattended.

How it works

  1. A nightly schedule starts the run.
  2. The flow lists every PDF in the watched folders and records each file's content hash in Postgres.
  3. A grouping step finds hashes with more than one file and picks the earliest-created file as canonical.
  4. Each non-canonical duplicate is moved to `_duplicates/` in Dropbox.
  5. A Slack message summarizes how many duplicates were quarantined and the space reclaimed.
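
The grouping, canonical selection, move planning, and summary steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the workflow's implementation: the `PdfRecord` fields, the `plan_dedupe` and `summary_text` helpers, the `/_duplicates` root location, and the sample records are all hypothetical, and the sketch only plans the moves rather than calling Dropbox, Postgres, or Slack.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PdfRecord:
    # Hypothetical row shape; the real Postgres schema is not shown on the page.
    path: str
    content_hash: str
    created: datetime
    size_bytes: int


def plan_dedupe(records, quarantine_root="/_duplicates"):
    """Return (moves, reclaimed_bytes) for every non-canonical duplicate."""
    groups = defaultdict(list)
    for record in records:
        groups[record.content_hash].append(record)

    moves, reclaimed_bytes = [], 0
    for files in groups.values():
        if len(files) < 2:
            continue  # unique content, nothing to do
        # Earliest-created file is canonical; path breaks ties deterministically.
        canonical, *duplicates = sorted(files, key=lambda r: (r.created, r.path))
        for dup in duplicates:
            # Assumption: keep the original folder structure under the
            # quarantine root so same-named files cannot collide.
            moves.append((dup.path, f"{quarantine_root}{dup.path}"))
            reclaimed_bytes += dup.size_bytes
    return moves, reclaimed_bytes


def summary_text(moves, reclaimed_bytes):
    """Build the text of the summary message."""
    megabytes = reclaimed_bytes / 1_000_000
    return f"Quarantined {len(moves)} duplicate PDF(s), {megabytes:.1f} MB reclaimed"


# Hypothetical sample data: three copies of one document and one unique file.
records = [
    PdfRecord("/Inbox/nda (1).pdf", "aaa", datetime(2024, 3, 1), 120_000),
    PdfRecord("/Contracts/nda.pdf", "aaa", datetime(2024, 1, 5), 120_000),
    PdfRecord("/Scans/scan_0042.pdf", "aaa", datetime(2024, 2, 10), 120_000),
    PdfRecord("/Invoices/inv-17.pdf", "bbb", datetime(2024, 2, 1), 80_000),
]
moves, reclaimed_bytes = plan_dedupe(records)
print(moves)
print(summary_text(moves, reclaimed_bytes))
```

Separating the plan from the actual moves keeps the logic easy to test against a sample before the trigger is enabled, since the planned moves can be inspected without touching any files.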

Set it up

What you configure once, before turning it on.

  1. Connect Dropbox: files and folders.
  2. Connect Postgres: any Postgres URL (query, write, migrate).
  3. Connect Slack: channels, DMs, threads, mentions.
  4. Set each agent's model: models are left unset so you pick the tier, either fast and cheap or top-quality.
  5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
