On-demand bulk backfill of a contract archive into the register

Triggered manually to sweep an existing archive of signed PDFs in cloud storage, extract terms from each, and deduplicate them against the register.

Category: Document Ops
Engine: sim
Difficulty: intermediate
Trigger: manual
Steps: 5
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Operator starts backfill manually
  • Action: List all PDFs in the Google Drive archive folder (Google Drive)
  • Action: Extract terms from each PDF with OpenAI (OpenAI)
  • Logic: Dedup against existing register rows in Postgres
  • Output: Bulk-insert missing contracts and report run summary (Postgres)

What it does

This workflow backfills a register that started late. Pointed at a folder of historical executed agreements, it processes every PDF, extracts parties and dates, checks each against what is already in the register, and inserts only the missing ones, so you get full history without duplicates.
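
As an illustration of the extraction step, the sketch below turns a model reply into a typed record and rejects anything unusable before it reaches the register. It assumes the model is prompted to return JSON; the field names (`counterparty`, `effective_date`, `term_months`, `renewal_terms`), the `ContractTerms` type, and the `parse_terms` helper are all hypothetical and not part of the workflow's actual configuration.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ContractTerms:
    # Hypothetical record shape; the real field mapping is whatever you configure.
    counterparty: str
    effective_date: date
    term_months: Optional[int]
    renewal_terms: Optional[str]


def parse_terms(raw: str) -> Optional[ContractTerms]:
    """Turn the model's JSON reply into a typed record, or None if it is unusable."""
    try:
        data = json.loads(raw)
        counterparty = str(data["counterparty"]).strip()
        effective = date.fromisoformat(data["effective_date"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-ISO date: treat as a parse failure.
        return None
    if not counterparty:
        return None
    term = data.get("term_months")
    is_int = isinstance(term, int) and not isinstance(term, bool)
    return ContractTerms(
        counterparty=counterparty,
        effective_date=effective,
        term_months=term if is_int else None,
        renewal_terms=data.get("renewal_terms"),
    )
```

Requiring both the counterparty and the effective date up front is a deliberate choice in this sketch: those are the two fields the register check relies on, so a record without them cannot be compared safely.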

When to use it

Use it once, or occasionally, when you stand up a contract register and need to load years of existing signed PDFs sitting in Google Drive. The ongoing intake workflows handle new files; this handles the back catalog.

How it works

  1. An operator starts the run manually.
  2. The workflow lists every PDF in the target Google Drive archive folder and iterates over them.
  3. Each file's text is extracted and OpenAI pulls counterparty, effective date, term, and renewal terms.
  4. A dedup check queries Postgres by counterparty plus effective date to skip contracts already registered.
  5. Missing contracts are bulk-inserted into the Postgres register, and a run summary reports how many were added, skipped, or failed to parse.
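
Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the workflow's actual implementation: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Postgres so it runs anywhere, the `contracts` table and its columns are hypothetical, and it loads the existing keys once instead of querying per file. The `dedup_key` normalization (lowercasing and collapsing whitespace) is also an assumption about what counts as the same counterparty.

```python
import sqlite3
from datetime import date


def dedup_key(counterparty, effective_date):
    """Normalize the two fields used to decide whether a contract is already registered."""
    return (" ".join(counterparty.lower().split()), effective_date.isoformat())


def backfill(conn, extracted):
    """extracted: list of (counterparty, effective_date) tuples; None marks a parse failure."""
    summary = {"added": 0, "skipped": 0, "failed": 0}
    # Load the keys already in the register (hypothetical table and column names).
    rows = conn.execute("SELECT counterparty, effective_date FROM contracts")
    seen = {dedup_key(cp, date.fromisoformat(eff)) for cp, eff in rows}
    to_insert = []
    for item in extracted:
        if item is None:
            summary["failed"] += 1
            continue
        counterparty, effective = item
        key = dedup_key(counterparty, effective)
        if key in seen:
            summary["skipped"] += 1
            continue
        seen.add(key)  # also catches duplicates within the archive itself
        to_insert.append((counterparty, effective.isoformat()))
    with conn:  # one transaction for the whole bulk insert
        conn.executemany(
            "INSERT INTO contracts (counterparty, effective_date) VALUES (?, ?)",
            to_insert,
        )
    summary["added"] = len(to_insert)
    return summary


# Demo: one contract already registered, one new, one that failed to parse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (counterparty TEXT, effective_date TEXT)")
conn.execute("INSERT INTO contracts VALUES ('Acme Ltd', '2021-03-01')")
result = backfill(
    conn,
    [("ACME  ltd", date(2021, 3, 1)), ("Globex", date(2020, 1, 15)), None],
)
print(result)  # {'added': 1, 'skipped': 1, 'failed': 1}
```

Because the keys seen during the run are tracked as well, running the backfill a second time over the same folder adds nothing and reports every file as skipped. For a very large register, a per-file lookup or a unique constraint on the two columns may be a better fit than loading every key into memory.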

Set it up

What you configure once, before turning it on.

  1. Connect Google Drive
     Docs, sheets, slides, files.
  2. Connect OpenAI
     Models, embeddings, files.
  3. Connect Postgres
     Any Postgres URL: query, write, migrate.
  4. Set each agent's model
     We leave models unset so you pick the tier: fast and cheap, or top-quality.
  5. Tune it to your data
     Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on
     Run once against a sample, confirm the output, then enable the trigger.
