DOCUMENT OPS
Backfill an archive of scanned POs into BigQuery line items
Runs on a schedule over a bucket of historical scanned PO PDFs, splits each batched file into individual POs, and extracts line items with OpenAI.
How it runs
The automated pipeline, trigger to output.
- Trigger: Scheduled batch run
- Action: List unprocessed PO scans from S3 archive (AWS S3)
- Action: Split batches + extract line items (OpenAI)
- Logic: Dedupe by PO number, tag fiscal period
- Output: Append line rows to BigQuery table (BigQuery)
What it does
Processes a backlog of archived purchase-order scans in batches. On each scheduled run it picks up unprocessed PDFs from cloud storage, splits multi-PO files, extracts line items, and appends them to a BigQuery line-item table so historical spend becomes queryable.
When to use it
Use it for a one-time or recurring backfill: turning years of scanned POs sitting in a bucket into structured analytics data without hand-keying. Processing the archive in scheduled batches keeps throughput steady and avoids the timeouts that a single pass over the full archive would risk.
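As a rough illustration of why scheduled batches help, here is a minimal Python sketch of picking the next slice of unprocessed scans on each run. The function name `next_batch`, the `batch_size` parameter, and the example object keys are all hypothetical; the workflow's actual batching logic is not described on this page, and a real run would get the key list from the S3 connection rather than a hard-coded list.

```python
from typing import Iterable, List, Set


def next_batch(all_keys: Iterable[str], processed: Set[str], batch_size: int) -> List[str]:
    """Return up to batch_size PDF keys that have not been processed yet.

    Keys are sorted so repeated runs walk the archive in a stable order.
    """
    pending = sorted(
        key
        for key in all_keys
        if key.lower().endswith(".pdf") and key not in processed
    )
    return pending[:batch_size]


# Hypothetical archive listing and processed-set, for illustration only.
archive = [
    "scans/2019/batch-001.pdf",
    "scans/2019/batch-002.pdf",
    "scans/2019/readme.txt",
    "scans/2020/batch-003.pdf",
]
done = {"scans/2019/batch-001.pdf"}

batch = next_batch(archive, done, batch_size=2)
print(batch)
```

Each scheduled run would take one such slice, so a large archive is worked through over many short runs instead of one long one.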
How it works
- 1. A scheduled trigger fires (for example hourly) to process the next batch.
- 2. The flow lists unprocessed PO scans from the AWS S3 archive bucket.
- 3. OpenAI splits each batched file into individual POs and extracts header and line fields.
- 4. A logic step deduplicates by PO number against already-loaded records and tags each line with fiscal period.
- 5. New line rows are appended to the BigQuery table and the source files are marked processed.
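The logic step (dedupe by PO number, then tag fiscal period) could look something like the Python sketch below. Everything named here is an assumption for illustration: the `LineItem` fields, the `FY<year>-P<period>` label format, the `fy_start_month` parameter, and the convention of naming a fiscal year by the calendar year in which it ends. The page does not specify how the workflow defines fiscal periods or which fields it extracts.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class LineItem:
    # Hypothetical shape of one extracted PO line.
    po_number: str
    line_no: int
    description: str
    amount: float
    order_date: date


def fiscal_period(d: date, fy_start_month: int = 1) -> str:
    """Label a date as 'FY<year>-P<period>'.

    Assumes monthly periods and names the fiscal year by the calendar
    year in which it ends.
    """
    period = (d.month - fy_start_month) % 12 + 1
    fy = d.year if fy_start_month == 1 or d.month < fy_start_month else d.year + 1
    return f"FY{fy}-P{period:02d}"


def new_rows(items: Iterable[LineItem], loaded_po_numbers: Set[str],
             fy_start_month: int = 1) -> List[Dict]:
    """Drop lines whose PO is already loaded, then tag the rest."""
    seen = set()  # (po_number, line_no) pairs already emitted in this batch
    rows = []
    for item in items:
        po = item.po_number.strip().upper()  # normalize before comparing
        if po in loaded_po_numbers or (po, item.line_no) in seen:
            continue
        seen.add((po, item.line_no))
        rows.append({
            "po_number": po,
            "line_no": item.line_no,
            "description": item.description,
            "amount": item.amount,
            "fiscal_period": fiscal_period(item.order_date, fy_start_month),
        })
    return rows


# Hypothetical sample data: PO-1001 is already loaded, PO-1002 was scanned twice.
extracted = [
    LineItem("PO-1001", 1, "Toner", 89.0, date(2021, 3, 1)),
    LineItem("po-1002", 1, "Paper", 42.5, date(2021, 8, 15)),
    LineItem("PO-1002", 2, "Staples", 7.25, date(2021, 8, 15)),
    LineItem("PO-1002", 1, "Paper", 42.5, date(2021, 8, 15)),
]
rows = new_rows(extracted, loaded_po_numbers={"PO-1001"}, fy_start_month=7)
for row in rows:
    print(row["po_number"], row["line_no"], row["fiscal_period"])
```

The resulting rows are what step 5 would append to the BigQuery table. Normalizing the PO number before comparison matters here because extracted text can vary in case and whitespace between scans.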
Set it up
What you configure once, before turning it on.
- 1. Connect AWS S3: buckets, objects, signed URLs.
- 2. Connect OpenAI: models, embeddings, files.
- 3. Connect BigQuery: datasets, queries, schemas.
- 4. Set each agent's model: we leave models unset so you pick the tier, either fast and cheap or top quality.
- 5. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
- 6. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
More Document Ops workflows
Narrate new Dropbox PDFs into MP3 audio versions
When a PDF lands in a watched Dropbox folder, extract its text and generate an ElevenLabs voice narration.
On-demand PDF narration via webhook with emailed audio link
Accepts a PDF URL through a webhook, generates an ElevenLabs narration with the requested voice, stores the MP3, and emails the requester a download link.
Triage emailed contract redlines and route by risk
When a counterparty emails a redlined contract, extracts the attachment, diffs clauses against approved templates.
Batch-narrate a Google Drive PDF folder in multiple languages
On a schedule, finds PDFs in a Google Drive folder that lack audio, then generates ElevenLabs narrations in each configured language and files them into per-language subfolders…
Executed Contract Exhibit & Initials Completeness Gate
When a signed contract lands in a Dropbox intake folder, verify every required exhibit, schedule, and initialed page is present.
Draft a negotiation brief from contract clause deviations
An agent reviews a contract against approved templates, researches each deviation.
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

