Backfill an archive of scanned POs into BigQuery line items

Runs on a schedule over a bucket of historical scanned PO PDFs, splits each batched file into individual POs, and extracts line items with OpenAI.

Category: Document Ops
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Scheduled batch run
  • Action: List unprocessed PO scans from S3 archive (AWS S3)
  • Action: Split batches + extract line items (OpenAI)
  • Logic: Dedupe by PO number, tag fiscal period
  • Output: Append line rows to BigQuery table (Google BigQuery)

What it does

Processes a backlog of archived purchase-order scans in batches. On each scheduled run it picks up unprocessed PDFs from cloud storage, splits multi-PO files, extracts line items, and appends them to a BigQuery line-item table so historical spend becomes queryable.
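As a rough illustration of what one row in the line-item table might look like, here is a minimal sketch. The source does not specify the table schema, so every field name below (`po_number`, `line_number`, `source_file`, and so on) is an assumption to adapt to your own data.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LineItemRow:
    """One row per PO line. All field names are illustrative assumptions."""
    po_number: str
    line_number: int
    description: str
    quantity: float
    unit_price: float
    source_file: str                      # object key of the scanned PDF the line came from
    fiscal_period: Optional[str] = None   # filled in later by the tagging step

    def to_row(self) -> dict:
        """Return a plain dict, the shape a JSON-style row append typically takes."""
        return asdict(self)


row = LineItemRow("PO-1001", 1, "Toner cartridge", 4, 59.5, "archive/2019/batch-07.pdf")
print(row.to_row())
```

Keeping `source_file` on every row is a design choice that makes it possible to trace a suspicious figure back to the original scan.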

When to use it

Use it for a one-time or recurring backfill: turning years of scanned POs sitting in a bucket into structured analytics data without hand-keying. Because each scheduled run handles only one batch, throughput stays steady and no single run has to process the full archive and risk timing out.
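The batching idea can be sketched as a small selection function: each run takes at most a fixed number of unprocessed PDFs and leaves the rest for the next run. This is a minimal sketch, not the template's actual logic; `next_batch`, `batch_size`, and the in-memory `processed` set are hypothetical names, and in a real flow the keys would come from the bucket listing.

```python
from typing import Iterable, List, Set


def next_batch(all_keys: Iterable[str], processed: Set[str], batch_size: int) -> List[str]:
    """Return up to batch_size unprocessed PDF keys, in a stable (sorted) order."""
    pending = sorted(
        k for k in all_keys
        if k.lower().endswith(".pdf") and k not in processed
    )
    return pending[:batch_size]


# Hypothetical object keys and processed marker set.
keys = ["archive/a.pdf", "archive/b.pdf", "archive/c.pdf", "archive/readme.txt"]
done = {"archive/a.pdf"}
print(next_batch(keys, done, batch_size=1))  # ['archive/b.pdf']
```

Sorting the pending keys keeps the processing order deterministic, so a rerun after a failure picks up the same files.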

How it works

  1. A scheduled trigger fires (for example hourly) to process the next batch.
  2. The flow lists unprocessed PO scans from the AWS S3 archive bucket.
  3. OpenAI splits each batched file into individual POs and extracts header and line fields.
  4. A logic step deduplicates by PO number against already-loaded records and tags each line with fiscal period.
  5. New line rows are appended to the BigQuery table and the source files are marked processed.
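Step 4 above can be sketched in plain Python. This is an illustration under stated assumptions rather than the template's implementation: the fiscal-period label format (`FY2020-P08`), the convention that a fiscal year is named after the calendar year in which it ends, and the names `fiscal_period`, `dedupe_and_tag`, and `fy_start_month` are all hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, List, Set


def fiscal_period(order_date: date, fy_start_month: int = 1) -> str:
    """Label such as 'FY2024-P03' (assumed format; FY named after the year it ends in)."""
    period = (order_date.month - fy_start_month) % 12 + 1
    rolls_over = fy_start_month > 1 and order_date.month >= fy_start_month
    fiscal_year = order_date.year + (1 if rolls_over else 0)
    return f"FY{fiscal_year}-P{period:02d}"


def dedupe_and_tag(lines: Iterable[Dict], loaded_po_numbers: Set[str],
                   fy_start_month: int = 1) -> List[Dict]:
    """Drop lines whose PO number is already loaded, then tag the rest."""
    fresh = []
    for line in lines:
        if line["po_number"] in loaded_po_numbers:
            continue  # this PO was loaded by an earlier run
        tagged = {**line, "fiscal_period": fiscal_period(line["order_date"], fy_start_month)}
        fresh.append(tagged)
    return fresh


# Hypothetical extracted lines; PO-1001 is treated as already loaded.
lines = [
    {"po_number": "PO-1001", "line_number": 1, "order_date": date(2019, 11, 4)},
    {"po_number": "PO-1002", "line_number": 1, "order_date": date(2020, 2, 17)},
    {"po_number": "PO-1002", "line_number": 2, "order_date": date(2020, 2, 17)},
]
fresh = dedupe_and_tag(lines, loaded_po_numbers={"PO-1001"}, fy_start_month=7)
print([(l["po_number"], l["line_number"], l["fiscal_period"]) for l in fresh])
```

Note that this sketch only compares against previously loaded PO numbers. If the same PO can appear in two scans within one batch, you would also need to track PO numbers seen during the current run.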

Set it up

What you configure once, before turning it on.

  1. Connect AWS S3: Buckets, objects, signed URLs.
  2. Connect OpenAI: Models, embeddings, files.
  3. Connect BigQuery: Datasets, queries, schemas.
  4. Set each agent's model: We leave models unset so you pick the tier (fast + cheap, or top-quality).
  5. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
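For the sample run in the last step, one way to confirm the output is a small sanity check over the extracted rows before enabling the trigger. The sketch below is only an example of such a check; the required field names and the `check_sample` helper are assumptions, not part of the template.

```python
from typing import Dict, List

# Assumed field names; replace with the fields your mapping actually produces.
REQUIRED_FIELDS = ("po_number", "line_number", "description", "quantity", "unit_price")


def check_sample(rows: List[Dict]) -> List[str]:
    """Return human-readable problems found in a sample of extracted rows."""
    problems = []
    for i, row in enumerate(rows):
        missing = [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue
        if row["quantity"] <= 0:
            problems.append(f"row {i}: non-positive quantity {row['quantity']}")
    return problems


# Hypothetical sample output from one test run.
sample = [
    {"po_number": "PO-1001", "line_number": 1, "description": "Toner", "quantity": 4, "unit_price": 59.5},
    {"po_number": "", "line_number": 1, "description": "Paper", "quantity": 10, "unit_price": 4.2},
    {"po_number": "PO-1003", "line_number": 1, "description": "Stapler", "quantity": 0, "unit_price": 12.0},
]
for problem in check_sample(sample):
    print(problem)
```

An empty result from a representative sample is a reasonable signal that the prompts and field mappings are ready for the scheduled runs.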
