
Daily Dropbox form batch extraction to BigQuery

On a daily schedule, this workflow processes every new scanned form that has accumulated in a Dropbox folder, extracts its structured fields, and appends the validated rows to a BigQuery table for analytics.

Category: Document Ops
Engine: sim
Difficulty: advanced
Trigger: schedule
Steps: 5
Setup: ~25 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily schedule fires the batch run
  • Action: List and download new Dropbox forms since the last run
  • Action: Extract fields from each form via Hugging Face
  • Logic: Validate records against the expected schema
  • Output: Append valid rows to the Google BigQuery intake table
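The "since last run" selection can be sketched with a stored watermark: keep the timestamp of the newest file handled so far, and on each run pick only files modified after it. This is a minimal sketch, not the workflow's actual mechanism. The record shape (a `path` plus a timezone-aware `modified` datetime) and the function name `select_new_files` are hypothetical, and fetching the listing from Dropbox is left out.

```python
from datetime import datetime, timezone


def select_new_files(files, last_run):
    """Pick files modified after the stored watermark, oldest first.

    `files` is a list of hypothetical records, each a dict with a "path"
    and a timezone-aware "modified" datetime. Returns the new files and
    the watermark to persist once the run succeeds.
    """
    new = sorted(
        (f for f in files if f["modified"] > last_run),
        key=lambda f: f["modified"],
    )
    # Advance to the newest timestamp actually seen rather than to "now",
    # so a file that lands after the listing is picked up next run.
    next_watermark = new[-1]["modified"] if new else last_run
    return new, next_watermark


if __name__ == "__main__":
    def utc(*args):
        return datetime(*args, tzinfo=timezone.utc)

    last_run = utc(2024, 5, 1)
    listing = [
        {"path": "/forms/a.pdf", "modified": utc(2024, 4, 30, 23, 0)},
        {"path": "/forms/b.pdf", "modified": utc(2024, 5, 1, 9, 30)},
        {"path": "/forms/c.pdf", "modified": utc(2024, 5, 1, 8, 0)},
    ]
    new, watermark = select_new_files(listing, last_run)
    print([f["path"] for f in new])  # ['/forms/c.pdf', '/forms/b.pdf']
    print(watermark.isoformat())     # 2024-05-01T09:30:00+00:00
```

Persisting the returned watermark only after the BigQuery load succeeds would let a failed run simply reprocess the same files on the next attempt.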

What it does

Once a day this workflow sweeps a Dropbox folder for forms received since the last run, extracts each one's fields, validates them against the expected schema, and loads the clean rows into a BigQuery table so the day's intake is queryable for reporting.
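Loading the day's clean rows once per run maps naturally onto a BigQuery load job, and newline-delimited JSON is one of the source formats such jobs accept. The sketch below only prepares that payload. It assumes NDJSON staging (the workflow does not say which format it uses), the column names are hypothetical, and submitting the load job itself is out of scope.

```python
import json
from datetime import date


def to_ndjson(rows):
    """Serialize validated rows as newline-delimited JSON (one object per line)."""
    lines = []
    for row in rows:
        # json cannot encode date objects, so emit ISO 8601 strings instead;
        # a plain date becomes "YYYY-MM-DD", BigQuery's canonical DATE form.
        encoded = {
            key: value.isoformat() if isinstance(value, date) else value
            for key, value in row.items()
        }
        lines.append(json.dumps(encoded, sort_keys=True))
    # Terminate every record with a newline; an empty batch yields "".
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    rows = [
        {"form_id": "F-001", "amount": 120.5, "received": date(2024, 5, 1)},
        {"form_id": "F-003", "amount": 40.0, "received": date(2024, 5, 2)},
    ]
    print(to_ndjson(rows), end="")
    # {"amount": 120.5, "form_id": "F-001", "received": "2024-05-01"}
    # {"amount": 40.0, "form_id": "F-003", "received": "2024-05-02"}
```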

When to use it

Use it when you do not need real-time, per-document processing but want a reliable daily batch that turns a pile of scanned forms into analytics-ready rows in your warehouse.

How it works

  1. A daily schedule trigger fires the run.
  2. The workflow lists files added to the Dropbox folder since the previous run and downloads each one.
  3. A Hugging Face document model extracts the fields from every form.
  4. A logic step validates each record against the expected field set and types, separating valid rows from malformed ones.
  5. Valid rows are appended to the BigQuery intake table in a single load.
  6. The counts of processed, valid, and skipped forms are logged as the run summary.
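The validation and summary steps (4 and 6) can be sketched as follows: coerce each extracted value to the type its column expects, treat a missing value or a failed conversion as malformed, and count the outcomes. This is a minimal sketch under assumptions. `EXPECTED_SCHEMA`, its column names, and the helpers `validate` and `split_batch` are hypothetical, and extracted values are assumed to arrive as strings.

```python
from datetime import date

# Hypothetical schema: column name -> converter from the extracted string.
EXPECTED_SCHEMA = {
    "form_id": str,
    "amount": float,
    "received": date.fromisoformat,  # expects "YYYY-MM-DD"
}


def validate(record, schema=EXPECTED_SCHEMA):
    """Return (row, problems); row is None when any field is malformed."""
    row, problems = {}, []
    for column, convert in schema.items():
        raw = record.get(column)
        if raw is None or str(raw).strip() == "":
            problems.append(f"missing {column}")
            continue
        try:
            row[column] = convert(raw)
        except (TypeError, ValueError):
            problems.append(f"bad {column}: {raw!r}")
    return (None if problems else row), problems


def split_batch(records):
    """Separate valid rows from malformed records and build the run summary."""
    valid, skipped = [], []
    for record in records:
        row, problems = validate(record)
        if row is None:
            skipped.append((record, problems))
        else:
            valid.append(row)
    summary = {"processed": len(records), "valid": len(valid), "skipped": len(skipped)}
    return valid, skipped, summary


if __name__ == "__main__":
    batch = [
        {"form_id": "F-001", "amount": "120.50", "received": "2024-05-01"},
        {"form_id": "F-002", "amount": "n/a", "received": "2024-05-01"},
        {"form_id": "", "amount": "40", "received": "2024-05-02"},
    ]
    valid, skipped, summary = split_batch(batch)
    print(summary)  # {'processed': 3, 'valid': 1, 'skipped': 2}
    for _, problems in skipped:
        print(problems)
    # ["bad amount: 'n/a'"]
    # ['missing form_id']
```

Keeping each skipped record alongside its reasons, rather than dropping it silently, gives the run summary something actionable to report.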

Set it up

What you configure once, before turning it on.

  1. Connect Dropbox: files and folders.
  2. Connect Hugging Face: models, datasets, and Spaces from the open-source hub.
  3. Connect BigQuery: datasets, queries, and schemas.
  4. Set each agent's model. We leave models unset so you can pick the tier: fast and cheap, or top quality.
  5. Tune it to your data. Edit the prompts, filters, and field mappings so it matches how your team works.
  6. Test, then turn it on. Run once against a sample, confirm the output, then enable the trigger.
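The field mappings mentioned in the "Tune it to your data" step amount to translating the labels the extraction model emits into the intake table's column names. A minimal sketch, assuming the model returns a flat label-to-value dict; `FIELD_MAP`, its labels, and `apply_field_map` are hypothetical and should be replaced with your own forms' labels and your table's columns.

```python
# Hypothetical mapping: label emitted by the extraction step -> BigQuery column.
FIELD_MAP = {
    "Form Number": "form_id",
    "Total Amount": "amount",
    "Date Received": "received",
}


def apply_field_map(extracted, field_map=FIELD_MAP):
    """Rename mapped labels to columns; report labels the map does not cover."""
    row, unmapped = {}, []
    for label, value in extracted.items():
        column = field_map.get(label.strip())
        if column is None:
            unmapped.append(label)
        else:
            row[column] = value
    return row, unmapped


if __name__ == "__main__":
    row, unmapped = apply_field_map(
        {"Form Number": "F-001", "Total Amount": "120.50", "Notes": "late"}
    )
    print(row)       # {'form_id': 'F-001', 'amount': '120.50'}
    print(unmapped)  # ['Notes']
```

Reporting unmapped labels instead of discarding them makes it easy to spot when a form layout changes and the map needs another entry.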
