MARKET RESEARCH

Hugging Face dataset license-gate and review router

Monitors new domain datasets on Hugging Face and routes each by license: permissive ones auto-log to a cleared Notion list.

CategoryMarket Research
Enginesim
Difficultyintermediate
Triggerschedule
Steps6
Setup~15 min

How it runs

The automated pipeline, trigger to output.

  • TriggerSchedule fires
  • ActionList new domain datasetsHugging FaceHugging Face
  • ActionRead license metadata from cardHugging FaceHugging Face
  • LogicClassify license: permissive / restrictive / missing
  • OutputLog permissive datasets to Notion approved listNotionNotion
  • OutputOpen Linear review ticket for the restLinearLinear

What it does

Applies a compliance gate to incoming Hugging Face datasets before anyone in your org uses them. It inspects each new domain-matching dataset's license and splits the flow: commercially safe licenses land in an approved Notion list, while ambiguous, non-commercial, or unlicensed datasets become a Linear ticket so legal or a data lead can review before use.

When to use it

Use it when dataset licensing matters for shipping models commercially and you need a defensible audit trail showing which datasets were cleared versus held for review.

How it works

  1. 1A schedule triggers the run.
  2. 2The Hugging Face step lists new datasets matching your research domain.
  3. 3An action reads each card's license metadata.
  4. 4A logic branch classifies the license as permissive, restrictive, or missing.
  5. 5Permissive datasets output to an approved Notion list with the license tag.
  6. 6Restrictive or missing-license datasets output as a Linear review ticket with the card link and license details.

Set it up

What you configure once, before turning it on.

  1. 1
    Connect Hugging FaceModels, datasets, spaces — the open-source hub.
  2. 2
    Connect NotionPages, databases, comments.
  3. 3
    Connect LinearIssues, projects, cycles, triage.
  4. 4
    Set each agent's modelWe leave models unset so you pick the tier — fast + cheap, or top-quality.
  5. 5
    Tune it to your dataEdit the prompts, filters, and field mappings so it matches how your team works.
  6. 6
    Test, then turn it onRun once against a sample, confirm the output, then enable the trigger.

Run this workflow in your colony.

14-day trial. No DevOps. No Sales call. Provisioned in under a minute.