MARKET RESEARCH

New Dataset License Gatekeeper Alert

Watches for newly published datasets in your vertical and routes them by license: commercially usable ones trigger a Linear task for evaluation, while restricted or unclear-license ones are logged to a Notion register.

Category: Market Research
Engine: sim
Difficulty: intermediate
Trigger: schedule
Steps: 6
Setup: ~15 min

How it runs

The automated pipeline, trigger to output.

  • Trigger: Daily schedule
  • Action: Query Hugging Face for new datasets (Hugging Face)
  • Action: Classify each dataset's license (OpenAI)
  • Logic: Branch by license category
  • Action: Create Linear task for commercial-OK datasets (Linear)
  • Output: Log restricted datasets to Notion register (Notion)

What it does

Adds a compliance lens to dataset discovery. As new datasets appear in your vertical, it inspects each one's license and splits the stream: anything you can legally use in a product flows into Linear as an evaluation task, while restricted or unclear-license datasets are recorded in a Notion register so nobody wastes time on data you can't ship.
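As an illustration of the three-way split described above, here is a minimal Python sketch of a deterministic license check. It is not the workflow's own classifier, which runs as an OpenAI step; it is a rule-based stand-in that could act as a pre-check. It assumes each dataset carries tags of the form `license:<id>`, and the two license sets are placeholder assumptions to be replaced with whatever your legal team has approved or blocked. The function name `classify_license` is hypothetical.

```python
from typing import Iterable

# Illustrative assumption: these sets are NOT part of the workflow. Replace
# them with the licenses your legal team has approved or blocked.
COMMERCIAL_OK = {"mit", "apache-2.0", "bsd-3-clause", "cc0-1.0", "cc-by-4.0"}
RESTRICTED = {"cc-by-nc-4.0", "cc-by-nc-sa-4.0", "cc-by-nc-nd-4.0"}


def classify_license(tags: Iterable[str]) -> str:
    """Return 'commercial-ok', 'restricted', or 'unknown' for a dataset's tags."""
    # Collect every license id found in tags shaped like "license:<id>".
    licenses = {
        tag.split(":", 1)[1].strip().lower()
        for tag in tags
        if tag.lower().startswith("license:")
    }
    if not licenses:
        return "unknown"
    # Any restricted license wins, so a dual-tagged dataset is never waved through.
    if licenses & RESTRICTED:
        return "restricted"
    # Only label commercial-ok when every declared license is on the approved list.
    if licenses <= COMMERCIAL_OK:
        return "commercial-ok"
    return "unknown"
```

Treating anything unrecognized as unknown keeps the default conservative: a dataset only reaches the evaluation backlog when every declared license is explicitly approved.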

When to use it

For teams shipping commercial models where dataset licensing is a real legal constraint. It keeps your backlog full of usable candidates and your engineers from accidentally training on data that can't go to production.

How it works

  1. A daily cron starts the scan.
  2. Hugging Face is queried for datasets newly published in the vertical.
  3. A license-classification step labels each as commercial-OK, restricted, or unknown.
  4. A branch routes the dataset by label.
  5. Commercial-OK datasets create a Linear evaluation task with metadata.
  6. Restricted and unknown datasets are appended to a Notion off-limits register with the blocking reason.
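The branch in steps 4 to 6 can be sketched in Python as a pure function that turns a labeled dataset into the payload for one destination. This is a rough sketch under stated assumptions: the `Dataset` class, the `route` function, and every payload field name are hypothetical, and no Linear or Notion API is called here.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    dataset_id: str
    license_label: str  # 'commercial-ok', 'restricted', or 'unknown'
    license_name: str = ""


def route(dataset: Dataset) -> dict:
    """Build the payload for the branch that matches the dataset's label."""
    if dataset.license_label == "commercial-ok":
        # Hypothetical task fields; map them to your own Linear setup.
        return {
            "destination": "linear",
            "title": f"Evaluate dataset: {dataset.dataset_id}",
            "description": f"License: {dataset.license_name or 'n/a'}",
        }
    # Restricted and unknown both land in the register, with a reason attached.
    if dataset.license_label == "restricted":
        reason = f"Restricted license: {dataset.license_name or 'unspecified'}"
    else:
        reason = "License could not be determined"
    return {
        "destination": "notion",
        "dataset": dataset.dataset_id,
        "blocking_reason": reason,
    }
```

Because any label other than commercial-ok falls through to the register, an unexpected label is recorded as blocked instead of silently creating an evaluation task.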

Set it up

What you configure once, before turning it on.

  1. Connect Hugging Face: models, datasets, and spaces on the open-source hub.
  2. Connect OpenAI: models, embeddings, files.
  3. Connect Linear: issues, projects, cycles, triage.
  4. Connect Notion: pages, databases, comments.
  5. Set each agent's model: models are left unset so you pick the tier, either fast and cheap or top-quality.
  6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
  7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
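To make the filter tuning and the sample run concrete, here is a small Python sketch of a "new and in my vertical" filter that can be exercised against hand-written records before the trigger is enabled. Everything in it is an assumption for illustration: the record shape (`id`, `tags`, `created_at` as an ISO 8601 string with a UTC offset), the keyword list, and the one-day lookback are not taken from the workflow itself.

```python
from datetime import datetime, timedelta, timezone


def is_new_and_relevant(
    record: dict,
    keywords: list[str],
    now: datetime,
    lookback: timedelta = timedelta(days=1),
) -> bool:
    """True if the record was created within the lookback window and matches a keyword."""
    created = datetime.fromisoformat(record["created_at"])
    # Reject anything older than the window or timestamped in the future.
    if created > now or now - created > lookback:
        return False
    # Match keywords against the dataset id and its tags, case-insensitively.
    haystack = f"{record['id']} {' '.join(record.get('tags', []))}".lower()
    return any(keyword.lower() in haystack for keyword in keywords)


# A tiny hand-written sample for a dry run; the values are made up.
sample = [
    {"id": "org/clinical-notes", "tags": ["medical"], "created_at": "2024-05-02T08:00:00+00:00"},
    {"id": "org/old-medical-set", "tags": ["medical"], "created_at": "2024-04-01T08:00:00+00:00"},
    {"id": "org/chess-games", "tags": ["games"], "created_at": "2024-05-02T09:00:00+00:00"},
]
now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
selected = [r["id"] for r in sample if is_new_and_relevant(r, ["medical", "clinical"], now)]
```

Passing `now` in explicitly, instead of reading the clock inside the function, makes the sample run repeatable from one day to the next.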
