MARKET RESEARCH
New Dataset License Gatekeeper Alert
Watches for newly published datasets in your vertical and routes them by license: commercially usable ones trigger a Linear task for evaluation, while restricted or unclear ones are logged to a Notion register.
How it runs
The automated pipeline, trigger to output.
- Trigger: Daily schedule
- Action: Query Hugging Face for new datasets (Hugging Face)
- Action: Classify each dataset's license (OpenAI)
- Logic: Branch by license category
- Action: Create Linear task for commercial-OK datasets (Linear)
- Output: Log restricted and unknown datasets to Notion register (Notion)
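As a rough illustration of the first two stages, the sketch below keeps only the datasets published since the last daily run that belong to your vertical. The record shape (`id`, `createdAt`, `tags`) and the function names are assumptions made for this example, loosely modelled on dataset metadata from the Hugging Face Hub; the actual query step may return a different shape.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp such as '2024-05-01T06:00:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_since(records, cutoff, vertical_tags):
    """Keep records created after `cutoff` that carry at least one vertical tag.

    `records` is a list of dicts with hypothetical keys `id`, `createdAt`, `tags`.
    """
    wanted = set(vertical_tags)
    fresh = []
    for rec in records:
        created = parse_ts(rec["createdAt"])
        if created > cutoff and wanted & set(rec.get("tags", [])):
            fresh.append(rec)
    return fresh


# Sample input: one new in-vertical dataset, one stale, one out of vertical.
sample = [
    {"id": "acme/reviews", "createdAt": "2024-05-01T06:00:00.000Z",
     "tags": ["task_categories:text-classification", "license:mit"]},
    {"id": "acme/old", "createdAt": "2024-04-30T06:00:00.000Z",
     "tags": ["task_categories:text-classification"]},
    {"id": "acme/images", "createdAt": "2024-05-02T06:00:00.000Z",
     "tags": ["task_categories:image-classification"]},
]
cutoff = datetime(2024, 5, 1, tzinfo=timezone.utc)
fresh = new_since(sample, cutoff, ["task_categories:text-classification"])
```

In a daily run the cutoff would typically be the timestamp of the previous successful scan, so a skipped day does not drop datasets.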
What it does
Adds a compliance lens to dataset discovery. As new datasets appear in your vertical, it inspects each one's license and splits the stream: anything you can legally use in a product flows into Linear as an evaluation task, while restricted or unclear-license datasets are recorded in a Notion register so nobody wastes time on data you can't ship.
When to use it
For teams shipping commercial models where dataset licensing is a real legal constraint. It keeps your backlog full of usable candidates and your engineers from accidentally training on data that can't go to production.
How it works
1. A daily cron starts the scan.
2. Hugging Face is queried for datasets newly published in the vertical.
3. A license-classification step labels each as commercial-OK, restricted, or unknown.
4. A branch routes the dataset by label.
5. Commercial-OK datasets create a Linear evaluation task with metadata.
6. Restricted and unknown datasets are appended to a Notion off-limits register with the blocking reason.
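The labelling and branching steps above can be sketched as follows. This is a minimal rule-based stand-in for the classification step, not the model-based classifier the workflow uses: it reads a `license:<id>` tag and looks it up in two illustrative sets. The sets, the `Route` type, and the function names are all assumptions for this example; the license lists in particular are a small sample that your own policy would replace.

```python
from dataclasses import dataclass

# Illustrative lists only (assumption): substitute your team's approved policy.
COMMERCIAL_OK = {"mit", "apache-2.0", "cc0-1.0", "cc-by-4.0"}
RESTRICTED = {"cc-by-nc-4.0", "cc-by-nc-sa-4.0", "cc-by-nc-nd-4.0"}


@dataclass
class Route:
    dataset_id: str
    label: str        # "commercial-ok", "restricted", or "unknown"
    destination: str  # "linear" or "notion"
    reason: str


def license_from_tags(tags):
    """Return the identifier from the first 'license:<id>' tag, or None."""
    for tag in tags:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1].lower()
    return None


def classify(license_id):
    """Label a license identifier; anything unlisted or missing is 'unknown'."""
    if license_id in COMMERCIAL_OK:
        return "commercial-ok"
    if license_id in RESTRICTED:
        return "restricted"
    return "unknown"


def route(record):
    """Decide where a dataset record goes and record why."""
    license_id = license_from_tags(record.get("tags", []))
    label = classify(license_id)
    if label == "commercial-ok":
        return Route(record["id"], label, "linear",
                     f"license {license_id} permits commercial use")
    if label == "restricted":
        reason = f"license {license_id} restricts commercial use"
    elif license_id is None:
        reason = "no license declared"
    else:
        reason = f"license {license_id} not recognised"
    return Route(record["id"], label, "notion", reason)
```

Treating a missing or unrecognised license as "unknown" and sending it to the register keeps the default conservative: a dataset reaches the evaluation backlog only when its license is explicitly on the approved list.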
Set it up
What you configure once, before turning it on.
1. Connect Hugging Face: the open-source hub for models, datasets, and spaces.
2. Connect OpenAI: models, embeddings, files.
3. Connect Linear: issues, projects, cycles, triage.
4. Connect Notion: pages, databases, comments.
5. Set each agent's model: we leave models unset so you pick the tier, fast and cheap or top-quality.
6. Tune it to your data: edit the prompts, filters, and field mappings so it matches how your team works.
7. Test, then turn it on: run once against a sample, confirm the output, then enable the trigger.
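To make the "field mappings" idea in the setup steps concrete, here is one way a mapping from dataset metadata to task fields might look. The template keys (`title`, `description`), the record keys (`id`, `license`), and the `render_task` helper are hypothetical and chosen for this example; the actual mapping is whatever the workflow editor exposes.

```python
# Hypothetical mapping: task field -> template filled from a dataset record.
LINEAR_FIELD_MAP = {
    "title": "Evaluate dataset: {id}",
    "description": "License: {license}\nSource: https://huggingface.co/datasets/{id}",
}


def render_task(record, field_map=LINEAR_FIELD_MAP):
    """Fill each template with values from the record (a plain dict)."""
    return {field: template.format(**record)
            for field, template in field_map.items()}


# Run once against a sample record and inspect the result before enabling the trigger.
sample_task = render_task({"id": "acme/reviews", "license": "mit"})
```

Keeping the templates in one dictionary means a wording change to task titles is a single edit, and a sample record shows the effect before anything is created.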
More Market Research workflows
Enrich Inbound Accounts with BigQuery Firmographics and Score Fit
When a new account row lands in Airtable, joins it against BigQuery public business datasets to attach firmographic attributes.
Blend BigQuery TAM with Live Competitor Signals into a Notion Brief
On demand, sizes a chosen segment from BigQuery public data, gathers current competitor signals via Brave Search, and synthesizes a one-page market brief into Notion.
Allocate Sales Territory TAM from BigQuery Geo Data to HubSpot
When triggered by a webhook, queries BigQuery public ZIP-level business data to compute TAM per sales territory.
Hiring Surge Detector with Slack Alert
Detects when a target account's open-role count jumps above its recent baseline and posts a ranked Slack alert to the GTM channel so reps can act on a company that is clearly…
Tech-Stack Shift Inference from Job Descriptions
Reads new job descriptions for target accounts, uses an LLM to extract named technologies and infer stack changes.
Weekly Hiring-Intel Briefing for GTM
An agent reviews the week's accumulated hiring signals across all target accounts, writes a narrative briefing that infers each account's likely initiatives.
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

Run this workflow in your colony.
14-day trial. No DevOps. No sales call. Provisioned in under a minute.
