MARKET RESEARCH
Paper-to-Dataset Crosswalk Brief
Finds new arXiv-style papers in your vertical via Exa, then matches each to related datasets on Hugging Face and writes a Notion brief linking each paper to the data needed…
How it runs
The automated pipeline, trigger to output.
- Trigger: Weekly schedule
- Action: Find recent papers in vertical via Exa
- Action: Extract task and data requirements (OpenAI)
- Action: Match each paper to Hugging Face datasets
- Logic: Keep papers with a strong dataset match
- Output: Write crosswalk brief to Notion
What it does
Bridges the gap between what researchers are publishing and what data exists to act on it. It surfaces recent papers in your vertical using Exa's neural search, then for each notable paper searches Hugging Face for datasets that match the paper's domain and task, and assembles a Notion brief that crosswalks paper → candidate datasets → reproduction notes.
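As a rough illustration of the paper → candidate datasets → reproduction notes crosswalk, one row of the brief could be modeled as below. This is a minimal Python sketch, not the workflow's actual schema: the class and field names (`CrosswalkRow`, `DatasetCandidate`, `score`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetCandidate:
    # Hypothetical fields: a Hugging Face dataset id and a match score in [0, 1].
    dataset_id: str
    score: float


@dataclass
class CrosswalkRow:
    # One row of the brief: a paper, its candidate datasets, and notes.
    paper_title: str
    paper_url: str
    datasets: list[DatasetCandidate] = field(default_factory=list)
    reproduction_notes: str = ""

    def to_text(self) -> str:
        """Render the row as plain text, best-scoring dataset first."""
        ranked = sorted(self.datasets, key=lambda d: d.score, reverse=True)
        lines = [f"{self.paper_title} ({self.paper_url})"]
        lines += [f"  dataset: {d.dataset_id} (score {d.score:.2f})" for d in ranked]
        if self.reproduction_notes:
            lines.append(f"  notes: {self.reproduction_notes}")
        return "\n".join(lines)
```

Keeping the score on each candidate lets the brief list the most promising dataset first, which is one plausible way to present several candidates per paper.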
When to use it
When your team reads papers and immediately asks "can we try this?" This turns that instinct into a standing artifact: every new paper arrives pre-paired with the datasets you'd need to replicate or build on it.
How it works
1. A weekly cron triggers the run.
2. Exa retrieves recent high-signal papers matching the vertical's topic queries.
3. An LLM extracts each paper's task, domain, and data requirements.
4. Hugging Face is searched for datasets matching those requirements.
5. A logic step keeps only papers with at least one strong dataset match.
6. A structured crosswalk page is written to Notion, one row per paper with linked datasets and notes.
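The steps above can be sketched in Python as follows. The sketch is illustrative only: the Exa, OpenAI, Hugging Face, and Notion calls are replaced by plain callables passed in as arguments, and the names (`run_crosswalk`, `overlap_score`, `STRONG_MATCH`), the keyword-overlap score, and the 0.5 threshold are all assumptions, since the source does not say how a "strong" match is defined.

```python
from typing import Callable

STRONG_MATCH = 0.5  # Assumed threshold; the workflow does not specify one.


def overlap_score(requirements: set[str], dataset_tags: set[str]) -> float:
    """Fraction of a paper's data requirements covered by a dataset's tags."""
    if not requirements:
        return 0.0
    return len(requirements & dataset_tags) / len(requirements)


def run_crosswalk(
    find_papers: Callable[[], list[dict]],
    extract_requirements: Callable[[dict], set[str]],
    search_datasets: Callable[[set[str]], list[dict]],
    write_brief: Callable[[list[dict]], None],
) -> list[dict]:
    """Run steps 2-6: find, extract, match, filter, write."""
    rows = []
    for paper in find_papers():                      # step 2: recent papers
        reqs = extract_requirements(paper)           # step 3: task and data needs
        matches = []
        for ds in search_datasets(reqs):             # step 4: dataset search
            score = overlap_score(reqs, set(ds["tags"]))
            if score >= STRONG_MATCH:
                matches.append({"id": ds["id"], "score": score})
        if matches:                                  # step 5: keep strong matches
            rows.append({"paper": paper["title"], "datasets": matches})
    write_brief(rows)                                # step 6: one row per paper
    return rows
```

In a real run, each callable would wrap the corresponding service client. Passing them in as arguments keeps the filter logic testable against a sample without network access.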
Set it up
What you configure once, before turning it on.
1. Connect Exa: Neural search across the web.
2. Connect Hugging Face: Models, datasets, and spaces on the open-source hub.
3. Connect OpenAI: Models, embeddings, files.
4. Connect Notion: Pages, databases, comments.
5. Set each agent's model: We leave models unset so you pick the tier, either fast and cheap or top-quality.
6. Tune it to your data: Edit the prompts, filters, and field mappings so it matches how your team works.
7. Test, then turn it on: Run once against a sample, confirm the output, then enable the trigger.
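As one way to picture the checklist above, the sketch below refuses to enable the trigger until every service is connected, every agent has a model chosen, and a sample run has been confirmed. It is a hypothetical Python illustration: `WorkflowConfig`, its fields, and `can_enable` are invented names, not the product's configuration format.

```python
from dataclasses import dataclass, field
from typing import Optional

REQUIRED_CONNECTIONS = ("exa", "huggingface", "openai", "notion")


@dataclass
class WorkflowConfig:
    connections: set[str] = field(default_factory=set)
    # Agent name -> model name; None mirrors "models are left unset".
    agent_models: dict[str, Optional[str]] = field(default_factory=dict)
    sample_run_confirmed: bool = False

    def problems(self) -> list[str]:
        """List what still blocks enabling the trigger (empty if nothing)."""
        issues = [f"connect {name}" for name in REQUIRED_CONNECTIONS
                  if name not in self.connections]
        issues += [f"set a model for agent '{agent}'"
                   for agent, model in self.agent_models.items() if not model]
        if not self.sample_run_confirmed:
            issues.append("run once against a sample and confirm the output")
        return issues

    def can_enable(self) -> bool:
        """The trigger may be enabled only when nothing is outstanding."""
        return not self.problems()
```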
Run it inside a business
This workflow drops into a full company template. Import the org, and this is one of the playbooks its agents run.

