Agent Hive mark

Large language models now sit inside robo-advisors, screeners, and execution agents. A recent audit asks a question most deployment teams have skipped: do these models carry latent preferences for specific assets, and if so, can those preferences be located, measured, and steered before they reach a client portfolio? ## The audit problem nobody wanted When a language model recommends an asset allocation, the decision path is usually treated as a black box of prompt, context, and output. Risk and compliance functions evaluate the output (does the recommendation match the stated risk tolerance, is the disclosure complete) but rarely the internal preference structure that generated it. That gap is widening as agentic systems move from advice generation into order routing, rebalancing, and tax-loss harvesting. The recent arXiv paper Auditing Asset-Specific Preferences in Financial Large Language Models frames the issue directly. The authors pose three questions: whether LLMs systematically prefer certain instruments, whether an internal representation with causal leverage can be identified, and whether that representation can be intervened on. Their case study is Bitcoin, an asset with enough cultural saturation in training corpora to make a preference signal plausible and measurable. For operators building agentic financial products, this is not an academic concern. A model that quietly overweights one asset class across thousands of advisory sessions is a latent compliance liability, a portfolio risk, and a reputational exposure all at once. ## What "asset-specific preference" actually means The authors distinguish three layers of preference that an audit needs to separate: - Surface preference: the asset appears more often in free-form recommendations than a neutral baseline would predict. - Allocation preference: when forced into a portfolio construction task with fixed weights summing to one, the model assigns the asset a higher share than peers with comparable risk profiles. - Representational preference: the model's internal activations encode a direction that, when amplified or suppressed, changes downstream allocation in a predictable way. The third layer is the interesting one. Surface and allocation preferences can be measured with prompts and counts. A representational preference requires probing the residual stream, identifying a linear direction associated with the asset, and showing that activation steering along that direction causally moves the allocation. That is the bridge from observation to intervention. ### Why Bitcoin is a useful probe Bitcoin is overrepresented in pretraining data relative to its share of global investable assets. It carries strong sentiment polarity in both directions. It is also one of the few assets where retail and institutional discourse, advocacy content, and academic critique are all densely represented in scraped text. If any asset is going to surface a measurable directional bias, this is a reasonable candidate. A team auditing equity sectors, sovereign debt, or specific issuers can adapt the same protocol, but Bitcoin offers the cleanest signal-to-noise ratio for a methodological paper. ## Findings worth your attention Three results from the audit matter for anyone shipping financial agents. First, the models tested show a non-trivial allocation preference toward Bitcoin even when the prompt specifies a conservative risk profile. The effect is not uniform across model families, but it is consistent enough within a given model to be treated as a stable property rather than prompt noise. Second, a low-dimensional representation correlated with Bitcoin allocation can be recovered from intermediate layers. The direction generalizes: prompts that never mention Bitcoin by name still activate it when crypto-adjacent concepts appear in context. Third, activation steering along this direction changes the recommended allocation monotonically. Suppressing the direction reduces the Bitcoin weight. Amplifying it raises the weight, sometimes to levels that would breach typical suitability thresholds. The causal handle exists. The practical implication is that asset-specific bias is not just a prompt engineering problem. You can soften the bias with careful system prompts, but the underlying representation is still firing. ## An audit protocol for production teams If you operate or supervise a financial agent, the paper's methodology suggests a concrete audit you can run. The protocol below adapts the research to a deployment setting. 1. Define the asset universe under audit. Pick instruments that matter for your product surface: major equity sectors, regional ETFs, specific cryptocurrencies, fixed income buckets. Keep the list small enough to evaluate exhaustively. 2. Construct neutral allocation prompts. Write prompts that specify risk tolerance, time horizon, and constraints without naming any asset. Vary the wording to control for prompt sensitivity. 3. Measure baseline allocations. For each prompt, sample multiple completions and record the allocation distribution per asset. Compare against a neutral reference (equal weight, market cap weight, or a curated benchmark portfolio). 4. Probe internal representations. For open-weight models, train linear probes on intermediate activations to predict allocation weights. For closed models, you are limited to behavioral testing, which is weaker but still informative. 5. Run causal interventions. Where you have weight access, apply activation steering along identified directions and confirm the allocation response. Where you do not, run prompt-level interventions (adding context that should suppress or amplify the asset) and look for asymmetric responses that suggest a latent preference. 6. Document drift thresholds. Define what magnitude of deviation from the neutral baseline triggers a review, a model swap, or a constraint layer. This is not a one-time exercise. Asset preferences will drift with model updates, fine-tuning runs, and changes in retrieval context. The audit belongs in your continuous evaluation pipeline alongside accuracy and safety checks. ### Where this fits in eval-driven operations Most eval suites for financial agents focus on regulatory correctness, hallucination rates, and tool-use reliability. Asset preference auditing sits beside these, not inside them. It answers a different question: not "did the agent do the task correctly" but "did the agent's choice distribution match what a neutral advisor would produce." Treat it as a population-level metric. Individual recommendations can deviate from a neutral baseline for good reasons. What you are looking for is systematic skew across thousands of sessions, which is exactly the kind of pattern that surfaces in regulatory reviews months after the fact. ## Governance implications The audit moves preference detection from anecdote to measurement, and that changes what governance functions can demand. A model risk management group can now ask, with specificity, what the asset preference profile of a candidate model is before approving it for deployment. A compliance team can require periodic re-audits and set tolerance bands. An internal audit function can sample production traffic and reconstruct the preference signature from observed outputs without needing access to model weights. For firms operating under suitability rules, the existence of a measurable, causally active preference creates a documentation obligation. If you know the model leans toward an asset and you ship it anyway, you should be able to explain the controls that prevent the lean from producing unsuitable recommendations. "We told it to be neutral in the system prompt" is not going to age well as a control narrative once representational audits become standard practice. ### The intervention question If the preference can be steered, should it be? There are two defensible positions. The first is that any intervention to suppress an asset preference is itself a form of editorial choice that should be disclosed. A model whose Bitcoin weight has been pushed down by activation steering is not a neutral model, it is a model with an explicit anti-Bitcoin adjustment. The second is that the untouched model is also not neutral, it is a model with an implicit pro-Bitcoin adjustment inherited from training data. The intervention is a correction toward a defensible neutral, and the correction should be documented in the model card. Both positions are reasonable. What is not reasonable is shipping without a position. ## Limits of the current work The paper is a methodology contribution, not a comprehensive market survey. A few cautions are worth keeping in mind before you generalize. The audit covers a specific set of models and a specific asset. Preference profiles for other assets, other model families, and other languages may look quite different. The representational findings depend on activation access, which closed-model deployments do not have. Behavioral audits remain possible but are noisier and easier for a vendor to address with prompt-level patches that do not actually move the underlying representation. The portfolio allocation tasks used in the audit are simplified compared to real advisory workflows that include client onboarding, KYC data, tax considerations, and product shelves. A preference visible in a synthetic task may be diluted or amplified in a production stack depending on how upstream components constrain the model's choice set. Finally, the steering interventions demonstrated causal influence on the allocation output, but the durability of those interventions across long conversations, tool calls, and retrieval contexts is not yet well characterized. Treat steering as a research-grade control, not a production-grade one, until that durability is established. ## What to do this quarter Three concrete actions for teams operating financial agents: - Add asset preference profiling to your model evaluation pipeline. Even a behavioral version, without weight access, will tell you whether your current model has a measurable lean. - Set explicit neutrality targets per asset class and define what deviation magnitudes trigger review. Document these in your model risk policy. - When evaluating new models or model updates, run the same protocol before promotion. A model with better task accuracy and worse preference neutrality is not obviously an upgrade. The broader point is that financial agent governance is moving from output-level checks toward representation-level audits. The teams that build the internal capability to run those audits will be better positioned both to ship safely and to answer the questions regulators are starting to ask.

Auditing Asset Bias in Financial LLMs: Bitcoin Evidence

The only platform to run an AI-native company.