
Frequently asked questions

Do we need to throw out our existing Sinkhorn code?

No. If your problem fits in memory and runs in your batch window, leave it alone. The Riemannian low-rank method is the right call when you are hitting scale limits or spending engineer hours on tuning, not as a blanket replacement.

How do we pick the rank?

Start with the intrinsic dimensionality of your data. For embedding-based matching, ranks between 20 and 100 are common. Run a small ablation: solve at rank 10, 25, 50, 100 and measure the downstream business metric (conversion, fulfillment cost, retrieval precision). The plan quality usually plateaus before the cost does.
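One way to read the ablation, as a minimal sketch: pick the smallest rank whose metric is within a tolerance of the best one. The function name pick_rank, the 1% tolerance, and the metric values are all hypothetical placeholders, not results from the paper.

```python
def pick_rank(metric_by_rank, tolerance=0.01):
    """Return the smallest rank whose metric is within `tolerance`
    (relative) of the best metric seen. Higher metric means better."""
    if not metric_by_rank:
        raise ValueError("metric_by_rank is empty")
    best = max(metric_by_rank.values())
    for rank in sorted(metric_by_rank):
        if metric_by_rank[rank] >= best * (1.0 - tolerance):
            return rank


# Made-up numbers for illustration only (for example, retrieval precision).
ablation = {10: 0.712, 25: 0.781, 50: 0.803, 100: 0.806}
chosen = pick_rank(ablation)
print(chosen)  # 50: rank 100 buys less than 1% over rank 50
```

If your metric is a cost rather than a score, negate it before passing it in.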

Is this production-ready or a research result?

The paper is recent and the reference implementations are research-grade. For production, expect to do the usual work: pin a version, write integration tests on your real data shapes, and benchmark against your current solver on a held-out batch before switching. The contribution is the algorithm; productionizing is on you.

What optimal transport actually buys you

Optimal transport (OT) finds the cheapest way to move mass from one distribution to another, given a cost for moving each unit. In a business, "mass" might be inventory, ad budget, or attention; "cost" is whatever you are trying to minimize (miles, dollars, embedding distance).

The output is a transport plan: a table that says how much of source i goes to destination j. That plan is what your downstream system consumes.
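To make "a table" concrete, here is a toy plan with two sources and three destinations. All numbers are illustrative. The checks show the two properties every valid plan has: each row sums to the mass available at that source, and each column sums to the mass required at that destination.

```python
# Toy example: 2 sources, 3 destinations. All numbers are illustrative.
a = [0.5, 0.5]            # mass available at each source
b = [0.25, 0.25, 0.5]     # mass required at each destination
cost = [[1.0, 2.0, 3.0],  # cost[i][j]: cost of moving one unit from i to j
        [3.0, 2.0, 1.0]]

# plan[i][j] is the mass moved from source i to destination j.
plan = [[0.25, 0.25, 0.0],
        [0.0, 0.0, 0.5]]

row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(2)) for j in range(3)]
total_cost = sum(plan[i][j] * cost[i][j] for i in range(2) for j in range(3))

print(row_sums)    # matches a
print(col_sums)    # matches b
print(total_cost)  # 1.25
```

For this small case the plan shown is also the cheapest one: source 1 must send everything to destination 2 (cost 1 instead of 3), which leaves source 0 to cover the rest.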

Where teams use it today:

Recommendation and retrieval: matching user embeddings to item embeddings under a soft constraint that both sides stay balanced.
Domain adaptation: aligning a model trained on one customer's data to another customer's data without retraining from scratch.
Logistics and fulfillment: assigning orders to fulfillment centers when both supply and demand have shape, not just totals.
Model evaluation: comparing the distribution of model outputs to a reference distribution (a more honest signal than averaged scores).

The catch has always been cost. A dense OT plan between two sets of size n is an n by n matrix. Solving it with the standard Sinkhorn algorithm (the workhorse since 2013) is roughly O(n^2) per iteration. At a million rows, that is a trillion entries in memory before you start.
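For reference, a minimal dense Sinkhorn sketch in pure Python. It is written for readability, not speed, and the epsilon and iteration count are arbitrary choices for the toy input. The kernel K is the n by m object that makes both memory and per-iteration work quadratic.

```python
import math


def sinkhorn(cost, a, b, epsilon=0.5, n_iters=500):
    """Entropy-regularized OT via Sinkhorn scaling (dense, pure Python)."""
    n, m = len(a), len(b)
    # Gibbs kernel: the n-by-m matrix that makes each iteration O(n*m).
    K = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the plan's row sums match a ...
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # ... then columns so the column sums match b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


a = [0.5, 0.5]
b = [0.25, 0.25, 0.5]
cost = [[1.0, 2.0, 3.0],
        [3.0, 2.0, 1.0]]
P = sinkhorn(cost, a, b)
for row in P:
    print([round(p, 4) for p in row])
```

Every iteration touches all n times m kernel entries twice, which is why the method stops scaling long before a million rows.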

Figure: Mass moving between two distributions under an optimal transport plan.

Why low-rank, and why the old fix was painful

Low-rank OT says: do not store the full plan. Assume it can be written as the product of two skinny matrices and a small middle factor. If the rank r is much smaller than n, you store and compute on O(nr) numbers instead of O(n^2). For a million rows at rank 50, each skinny factor holds twenty thousand times fewer numbers than the dense plan, so even with both factors stored the saving is roughly ten-thousand-fold.
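A sketch of what "compute on the factors" means in practice, assuming the factorization P = Q diag(1/g) R^T used in the diagram further down. The function name and the toy factors are illustrative. Multiplying the plan by a vector never forms the n by m matrix.

```python
def apply_low_rank_plan(Q, R, g, v):
    """Compute P @ v for P = Q diag(1/g) R^T without forming P.
    Q is n-by-r, R is m-by-r, g has length r, v has length m."""
    r = len(g)
    # Step 1: t = R^T v, costs O(m*r).
    t = [sum(R[j][k] * v[j] for j in range(len(R))) for k in range(r)]
    # Step 2: scale by 1/g, costs O(r).
    t = [t[k] / g[k] for k in range(r)]
    # Step 3: Q t, costs O(n*r).
    return [sum(Q[i][k] * t[k] for k in range(r)) for i in range(len(Q))]


# Toy factors with rank 2: rows of Q sum to a, rows of R sum to b,
# and the columns of both sum to g.
Q = [[0.4, 0.1], [0.1, 0.4]]
R = [[0.2, 0.05], [0.2, 0.05], [0.1, 0.4]]
g = [0.5, 0.5]

# Multiplying by a vector of ones gives the row sums of the plan.
row_sums = apply_low_rank_plan(Q, R, g, [1.0, 1.0, 1.0])
print(row_sums)  # approximately [0.5, 0.5]
```

The same three steps with Q and R swapped give the column sums, so marginal checks stay cheap even when the plan itself is never materialized.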

The trick is solving for the factors. Until now, the dominant approach was mirror descent: take a gradient step, project back onto the constraint set (rows and columns sum to the right totals), repeat. It works, but:

It is first-order, meaning it ignores the curvature of the problem. On a curved surface, walking in a straight line takes you off the surface; you spend iterations correcting course.
It needs a step size schedule that you tune by hand. Too aggressive and it diverges; too cautious and your batch job takes overnight.
Convergence is sensitive to how you initialize the factors. Two runs on the same data can take noticeably different wall-clock times.
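The step-then-project pattern is easiest to see on a toy problem. The sketch below is not the low-rank OT solver: it runs entropic mirror descent on a plain probability simplex with a linear objective, just to show the two moves and how much the step size controls progress. Because the objective is linear this toy cannot diverge, so it only shows the "too cautious" side of the trade-off.

```python
import math


def mirror_descent_simplex(c, step_size, n_iters):
    """Minimize <c, x> over the probability simplex with entropic mirror descent."""
    n = len(c)
    x = [1.0 / n] * n  # start at the uniform distribution
    for _ in range(n_iters):
        # Gradient step in the mirror (log) domain; the gradient of <c, x> is c.
        y = [x[i] * math.exp(-step_size * c[i]) for i in range(n)]
        # Projection: renormalizing is the KL projection back onto the simplex.
        total = sum(y)
        x = [yi / total for yi in y]
    return x


costs = [3.0, 1.0, 2.0]  # the cheapest coordinate is index 1
cautious = mirror_descent_simplex(costs, step_size=0.01, n_iters=50)
aggressive = mirror_descent_simplex(costs, step_size=1.0, n_iters=50)
print([round(x, 3) for x in cautious])    # still spread across all three
print([round(x, 3) for x in aggressive])  # almost all mass on index 1
```

Same data, same iteration budget, very different answers. That gap is what a hand-tuned step size schedule is trying to close.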

If you have ever had a data scientist tell you "the OT step is unstable, we are tuning it again," this is what they meant.

The Riemannian reframing in plain terms

The new approach treats the set of valid low-rank transport plans as a curved surface (a Riemannian manifold) and runs the optimizer directly on that surface. The optimizer follows the curvature instead of fighting it.

Three things matter for an operator:

Steps stay on the constraint set by construction. You do not need a separate projection step that can fail or stall.
The method is second-order aware. It uses information about how the landscape bends, so each step is more informative.
Hyperparameters shrink. There are fewer knobs that meaningfully change the answer, which means less time tuning and fewer surprises in production.
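To show what "steps stay on the constraint set by construction" looks like, here is a toy Riemannian gradient descent on the unit sphere. This is an illustration of the general idea only: the paper's manifold is the set of low-rank transport plans, not a sphere, and this sketch is first-order, so it does not use any curvature information. The function name and step size are assumptions.

```python
import math


def riemannian_gd_sphere(diag_a, x0, step_size=0.1, n_iters=200):
    """Minimize sum_i diag_a[i] * x[i]**2 over the unit sphere."""
    norm = math.sqrt(sum(xi * xi for xi in x0))
    x = [xi / norm for xi in x0]
    for _ in range(n_iters):
        # Euclidean gradient of the objective.
        grad = [2.0 * d * xi for d, xi in zip(diag_a, x)]
        # Riemannian gradient: drop the component pointing off the sphere.
        radial = sum(gi * xi for gi, xi in zip(grad, x))
        rgrad = [gi - radial * xi for gi, xi in zip(grad, x)]
        # Retraction: move along the tangent direction, then renormalize.
        y = [xi - step_size * ri for xi, ri in zip(x, rgrad)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


weights = [3.0, 1.0, 2.0]
x = riemannian_gd_sphere(weights, x0=[1.0, 1.0, 1.0])
value = sum(d * xi * xi for d, xi in zip(weights, x))
print([round(xi, 6) for xi in x])  # close to [0, 1, 0]
print(round(value, 6))             # close to 1.0, the smallest weight
```

Every iterate has unit length, so there is no separate projection step that could fail or stall. The direction is computed in the tangent space, and the retraction puts the point back on the surface as part of the step itself.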

flowchart LR
 A[Raw cost matrix C] --> B[Pick rank r]
 B --> C[Initialize factors Q, R, g]
 C --> D{Optimizer}
 D -->|Mirror descent| E[Project to constraints]
 E --> D
 D -->|Riemannian| F[Step along manifold]
 F --> D
 D --> G[Low-rank plan P approx Q diag 1/g R^T]
 G --> H[Downstream: matching, routing, ranking]

The diagram is the whole picture. The mirror-descent branch is the status quo: step, project, step, project. The Riemannian branch is the new path: each step is already valid, so the inner loop is tighter.

What the numbers look like

The paper reports faster convergence and less sensitivity to initialization across synthetic and real datasets at ranks typical for production (10 to 100). The operator-relevant comparison:

Method	Per-iteration cost	Hyperparameters to tune	Sensitive to init?	Memory at n=1M, r=50
Sinkhorn (full rank)	O(n^2)	1 (regularization)	Low	~4 TB
LOT, mirror descent	O(nr)	3 to 4 (step sizes, regularization, schedule)	High	~400 MB
LOT, Riemannian	O(nr)	1 to 2	Low	~400 MB

LOT here means low-rank OT. Memory numbers are illustrative and assume single precision (4 bytes per entry): the dense n by n plan for Sinkhorn, the two n by r factors for the low-rank methods. Your actual mileage depends on sparsity and whether you materialize the plan at all.
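The memory column is plain arithmetic, sketched here for single precision (4 bytes per entry). The function names are illustrative. Setting bytes_per_entry to 8 for double precision doubles both figures.

```python
def dense_plan_bytes(n, m, bytes_per_entry=4):
    """Memory for a dense n-by-m plan."""
    return n * m * bytes_per_entry


def low_rank_plan_bytes(n, m, rank, bytes_per_entry=4):
    """Memory for the factors: Q is n-by-rank, R is m-by-rank, g has length rank."""
    return (n * rank + m * rank + rank) * bytes_per_entry


n = m = 1_000_000
rank = 50
dense = dense_plan_bytes(n, m)
low_rank = low_rank_plan_bytes(n, m, rank)
print(f"dense:    {dense / 1e12:.1f} TB")      # dense:    4.0 TB
print(f"low-rank: {low_rank / 1e6:.0f} MB")    # low-rank: 400 MB
print(f"ratio:    {dense / low_rank:,.0f}x")   # ratio:    10,000x
```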

The headline is not "ten times faster." It is: the same per-iteration cost as the previous low-rank method, but with fewer tuning runs to get there and more predictable wall-clock time once you do. For a batch job that ran weekly and required a half day of engineer babysitting, that is the difference between scheduled and unscheduled work.

A minimal code example

Here is what calling this kind of solver looks like in a Python pipeline. The point is to show how few knobs you set, not to be a benchmark.

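# Solve a low-rank OT plan between two embedding sets.
# X: source embeddings, Y: target embeddings, a, b: marginals.
import jax
import jax.numpy as jnp
from ott.geometry import pointcloud
from ott.problems.linear import linear_problem
# sinkhorn_lr is the module name in recent OTT-JAX releases; check your pinned version.
from ott.solvers.linear import sinkhorn_lr  # mirror-descent baseline

# Stand-in data so the snippet runs end to end; swap in your own embeddings.
kx, ky = jax.random.split(jax.random.PRNGKey(0))
X = jax.random.normal(kx, (1000, 32))
Y = jax.random.normal(ky, (1200, 32))
a = jnp.full(1000, 1.0 / 1000)  # uniform source marginal
b = jnp.full(1200, 1.0 / 1200)  # uniform target marginal

geom = pointcloud.PointCloud(X, Y)  # cost is squared Euclidean by default
prob = linear_problem.LinearProblem(geom, a=a, b=b)

# Baseline: low-rank Sinkhorn (mirror descent under the hood).
solver = sinkhorn_lr.LRSinkhorn(rank=50, gamma=10.0, epsilon=0.0)
out = solver(prob)
# out.q, out.r, out.g are the low-rank factors; the plan stays in factored form.
# out.matrix materializes the dense plan, so avoid it at scale.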

That is the established path. The Riemannian variant from the paper has the same input contract: cost geometry plus marginals plus a rank. The change is the optimizer underneath, and the practical effect is that gamma (the step size knob) either goes away or matters much less to the result.

Wiring it into a batch job looks the same on either side:

# Nightly job that recomputes a user-to-offer matching plan.
python run_match.py \
 --source s3://prod/embeddings/users/2026-06-10.parquet \
 --target s3://prod/embeddings/offers/2026-06-10.parquet \
 --rank 50 \
 --solver riemannian-lr-ot \
 --out s3://prod/plans/2026-06-10.parquet

The flag --solver is what changes. Everything upstream (your embedding pipeline) and downstream (whatever consumes the plan) is untouched.
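The source does not show run_match.py, so the following is a hypothetical skeleton of its argument handling, using only the flags from the command above. Solver names other than riemannian-lr-ot are placeholders, and loading, solving, and writing are left out.

```python
import argparse

# Only "riemannian-lr-ot" appears in the command above; the rest are placeholders.
SOLVERS = ("sinkhorn", "lr-mirror-descent", "riemannian-lr-ot")


def build_parser():
    parser = argparse.ArgumentParser(description="Recompute a matching plan.")
    parser.add_argument("--source", required=True, help="source embeddings (Parquet)")
    parser.add_argument("--target", required=True, help="target embeddings (Parquet)")
    parser.add_argument("--rank", type=int, default=50, help="rank of the plan")
    parser.add_argument("--solver", choices=SOLVERS, default="riemannian-lr-ot")
    parser.add_argument("--out", required=True, help="where to write the plan")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Loading embeddings, solving, and writing the plan would go here.
    # Only the solver dispatch depends on args.solver.
    return args


args = main([
    "--source", "users.parquet",
    "--target", "offers.parquet",
    "--rank", "50",
    "--solver", "riemannian-lr-ot",
    "--out", "plan.parquet",
])
print(args.solver, args.rank)
```

Keeping the solver behind a single flag also makes rollback cheap: switching back to the previous solver is a config change, not a code change.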

Figure: Pipeline diagram, embeddings to OT solver to downstream matching system.

When to use which solver

This is the section to share with your team lead. The decision is rarely "which is best in theory." It is "what fits the shape of our workload."

Use full-rank Sinkhorn when

Your problem size is small (under roughly 50,000 on a side).
You care about an exact-ish plan and have GPU memory to spare.
You already have it in production and it is not the bottleneck.

Use low-rank OT with mirror descent when

You already use it and have stable hyperparameters.
You need backward compatibility with an existing tuned pipeline.

Use low-rank OT with the Riemannian solver when

You are starting fresh on a large-scale matching problem.
Your team has been spending engineer time tuning step sizes.
You need predictable batch runtimes for an SLA-bound job.
You want fewer surprises when the data distribution shifts week to week (because the optimizer is less sensitive to init).
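One possible reading of the rules above as a function. This is a sketch: the solver labels and parameter names are hypothetical, and a real decision would also weigh GPU memory, SLAs, and how exact the plan needs to be.

```python
SINKHORN_MAX_SIDE = 50_000  # "under roughly 50,000 on a side", from the text


def choose_solver(n, existing=None, is_bottleneck=True):
    """Suggest a solver for a problem with n rows on a side.

    existing: None, "sinkhorn", or "lr-mirror-descent" (hypothetical labels).
    is_bottleneck: whether the current solver is causing scale or tuning pain.
    """
    # A solver that is already in production and not hurting stays put.
    if existing is not None and not is_bottleneck:
        return existing
    # Small problems fit a dense plan comfortably.
    if n < SINKHORN_MAX_SIDE:
        return "sinkhorn"
    # Large and either new or painful to tune.
    return "riemannian-lr-ot"


print(choose_solver(10_000))
print(choose_solver(1_000_000))
print(choose_solver(1_000_000, existing="lr-mirror-descent", is_bottleneck=False))
```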

Where this fits in an agent operating model

If you are building an agent that does matching as part of its decision loop (a routing agent, a procurement agent, a customer-segmentation agent), OT is often the quiet primitive underneath. The agent picks the inputs (which embeddings, which cost function, which constraints) and the solver returns the plan.

What an eval-driven operation cares about:

Determinism: can I rerun this and get the same plan? Lower hyperparameter sensitivity helps.
Auditability: can I explain why source i was matched to destination j? The low-rank factors are interpretable as soft clusters, which makes this easier than a dense plan.
Cost ceiling: can I bound how much this step costs in dollars per run? O(nr) with a known iteration count gives you a real number to budget against.
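The auditability point can be made concrete. Assuming the plan is stored as P = Q diag(1/g) R^T, any single entry splits into one term per rank component, and each term reads as "how much of this match flows through soft cluster k". The function name and toy factors are illustrative.

```python
def explain_match(Q, R, g, i, j):
    """Split plan entry P[i][j] = sum_k Q[i][k] * R[j][k] / g[k] by component k."""
    parts = [Q[i][k] * R[j][k] / g[k] for k in range(len(g))]
    return sum(parts), parts


# Toy rank-2 factors (illustrative numbers).
Q = [[0.4, 0.1], [0.1, 0.4]]
R = [[0.2, 0.05], [0.2, 0.05], [0.1, 0.4]]
g = [0.5, 0.5]

mass, parts = explain_match(Q, R, g, i=0, j=0)
shares = [p / mass for p in parts]
print(round(mass, 4))                   # 0.17
print([round(s, 3) for s in shares])    # [0.941, 0.059]
```

Here about 94% of the mass from source 0 to destination 0 passes through component 0, which is the kind of statement you can put in an audit log. A dense plan only gives you the total.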

The Riemannian method does not change what OT is for. It changes how often you have to think about it. For an operator, that is the whole point of bringing in a better primitive: it stops being a thing on the standup agenda.

Riemannian Low-Rank Optimal Transport: Practical Guide