
A new Riemannian solver for low-rank optimal transport cuts tuning and stabilizes convergence. Here is what changes for matching, logistics, and retrieval…
No. If your problem fits in memory and runs in your batch window, leave it alone. The Riemannian low-rank method is the right call when you are hitting scale limits or spending engineer hours on tuning, not as a blanket replacement.
Start with the intrinsic dimensionality of your data. For embedding-based matching, ranks between 20 and 100 are common. Run a small ablation: solve at rank 10, 25, 50, 100 and measure the downstream business metric (conversion, fulfillment cost, retrieval precision). The plan quality usually plateaus before the cost does.
The paper is recent and the reference implementations are research-grade. For production, expect to do the usual work: pin a version, write integration tests on your real data shapes, and benchmark against your current solver on a held-out batch before switching. The contribution is the algorithm; productionizing is on you.

14-day trial. No DevOps. No Sales call. Provisioned in under a minute.
Two things. First, if your cost matrix has structure the rank cannot capture, the low-rank plan will be a worse approximation regardless of solver. Check this by measuring the OT cost gap against full-rank Sinkhorn on a subsample. Second, manifold-based optimizers can still get stuck at saddle points; the paper discusses this, and the practical mitigation is the same as elsewhere: a few restarts from different inits.
The plan format is the same: a low-rank factorization that you can store, version, and replay. If anything, the lower hyperparameter sensitivity makes audits easier, because "we reran it and got a different answer" becomes a rarer conversation.
Optimal transport is the math behind a lot of business problems that look like matching: which warehouse ships to which store, which ad goes to which user, which embedding in your vector database is closest to a query. The bill for solving it grows with the square of the dataset, so by the time you have a million rows on each side, you are paying for it in cluster time and engineer attention. A recent paper proposes a Riemannian (curvature-aware) approach to the low-rank version of this problem, and the practical story is straightforward: fewer knobs, more stable runs, less tuning.
This post translates the paper into operator stakes. If you run a team that uses OT in recommendation, logistics, model alignment, or document retrieval, here is what changes and what does not.
Optimal transport (OT) is the cheapest way to move mass from one distribution to another, given a cost for moving each unit. In a business, "mass" might be inventory, ad budget, or attention; "cost" is whatever you are trying to minimize (miles, dollars, embedding distance).
The output is a transport plan: a table that says how much of source i goes to destination j. That plan is what your downstream system consumes.
Where teams use it today:
The catch has always been cost. A dense OT plan between two sets of size n is an n by n matrix. Solving it with the standard Sinkhorn algorithm (the workhorse since 2013) is roughly O(n^2) per iteration. At a million rows, that is a trillion entries in memory before you start.

Low-rank OT says: do not store the full plan. Assume it can be written as the product of two skinny matrices and a small middle factor. If the rank r is much smaller than n, you store and compute on O(nr) numbers instead of O(n^2). For a million rows at rank 50, that is roughly twenty thousand times less memory.
The trick is solving for the factors. Until now, the dominant approach was mirror descent: take a gradient step, project back onto the constraint set (rows and columns sum to the right totals), repeat. It works, but:
If you have ever had a data scientist tell you "the OT step is unstable, we are tuning it again," this is what they meant.
The new approach treats the set of valid low-rank transport plans as a curved surface (a Riemannian manifold) and runs the optimizer directly on that surface. The optimizer follows the curvature instead of fighting it.
Three things matter for an operator:
flowchart LR
A[Raw cost matrix C] --> B[Pick rank r]
B --> C[Initialize factors Q, R, g]
C --> D{Optimizer}
D -->|Mirror descent| E[Project to constraints]
E --> D
D -->|Riemannian| F[Step along manifold]
F --> D
D --> G[Low-rank plan P approx Q diag 1/g R^T]
G --> H[Downstream: matching, routing, ranking]The diagram is the whole picture. The left branch (mirror descent) is the status quo: step, project, step, project. The right branch (Riemannian) is the new path: each step is already valid, so the inner loop is tighter.
The paper reports faster convergence and less sensitivity to initialization across synthetic and real datasets at ranks typical for production (10 to 100). The operator-relevant comparison:
| Method | Per-iteration cost | Hyperparameters to tune | Sensitive to init? | Memory at n=1M, r=50 |
|---|---|---|---|---|
| Sinkhorn (full rank) | O(n^2) | 1 (regularization) | Low | ~8 TB |
| LOT, mirror descent | O(nr) | 3 to 4 (step sizes, regularization, schedule) | High | ~400 MB |
| LOT, Riemannian | O(nr) | 1 to 2 | Low | ~400 MB |
LOT here means low-rank OT. Memory numbers are illustrative for a single-precision dense plan; your actual mileage depends on sparsity and whether you materialize the plan at all.
The headline is not "ten times faster." It is: the same per-iteration cost as the previous low-rank method, but with fewer tuning runs to get there and more predictable wall-clock once you do. For a batch job that ran weekly and required a half day of engineer babysitting, that is the difference between scheduled and unscheduled work.
Here is what calling this kind of solver looks like in a Python pipeline. The point is to show how few knobs you set, not to be a benchmark.
# Solve a low-rank OT plan between two embedding sets.
# X: source embeddings, Y: target embeddings, a, b: marginals.
import numpy as np
from ott.geometry import pointcloud
from ott.problems.linear import linear_problem
from ott.solvers.linear import lr_sinkhorn # mirror-descent baseline
geom = pointcloud.PointCloud(X, Y) # cost is squared Euclidean by default
prob = linear_problem.LinearProblem(geom, a=a, b=b)
# Baseline: low-rank Sinkhorn (mirror descent under the hood).
solver = lr_sinkhorn.LRSinkhorn(rank=50, gamma=10.0, epsilon=0.0)
out = solver(prob)
# out.matrix is implicit: out.q, out.r, out.g are the low-rank factors.That is the established path. The Riemannian variant from the paper has the same input contract: cost geometry plus marginals plus a rank. The change is the optimizer underneath, and the practical effect is that gamma (the step size knob) goes away or gets much less sensitive.
Wiring it into a batch job looks the same on either side:
# Nightly job that recomputes a user-to-offer matching plan.
python run_match.py \
--source s3://prod/embeddings/users/2026-06-10.parquet \
--target s3://prod/embeddings/offers/2026-06-10.parquet \
--rank 50 \
--solver riemannian-lr-ot \
--out s3://prod/plans/2026-06-10.parquetThe flag --solver is what changes. Everything upstream (your embedding pipeline) and downstream (whatever consumes the plan) is untouched.

This is the section to share with your team lead. The decision is rarely "which is best in theory." It is "what fits the shape of our workload."
If you are building an agent that does matching as part of its decision loop (a routing agent, a procurement agent, a customer-segmentation agent), OT is often the quiet primitive underneath. The agent picks the inputs (which embeddings, which cost function, which constraints) and the solver returns the plan.
What an eval-driven operation cares about:
i was matched to destination j? The low-rank factors are interpretable as soft clusters, which makes this easier than a dense plan.O(nr) with a known iteration count gives you a real number to budget against.The Riemannian method does not change what OT is for. It changes how often you have to think about it. For an operator, that is the whole point of bringing in a better primitive: it stops being a thing on the standup agenda.