
How service robots estimate heart rate from a standard RGB camera, and what it takes to keep the signal stable under real-world lighting conditions.
A usable estimate needs roughly 10 to 30 seconds of a reasonably stable view of a face. Shorter windows are noisier; longer windows assume the person sits still. Most deployments target a 15-second sliding window updated every 1 to 2 seconds.
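The windowing arithmetic can be sketched in a few lines. This is a minimal illustration, assuming a 30 fps camera; the function name `sliding_windows` and its parameters are hypothetical, not taken from any particular rPPG library.

```python
def sliding_windows(n_frames, fps=30.0, window_s=15.0, step_s=1.0):
    """Yield (start, stop) frame indices for each analysis window."""
    window = int(round(window_s * fps))  # frames per window
    step = int(round(step_s * fps))      # frames between window starts
    for start in range(0, n_frames - window + 1, step):
        yield start, start + window


# 20 seconds of video at 30 fps: a 15-second window, moved forward 1 second at a time.
windows = list(sliding_windows(n_frames=600))
```

Each window would be passed to the estimator separately, so a new reading appears once per step while still drawing on 15 seconds of video.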
Older rPPG methods were biased toward lighter skin because they tuned on lighter-skin datasets. Modern approaches that explicitly model the skin-tone axis, including the one in the recent paper, narrow the gap considerably. You still need to evaluate on a representative dataset for your population and report the per-group error, not just the average.
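Per-group reporting is simple to compute once each sample carries a group label. The sketch below assumes records of the form `(group, estimated_bpm, reference_bpm)`; the group labels and numbers are made up purely to show the mechanics.

```python
from collections import defaultdict


def mae_by_group(records):
    """Mean absolute error in BPM for each group label."""
    errors = defaultdict(list)
    for group, estimated, reference in records:
        errors[group].append(abs(estimated - reference))
    return {group: sum(errs) / len(errs) for group, errs in errors.items()}


# Illustrative values only, not measurements.
records = [
    ("group_a", 72, 70),
    ("group_a", 65, 66),
    ("group_b", 80, 74),
    ("group_b", 90, 86),
]
per_group = mae_by_group(records)
overall = sum(abs(e - r) for _, e, r in records) / len(records)
```

In this toy data the overall error sits between the two groups, which is exactly how an average can hide the group where the sensor performs worse.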
The same signal can yield more than pulse. Respiration rate is straightforward to extract from it. Heart rate variability and blood oxygen saturation are active research areas with weaker reliability. Treat anything beyond pulse and breathing as exploratory.

If the robot already has an RGB camera and a CPU or small GPU, the marginal cost is engineering time: a few weeks for a competent team to integrate an open-source rPPG library, build an evaluation set, and wire it into the policy layer. The ongoing cost is the evaluation cadence, not compute.
Track mean absolute error in beats per minute against a reference (a wearable or chest strap on consenting subjects) on a recurring schedule. When error in any scene category drifts past your target for two consecutive weeks, that is your retrain signal. The point of the eval manifest in this article is to make that decision routine rather than dramatic.
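The two-consecutive-weeks rule is easy to encode. Here is a minimal sketch, assuming you keep a list of weekly MAE values per scene category; the function names, scene names, and numbers are hypothetical.

```python
def needs_retrain(weekly_mae, target_mae, consecutive=2):
    """True if the most recent `consecutive` weekly values all exceed the target."""
    if len(weekly_mae) < consecutive:
        return False
    return all(value > target_mae for value in weekly_mae[-consecutive:])


def scenes_to_retrain(history, targets):
    """history: {scene: [weekly MAE, ...]}, targets: {scene: target MAE}."""
    return [scene for scene, values in history.items()
            if needs_retrain(values, targets[scene])]


# Illustrative values only.
history = {
    "window_side_afternoon": [3.8, 4.3, 4.6],  # above target two weeks running
    "corridor_mixed_led": [4.7, 5.4, 4.9],     # one bad week, then recovered
}
targets = {"window_side_afternoon": 4.0, "corridor_mixed_led": 5.0}
flagged = scenes_to_retrain(history, targets)
```

A single bad week does not trigger anything, which keeps one unusual recording session from forcing a retrain.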
Physiological awareness gives a service robot a reason to slow down, hand off to a human, or check in. A nursing robot that notices a resident's heart rate climbing during a transfer can pause. A retail greeter that detects a customer's distress can route them to staff. The sensing approach behind this, remote photoplethysmography (rPPG), reads tiny color changes in skin caused by blood flow, using nothing more than the camera the robot already carries. The open problem, and the subject of recent work on illumination-robust rPPG for robots, is making the signal survive real lighting.
When your heart beats, blood volume in the skin rises and falls. That changes how much red, green, and blue light the skin reflects. The change is too small for a person to see, but a camera running at 30 frames per second can pick it up across a few hundred pixels of cheek or forehead.
That is the entire physical basis. No infrared sensor, no contact pad, no wearable. The robot looks at a face for roughly 10 to 30 seconds and produces a heart rate in beats per minute.
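In code, the raw measurement is just the average skin color in each frame. The sketch below assumes the frames are already loaded as a NumPy array and that a single fixed skin mask is available; a real system would update the mask every frame as the face moves. The name `mean_skin_rgb` is hypothetical.

```python
import numpy as np


def mean_skin_rgb(frames, mask):
    """Average color of the masked skin pixels in each frame.

    frames: array of shape (T, H, W, 3)
    mask:   boolean array of shape (H, W), True where the pixel is skin
    Returns an array of shape (T, 3), one mean RGB triple per frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return frames[:, mask].mean(axis=1)


# Tiny synthetic example: two 4x4 frames with a 2x2 "skin" patch.
frames = np.zeros((2, 4, 4, 3))
frames[0] = 10.0
frames[1, ..., 1] = 20.0  # second frame: green channel only
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
trace = mean_skin_rgb(frames, mask)
```

Averaging over a few hundred pixels is what makes a change too small to see in any single pixel measurable in the trace.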
For an operator, the business read is simple. If you already have a robot with a camera in front of customers, patients, or employees, you have a potential vital-sign sensor at zero marginal hardware cost. The cost is software, calibration, and governance.

The signal from a beating heart is on the order of 1% of the pixel value. Anything that also changes pixel values at a similar scale becomes noise. The big offenders:
A lab demo done under flat studio light will not survive a hospital corridor at 3pm in summer. This is the gap the recent research targets: keeping the heart-rate estimate stable when the light is uncontrolled.
flowchart LR
A[Camera frames] --> B[Face and skin detection]
B --> C[Region of interest tracking]
C --> D[Per-channel color signal]
D --> E[Illumination correction]
E --> F[Band-pass filter 0.7 to 4 Hz]
F --> G[Spectral peak picking]
G --> H[Heart rate in BPM]
E -.->|quality score| H

The two stages that decide whether the system works in production are illumination correction and the quality score. The quality score is what lets the robot say "I am not confident" instead of returning a wrong number with full confidence. That distinction is what separates a useful sensor from a liability.
The core idea in the arXiv paper is to separate the part of the color change that is caused by the heartbeat from the part caused by light changing on the skin. Light changes affect all color channels in a predictable, correlated way. The pulse affects them differently, with green carrying most of the signal because hemoglobin absorbs green light strongly.
By projecting the red, green, and blue traces into a space where light-driven motion lives on one axis and pulse-driven motion lives on another, you can throw away the lighting axis. Methods in this family include CHROM and POS (plane-orthogonal-to-skin); the recent work extends them with a per-frame estimate of how much the lighting itself is changing, then weights frames accordingly.
The practical effect: when a cloud passes the window, the system does not panic and emit a heart rate of 180. It either corrects the change out or marks those seconds as low quality.
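To make the idea of marking seconds as low quality concrete, here is a deliberately simple stand-in. It is not the paper's method: it only looks at how much overall brightness jumps between consecutive frames and turns that into a per-frame weight. The function name `lighting_weights` and the 2% threshold are assumptions chosen for illustration.

```python
import numpy as np


def lighting_weights(rgb, max_rel_change=0.02):
    """Per-frame weights in [0, 1]; 0 means the lighting jumped too much.

    rgb: array of shape (frames, 3), mean skin color per frame.
    """
    brightness = np.asarray(rgb, dtype=np.float64).mean(axis=1)
    # Frame-to-frame brightness change, relative to the average brightness.
    step = np.abs(np.diff(brightness, prepend=brightness[0]))
    rel_change = step / brightness.mean()
    return np.clip(1.0 - rel_change / max_rel_change, 0.0, 1.0)


# Synthetic trace: steady light, then a sudden 10% jump at frame 10.
rgb = np.vstack([np.full((10, 3), 100.0), np.full((10, 3), 110.0)])
weights = lighting_weights(rgb)
```

Only the frame where the light jumps gets a weight of zero; downstream code could skip it or fold the weights into the confidence score.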
Here is a compact Python sketch that captures the structure. It is not a deployment artifact, but it shows what each step does so a non-developer can have an informed conversation with their engineering team.
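# Estimate heart rate from a short video of a face.
# Returns BPM and a confidence score between 0 and 1.
import numpy as np
from scipy.signal import butter, filtfilt, welch

def pos_signal(rgb):
    # rgb: array of shape (frames, 3), mean color of skin region per frame
    mean = rgb.mean(axis=0)
    normalized = rgb / mean - 1
    # Project onto the plane orthogonal to the skin-tone axis
    proj = np.array([[0, 1, -1], [-2, 1, 1]]) @ normalized.T
    # The source listing is cut off inside the next line; everything from the
    # epsilon onward is a reconstruction from the surrounding text.
    alpha = proj[0].std() / (proj[1].std() + 1e-9)
    return proj[0] + alpha * proj[1]

def estimate_bpm(rgb, fps=30.0):
    b, a = butter(3, [0.7, 4.0], btype="bandpass", fs=fps)
    freqs, power = welch(filtfilt(b, a, pos_signal(rgb)), fs=fps, nperseg=min(len(rgb), 512))
    band = (freqs >= 0.7) & (freqs <= 4.0)
    peak = power[band].argmax()
    return 60.0 * freqs[band][peak], power[band][peak] / power[band].sum()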
The line that matters for operations is the last one. The function returns a confidence number alongside the heart rate. Your application logic should ignore any reading below a threshold rather than show it to a clinician or trigger an alert.
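The gating step itself is only a few lines. The sketch below is an illustration, assuming readings arrive as a BPM value plus a confidence between 0 and 1. The 0.35 threshold mirrors the value used in the evaluation manifest later in this article; the plausibility range follows from the 0.7 to 4 Hz band-pass filter (42 to 240 BPM). The function name `gate_reading` is hypothetical.

```python
def gate_reading(bpm, confidence, min_confidence=0.35,
                 min_bpm=42.0, max_bpm=240.0):
    """Return the BPM if it is trustworthy enough to act on, else None."""
    if confidence < min_confidence:
        return None  # the estimator itself is unsure
    if not (min_bpm <= bpm <= max_bpm):
        return None  # outside the band the filter can represent
    return bpm


shown = gate_reading(72.0, 0.80)     # confident, plausible reading
hidden = gate_reading(180.0, 0.10)   # low confidence: suppressed
```

Returning `None` rather than a number forces the calling code to handle "no reading" explicitly instead of displaying a guess.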
If you are deciding what to put on a robot or in a room, the relevant comparison is not "rPPG vs. nothing." It is rPPG against the other ways to read a pulse without a clinical chest strap.
| Approach | Hardware cost | Setup per person | Accuracy in good conditions | Failure mode |
|---|---|---|---|---|
| rPPG via existing RGB camera | None added | None | Within 2 to 5 BPM | Degrades with bad light, motion |
| Wrist wearable (consumer) | 100 to 300 USD per person | Pairing, charging | Within 2 to 4 BPM | Person forgets to wear it |
| Chest strap | 60 to 150 USD per person | Strap on, gel | Within 1 to 2 BPM | Refusal, discomfort |
| Thermal camera pulse | 1,500 to 5,000 USD per unit | None | Within 3 to 6 BPM | Cost, calibration |
| Radar pulse sensor | 200 to 800 USD per unit | None | Within 3 to 7 BPM | Multi-person ambiguity |
The decision usually comes down to two questions. Does every person you want to monitor agree to wear or touch something? And do you already have a camera looking at them? If the answers are no and yes, rPPG is the cheapest path to a heart-rate feed.
A robot that reports vital signs is not interesting by itself. It becomes interesting when the reading feeds a decision an agent can make: pause a task, escalate to a human, log an incident, adjust a script. The pattern looks like this.
+-----------------+    +------------------+    +---------------------+
|  Robot camera   | -> |  rPPG estimator  | -> |  Physiology stream  |
+-----------------+    +------------------+    +----------+----------+
                                                          |
                                                          v
                          +-------------------------------+-----------+
                          | Agent policy: thresholds, eligibility,    |
                          | consent flags, escalation rules           |
                          +-------------------------------+-----------+
                                                          |
                       +----------------------------------+---------------------+
                       v                                  v                     v
                +-------------+                   +---------------+      +-------------+
                | Human alert |                   | Task throttle |      |  Audit log  |
                +-------------+                   +---------------+      +-------------+

The point of the diagram: the model is the easy part. The hard part is the policy layer that decides what a reading means in your context. A heart rate of 110 BPM in a physical therapy session is expected. The same reading in a quiet waiting room is a signal. The agent needs context, not just numbers.
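A context-aware policy can start as a small lookup plus a few guard clauses. The sketch below is one possible shape, not a recommended design: the `Reading` fields, the action names, and especially the per-context thresholds are placeholders invented for illustration and would need to be set with clinical and legal input.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    bpm: float
    confidence: float
    context: str      # e.g. "physical_therapy" or "waiting_room"
    consented: bool   # has this person agreed to be measured?


# Hypothetical alert thresholds in BPM, keyed by context.
ALERT_ABOVE_BPM = {"physical_therapy": 150, "waiting_room": 100}


def decide(reading, min_confidence=0.35):
    """Map one reading to an action name."""
    if not reading.consented:
        return "ignore"
    if reading.confidence < min_confidence:
        return "no_reading"
    limit = ALERT_ABOVE_BPM.get(reading.context)
    if limit is not None and reading.bpm > limit:
        return "alert_human"
    return "log_only"


in_therapy = decide(Reading(110, 0.8, "physical_therapy", True))
in_waiting = decide(Reading(110, 0.8, "waiting_room", True))
```

The same 110 BPM reading produces different actions in the two contexts, and the consent and confidence checks run before any threshold is consulted.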
Treat the heart-rate sensor like any other production model: test it against a labeled dataset, track drift, and gate deployment on metrics. A simple evaluation manifest:
eval:
  dataset: rppg_internal_v3
  splits:
    - name: indoor_office_flat_light
      target_mae_bpm: 2.5
    - name: window_side_afternoon
      target_mae_bpm: 4.0
    - name: corridor_mixed_led
      target_mae_bpm: 5.0
    - name: motion_talking
      target_mae_bpm: 6.0
  reject_below_confidence: 0.35
alerting:
  page_on_regression_pct: 15
  page_on_calibration_drift_pct: 10
  cadence: weekly

Two things to notice. First, the targets differ by scene. A single average number across all conditions hides the scenes where the sensor fails. Second, readings below a confidence threshold are rejected outright. A robot that says "I do not know" is more useful than one that guesses.
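Scoring one split against its target might look like the sketch below. It assumes each sample is an `(estimated_bpm, reference_bpm, confidence)` tuple. The `coverage` figure (the share of readings that survive the confidence cut) is an addition not present in the manifest, included because a very strict threshold can make the error look good by rejecting almost everything. All names and numbers here are illustrative.

```python
def evaluate_split(samples, target_mae_bpm, reject_below_confidence=0.35):
    """Score one scene split; low-confidence readings are excluded first."""
    kept = [(est, ref) for est, ref, conf in samples
            if conf >= reject_below_confidence]
    if not kept:
        return {"mae_bpm": None, "coverage": 0.0, "passed": False}
    mae = sum(abs(est - ref) for est, ref in kept) / len(kept)
    return {
        "mae_bpm": mae,
        "coverage": len(kept) / len(samples),
        "passed": mae <= target_mae_bpm,
    }


# Illustrative values only: the third reading is wildly wrong but low confidence.
samples = [(72, 70, 0.8), (90, 84, 0.6), (150, 75, 0.1)]
result = evaluate_split(samples, target_mae_bpm=4.0)
```

Running this per split, rather than once over the pooled data, is what keeps a good office result from masking a poor window-side result.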

Where it earns its keep:
Where it does not belong:
This is the part that decides whether the project ships or stalls in legal review. Decide these before you write code, not after.