Non-contact monitoring videos capture subtle respiratory-induced motions, yet existing methods primarily focus on estimating respiratory rate (RR), neglecting the extraction of the respiratory waveform, a vital signal that carries critical health information. We formulate video-based RR estimation as a Tracking All Points (TAP) problem and propose a coarse-to-fine, multi-frame Persistent Independent Particles (RRPIPs) framework for robust, multi-modal (RGB, NIR, IR) respiratory waveform estimation. To address the challenge of tracking minute, non-rigid pixel displacements caused by respiratory motion, our top-down approach magnifies respiratory motion using phase-based video magnification tuned to the respiratory frequency range and employs a pretrained RAFT optical flow model for initial region identification via two-frame analysis. Coarse-scale tracking is performed with the RRPIPs model, while a Signal Quality Index (SQI) block evaluates the SNR of the resulting trajectories to refine high-respiratory-activity regions. These regions are upsampled, and fine-scale tracking is applied to extract precise waveforms. We curated a large-scale multimodal dataset for respiratory point tracking, combining in-house collected data and public datasets, with dense annotations of non-rigid pixel movements across multiple scales in key respiratory regions. Experimental results demonstrate that our framework achieves state-of-the-art accuracy (∼1 MAE) and interpretability in respiratory waveform extraction across RGB, NIR, and IR modalities, effectively addressing multi-scale tracking and low-SNR challenges. Thorough ablation studies validate the contribution of each framework component, and we plan to open-source our code and dataset to support further research.
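The SQI block's band-limited SNR criterion can be illustrated with a short sketch. The 0.1–0.5 Hz respiratory band, the power-ratio formula, and the function name here are illustrative assumptions, not the paper's exact SQI definition:

```python
import numpy as np

def respiratory_sqi(trajectory, fs, band=(0.1, 0.5)):
    """Illustrative Signal Quality Index: fraction of spectral power
    inside an assumed respiratory band. Higher values indicate a
    trajectory dominated by breathing-frequency motion."""
    x = np.asarray(trajectory, dtype=float)
    x = x - x.mean()                                  # remove DC offset
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = power.sum()
    return float(power[in_band].sum() / total) if total > 0 else 0.0

# Toy comparison: a clean 0.25 Hz breathing-like trace vs. white noise,
# both sampled at 30 fps for 30 s.
fs = 30.0
t = np.arange(0, 30, 1 / fs)
clean = np.sin(2 * np.pi * 0.25 * t)
noisy = np.random.default_rng(0).normal(size=t.size)
```

A clean trajectory concentrates its power in the band and scores near 1, while broadband noise scores near the band's share of the full spectrum, which is why thresholding the SQI can separate high-respiratory-activity regions from background.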
We created a comprehensive dataset for respiratory waveform estimation, including dense annotations of pixel movements in respiratory regions, covering diverse demographics, breathing patterns, and recording conditions. Annotations were generated with a semi-supervised, human-in-the-loop pipeline combining optical flow and point tracking.
This multimodal dataset includes synchronized RGB, NIR, and IR videos with dense annotations of respiratory regions and non-rigid pixel movements across multiple scales. It enables robust development and evaluation of advanced multimodal respiratory tracking algorithms.
RRPIPs is built upon the PIPs++ framework, with adaptations for multimodal generalization (RGB, NIR, IR) and multi-scale tracking to handle coarse global shifts and fine local displacements. Fine-tuned on our curated multimodal respiratory dataset, RRPIPs accurately tracks single-pixel (coarse) and multi-pixel (fine) respiratory displacements continuously across frames, achieving state-of-the-art performance in real-world scenarios.
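Once a point has been tracked across frames, its trajectory can be converted to an RR estimate. The sketch below picks the dominant spectral peak in an assumed 0.1–0.5 Hz respiratory band; the function name and band are illustrative, not RRPIPs internals:

```python
import numpy as np

def rr_from_trajectory(y, fs, band=(0.1, 0.5)):
    """Estimate respiratory rate (breaths/min) from a tracked point's
    displacement signal via the dominant in-band spectral peak.
    A simplified sketch, not the paper's full pipeline."""
    y = np.asarray(y, dtype=float) - np.mean(y)       # remove DC offset
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
    mag = np.abs(np.fft.rfft(y))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    peak = freqs[mask][np.argmax(mag[mask])]          # strongest in-band bin
    return 60.0 * peak

# Synthetic chest motion: 0.3 Hz sinusoid plus mild noise at 30 fps,
# which corresponds to 18 breaths per minute.
fs = 30.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(1)
y = 0.5 * np.sin(2 * np.pi * 0.3 * t) + 0.05 * rng.normal(size=t.size)
```

Restricting the peak search to the respiratory band keeps heartbeat, camera jitter, and drift from hijacking the estimate.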
Our framework operates in three stages:
1. Respiratory region localization using motion magnification and optical flow
2. Quality respiratory motion localization using coarse-scale tracking and SQI analysis
3. Fine-scale point tracking on upscaled regions
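The three stages above can be sketched end-to-end on synthetic data. Every stage here is a deliberately simple stand-in (frame differencing in place of phase-based magnification plus RAFT, masked averaging in place of RRPIPs tracking), so all function names and logic are illustrative only:

```python
import numpy as np

def localize_regions(video):
    # Stage 1 stand-in: per-pixel mean absolute frame difference flags
    # moving regions (the paper uses magnification + RAFT optical flow).
    motion = np.abs(np.diff(video, axis=0)).mean(axis=0)
    return motion > motion.mean()

def coarse_track(video, mask):
    # Stage 2 stand-in: average intensity over masked pixels per frame
    # as a coarse trajectory; the real pipeline tracks points with RRPIPs.
    return video[:, mask].mean(axis=1)

def fine_waveform(trajectory):
    # Stage 3 stand-in: detrend the coarse signal to get the waveform.
    return trajectory - trajectory.mean()

# Synthetic "video": 300 frames of 16x16 pixels, 30 fps, with a 4x4
# breathing block oscillating at 0.25 Hz on a noisy background.
rng = np.random.default_rng(0)
t = np.arange(300) / 30.0
video = rng.normal(0.0, 0.01, size=(300, 16, 16))
video[:, 4:8, 4:8] += np.sin(2 * np.pi * 0.25 * t)[:, None, None]

mask = localize_regions(video)
wave = fine_waveform(coarse_track(video, mask))
```

Even with these crude stand-ins, the localization mask latches onto the moving block and the recovered waveform follows the injected 0.25 Hz oscillation, which is the same coarse-to-fine logic the full framework applies with learned components.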
@inproceedings{rrpips2025,
author = {Zahid Hasan and Masud Ahmed and Shadman Sakib and Snehalraj Chugh and Md Azim Khan and Abu Zaher MD Faridee and Nirmalya Roy},
title = {RRPIPs: Respiratory Waveform Reconstruction using Persistent Independent Particles tracking from video},
booktitle = {CHASE},
year = {2025},
}
Copyright © 2025. All rights reserved.
