In recent years, deep learning methods have shown impressive results for camera-based remote physiological signal estimation, clearly surpassing traditional methods. However, the performance and generalization ability of deep neural networks depend heavily on rich training data that truly represents the factors of variation encountered in real applications. Unfortunately, many current remote photoplethysmography (rPPG) datasets lack diversity, particularly in darker skin tones, leading to biased performance of existing rPPG approaches. To mitigate this bias, we introduce PhysFlow, a novel method for augmenting skin-tone diversity in remote heart rate estimation using conditional normalizing flows. PhysFlow adopts end-to-end training optimization, enabling simultaneous training of supervised rPPG approaches on both original and generated data. Additionally, we condition our model on CIELAB color-space skin features extracted directly from the facial videos, without the need for skin-tone labels. We validate PhysFlow on the publicly available UCLA-rPPG and MMPD datasets, demonstrating reduced heart rate error, particularly for dark skin tones. Furthermore, we demonstrate its versatility and adaptability across different data-driven rPPG methods.
Most previous studies have used the Fitzpatrick scale to evaluate or categorize skin tone, dividing it into six levels, from I (lightest) to VI (darkest). In contrast, our skin tone transfer method employs a bi-dimensional representation in the CIELAB color space, which offers three key advantages. First, it eliminates the need for manual annotations, allowing it to be applied to unlabeled data. Second, it simplifies the collection and annotation process for new rPPG datasets. Finally, it accounts for variations in hue as well as lightness, providing a more nuanced representation of skin tone.
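To make the CIELAB representation concrete, the sketch below converts sRGB skin pixels to CIELAB and summarizes them as a two-dimensional (lightness L*, hue angle) descriptor. This is an illustrative numpy implementation of the standard sRGB-to-CIELAB conversion (D65 white point); the function names `srgb_to_lab` and `skin_tone_condition`, and the exact choice of (L*, hue) as the two dimensions, are our assumptions, not the paper's released code.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1], shape (..., 3), to CIELAB (D65 white)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Inverse sRGB gamma: sRGB -> linear RGB
    lin = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # Linear RGB -> XYZ (standard sRGB/D65 matrix)
    M = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ M.T
    # Normalize by the D65 reference white
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])
    # XYZ -> Lab nonlinearity
    d = 6.0 / 29.0
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3 * d ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def skin_tone_condition(frame_rgb, skin_mask):
    """Summarize masked skin pixels as a 2-D (L*, hue angle) condition vector."""
    lab = srgb_to_lab(frame_rgb[skin_mask])
    L, a, b = lab.mean(axis=0)
    hue = np.degrees(np.arctan2(b, a))  # hue angle in the a*-b* chroma plane
    return np.array([L, hue])
```

Because the descriptor is computed directly from pixels under a skin mask, it requires no manual Fitzpatrick-style annotation and varies continuously in both lightness and hue.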
A 3D-CNN autoencoder (AE) encodes the entangled facial video content into a latent embedding. This embedding is then processed by conditional normalizing flows (c-CNFs) to disentangle the skin-tone content. Simultaneously, the rPPG model is iteratively trained on both the original and the skin-tone-augmented data.
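The core mechanism behind such conditional flows can be illustrated with a single affine coupling layer whose scale and shift networks also receive the conditioning vector (here, a 2-D skin-tone descriptor). The numpy sketch below is a minimal, generic RealNVP-style coupling layer, not the authors' exact c-CNF architecture; all class and parameter names are ours.

```python
import numpy as np

class ConditionalCoupling:
    """One affine coupling layer conditioned on an extra vector.
    Illustrative sketch: a tiny MLP maps (first half of z, condition)
    to a log-scale s and a shift t applied to the second half of z."""

    def __init__(self, dim, cond_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        in_dim = self.half + cond_dim
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 2 * (dim - self.half)))
        self.b2 = np.zeros(2 * (dim - self.half))

    def _scale_shift(self, z1, cond):
        h = np.tanh(np.concatenate([z1, cond], axis=-1) @ self.W1 + self.b1)
        s, t = np.split(h @ self.W2 + self.b2, 2, axis=-1)
        return np.tanh(s), t  # bounded log-scale for numerical stability

    def forward(self, z, cond):
        z1, z2 = z[..., :self.half], z[..., self.half:]
        s, t = self._scale_shift(z1, cond)
        y2 = z2 * np.exp(s) + t          # affine transform of the second half
        log_det = s.sum(axis=-1)         # log|det J| of the coupling layer
        return np.concatenate([z1, y2], axis=-1), log_det

    def inverse(self, y, cond):
        y1, y2 = y[..., :self.half], y[..., self.half:]
        s, t = self._scale_shift(y1, cond)  # y1 == z1, so s and t are recoverable
        z2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, z2], axis=-1)
```

The exact invertibility of the coupling is what enables skin-tone transfer: an embedding can be mapped to the flow's base space under its original condition and mapped back under a different skin-tone condition.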
Our cross-dataset experiments on the MMPD dataset with three different data-driven models demonstrate that PhysFlow can augment skin-tone diversity for any supervised rPPG method: our approach significantly reduces heart rate estimation error, particularly in underrepresented skin-tone categories, promoting equitable performance across skin tones.
@article{comas2024physflow,
  title={PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows},
  author={Comas, Joaquim and Alomar, Antonia and Ruiz, Adria and Sukno, Federico},
  journal={arXiv preprint arXiv:2407.21519},
  year={2024}
}