Abstract
We introduce a novel benchmark for human pose estimation with millimeter-wave (mmWave) radar
that includes synchronized vision and radio-signal components.
The dataset is collected with cross-calibrated mmWave radar sensors and a monocular RGB camera to enable cross-modality training for radar-based human pose estimation.
Using mmWave radar for human pose estimation has two advantages.
First, it is robust to dark and low-light conditions.
Second, its signals are not visually perceivable by humans, so it can be widely applied in settings with privacy concerns.
In addition to the benchmark, we propose a cross-modality training framework in which the keypoints representing human body joints,
used as ground truth for training, are generated automatically by a pre-trained pose estimation network
from monocular camera images, avoiding laborious manual annotation.
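The cross-modality idea can be illustrated with a toy sketch: a frozen camera-based "teacher" labels each synchronized camera frame, and those labels supervise a radar-only "student". Everything below is a hypothetical simplification, not the authors' framework. `teacher_keypoints` stands in for the pre-trained pose network, `make_synchronized_pair` simulates one calibrated radar/camera capture, and both models are linear maps trained with plain SGD on a mean squared error.

```python
import random


def teacher_keypoints(frame):
    """Hypothetical frozen camera pose network (a fixed linear map here)."""
    x, y = frame
    return [2.0 * x + 1.0, -1.0 * y + 0.5]


def make_synchronized_pair(rng):
    """Simulate one time-synchronized (radar, camera) capture of the same scene.

    The latent scene state is a 2-D point; the radar observes it through a
    different (made-up) projection than the camera does.
    """
    sx, sy = rng.uniform(-1, 1), rng.uniform(-1, 1)
    frame = (sx, sy)
    radar = [sx + sy, sx - sy, 1.0]  # last entry acts as a bias feature
    return radar, frame


def predict(weights, radar):
    """Linear radar-only student: one weight row per keypoint coordinate."""
    return [sum(w * f for w, f in zip(row, radar)) for row in weights]


def train_radar_student(num_pairs=200, epochs=200, lr=0.05, seed=0):
    rng = random.Random(seed)
    pairs = [make_synchronized_pair(rng) for _ in range(num_pairs)]
    # Labels come from the teacher applied to the camera frame,
    # so no manual annotation is needed.
    data = [(radar, teacher_keypoints(frame)) for radar, frame in pairs]
    weights = [[0.0] * 3 for _ in range(2)]
    for _ in range(epochs):
        for radar, target in data:
            pred = predict(weights, radar)
            for k in range(2):
                err = pred[k] - target[k]
                for j in range(3):
                    # SGD step on the squared error for coordinate k
                    weights[k][j] -= lr * err * radar[j]
    mse = sum(
        (p - t) ** 2
        for radar, target in data
        for p, t in zip(predict(weights, radar), target)
    ) / (2 * len(data))
    return weights, mse


weights, mse = train_radar_student()
print(weights, mse)
```

At inference time only `predict` and the radar features are needed, so the camera can be removed. This is what makes the trained model usable in the dark or in privacy-sensitive settings. A real system would replace both linear maps with deep networks.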