HuMoR: 3D Human Motion Model for Robust Pose Estimation
Supplementary Video Results


Paper

Project Page

This page contains extensive qualitative results that supplement the experiments in Section 5 of the main paper and the appendix. To see a quick sampling of results instead, please visit the main project page. Each set of qualitative results can be accessed with the links below, or simply scroll down to watch the videos in order. Specific sequences are linked with buttons for easy reference.

  1. Generative Model Evaluation (Section 5.3)
  2. Estimation from 3D: Occluded Keypoints (Section 5.4)
  3. Estimation from 3D: Noisy Joints (Section 5.4)
  4. Estimation from RGB: i3DB Data Baseline Comparisons (Section 5.5)
  5. Estimation from RGB: i3DB Data Ablation Comparisons (Section 5.5)
  6. Estimation from RGB: PROX Data Baseline Comparisons (Section 5.5)
  7. Estimation from RGB-D: PROX Data Baseline Comparisons (Section 5.5)
  8. Estimation of Fast & Dynamic Motions (Appendix F.1)
  9. Failure Cases (Appendix A.1)

1. Generative Model Evaluation (Section 5.3)
These results show the capabilities of HuMoR as a standalone generative model. In each example, a random initial state is sampled from the AMASS test set, and a motion sequence is then generated through autoregressive rollout.
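As a rough illustration of this procedure (a minimal sketch, not the released HuMoR code), the rollout alternates between sampling a latent transition from the learned conditional prior and decoding it into a change in state. The `model.prior` and `model.decode` names below are hypothetical stand-ins for the model's conditional prior and decoder.

```python
import torch

def rollout(model, x0, num_steps):
    """Autoregressive rollout of a HuMoR-style CVAE (illustrative sketch).

    Hypothetical interface:
      model.prior(x_prev)     -> (mu, logvar) of the conditional prior p(z_t | x_{t-1})
      model.decode(z, x_prev) -> predicted change in state, added to x_prev
    `x0` is an initial state, e.g. sampled from the AMASS test set.
    """
    states = [x0]
    x_prev = x0
    for _ in range(num_steps):
        mu, logvar = model.prior(x_prev)                      # conditional prior
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
        x_prev = x_prev + model.decode(z, x_prev)             # HuMoR predicts a delta
        states.append(x_prev)
    return torch.stack(states)
```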
2. Estimation from 3D: Occluded Keypoints (Section 5.4)
These results demonstrate using test-time optimization (TestOpt) with HuMoR as a motion prior to fit to partially observed 3D keypoint data (generated from the AMASS test set). In each example, Observations+Ground Truth shows the observed keypoints in blue on the ground truth body. The output motion is shown on the opaque body mesh, with the observed keypoints again in blue for reference.
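To make the fitting setup concrete, below is a minimal sketch of the idea behind test-time optimization with a motion prior (an assumption-laden illustration, not the released TestOpt implementation): a sequence of latent transitions is optimized so that the rolled-out motion matches the observed keypoints where they are visible, while each transition stays likely under the conditional prior. The `model.latent_dim`, `model.prior`, `model.decode`, and `model.joints_from_state` names are hypothetical.

```python
import torch

def test_opt(model, x0, obs_keypoints, obs_mask, num_iters=200, w_prior=1.0):
    """Sketch of motion-prior-based test-time optimization (illustrative only).

    obs_keypoints: (T, J, 3) observed 3D keypoints.
    obs_mask:      (T, J, 1) visibility mask; 0 marks occluded keypoints.
    """
    T = obs_keypoints.shape[0]
    z_seq = torch.zeros(T, model.latent_dim, requires_grad=True)  # latent transitions
    optim = torch.optim.Adam([z_seq], lr=0.01)
    for _ in range(num_iters):
        optim.zero_grad()
        x = x0
        loss_prior = torch.zeros(())
        loss_data = torch.zeros(())
        for t in range(T):
            mu, logvar = model.prior(x)
            # Gaussian negative log-likelihood (up to constants) under the conditional prior
            loss_prior = loss_prior + (((z_seq[t] - mu) ** 2) / logvar.exp()).sum()
            x = x + model.decode(z_seq[t], x)
            joints = model.joints_from_state(x)  # hypothetical: body joints from the state
            # Data term applied only to visible (non-occluded) keypoints
            loss_data = loss_data + (obs_mask[t] * (joints - obs_keypoints[t]) ** 2).sum()
        loss = loss_data + w_prior * loss_prior
        loss.backward()
        optim.step()
    return z_seq.detach()
```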
3. Estimation from 3D: Noisy Joints (Section 5.4)
These results demonstrate fitting to noisy 3D joint data (generated from the AMASS test set). In each example, Observations+Ground Truth shows the observed joints in green with the ground truth body and motion.
4. Estimation from RGB: i3DB Data Baseline Comparisons (Section 5.5)
These results demonstrate fitting to 2D joints detected from RGB videos in the i3DB dataset, which contains heavy occlusions. For each example sequence, the input video and the output of TestOpt with HuMoR (both motion and contacts) are shown first. Next, HuMoR results are compared to the VIBE and VPoser-t baselines, first from the camera view and then from an alternate view with the predicted ground plane shown for reference.
5. Estimation from RGB: i3DB Data Ablation Comparisons (Section 5.5)
Similarly, we compare results to ablations of the full HuMoR CVAE: No Delta (does not predict the change in state) and Standard Prior (does not learn a conditional prior). These two are of particular interest since both design choices are common in previous variational motion models.
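As a rough sketch (again using the hypothetical interface from the rollout example above, not the released code), the two ablations change the per-step sampling and decoding as follows.

```python
import torch

def step_full(model, z, x_prev):
    # Full HuMoR CVAE: the decoder predicts a change in state (delta),
    # which is added to the previous state.
    return x_prev + model.decode(z, x_prev)

def step_no_delta(model, z, x_prev):
    # "No Delta" ablation: the decoder regresses the next state directly.
    return model.decode(z, x_prev)

def sample_conditional_prior(model, x_prev):
    # Full model: z_t ~ p(z_t | x_{t-1}), the learned conditional prior.
    mu, logvar = model.prior(x_prev)
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def sample_standard_prior(latent_dim):
    # "Standard Prior" ablation: z_t ~ N(0, I), independent of the previous state.
    return torch.randn(latent_dim)
```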
6. Estimation from RGB: PROX Data Baseline Comparisons (Section 5.5)
Similar to i3DB, we compare results for fitting to RGB observations in the PROX dataset. For each example sequence, just the output of our method is shown first, followed by the comparison to the PROX-RGB and VPoser-t baselines. In all examples, PROX-RGB produces temporally incoherent results since it operates on single frames. However, it also uses the scene mesh as input, which allows for plausible poses when the person is fully visible. Under occlusions, though, this does not greatly improve results, and PROX-RGB often reverts to a mean leg pose similar to VPoser-t and VIBE.
7. Estimation from RGB-D: PROX Data Baseline Comparisons (Section 5.5)
Next, we show results for fitting to RGB-D observations from the PROX dataset. For each example sequence, just the output of our method is shown first. In these examples, we show the predicted motion as well as the estimated ground plane (instead of contacts). The ground plane is rendered within the true scene mesh for reference only; our method does not use the scene mesh as input or output. Next, our results are compared to the PROX-D and VPoser-t baselines, first overlaid on the input video and then within the ground truth scene geometry.
8. Estimation of Fast & Dynamic Motions (Appendix F.1)
Most results so far have shown common motions (e.g., walking, sitting) in occluded settings. However, fitting with HuMoR can also capture fast and dynamic motions from full-body observations. In the following results, we show that despite not being trained on many dance motions, HuMoR effectively generalizes to complex dynamic movements and allows for large accelerations to accurately fit 3D keypoints and 2D joints captured from dancing motions. The 3D keypoint data is from the DanceDB subset of AMASS (not used to train HuMoR): the ground truth motion and shape, along with the observed keypoints, are shown on the left, while our fitting results are on the right alongside the ground truth keypoints. The RGB videos are from the AIST dataset.
9. Failure Cases (Appendix A.1)
Finally, we look at specific failure cases of TestOpt using HuMoR.