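Paper | Code | MPVPE (mm) | PA-MPJPE (mm) | MPJPE (mm) | Accel. Error (mm/s²) | GFLOPs | Params (M) | Model | Date
ExtPose: Robust and Coherent Pose Estimation by Extending ViTs | | 67.5 | 34.0 | 54.2 | | | | ExtPose (-a, T=16) | 2025-06-18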
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion | ✓ Link | 68.7 | 35.9 | 57.8 | | | | WHAM (ViT) | 2023-12-12 |
Temporal-Aware Refinement for Video-based Human Pose and Shape Recovery | | 74.4 | 40.6 | 62.7 | 7.7 | | | TAR (N=9) | 2023-11-16 |
Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction | ✓ Link | 76.3 | 39.8 | 65 | | | | Zolly (HRNet-w48) | 2023-03-24 |
Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction | ✓ Link | 76.7 | 39.9 | 64.7 | | | | CycleAdapt (w/ 2D GT) | 2023-08-12 |
PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery | | 76.8 | 39.8 | 67.7 | | | | PostoMETRO (HRNet-w48) | 2024-03-19 |
3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models | | 76.8 | 41.9 | 65.2 | | | | CLIFF (3DPW+HumanWild+BEDLAM+AGORA) | 2024-03-17 |
Hulk: A Universal Knowledge Translator for Human-Centric Tasks | ✓ Link | 77.4 | 38.5 | 66.3 | | | | Hulk(ViT-L) | 2023-12-04 |
GenHMR: Generative Human Mesh Recovery | | 77.5 | 42.1 | 68.1 | | | | GenHMR | 2024-12-19 |
3D Human Mesh Estimation from Virtual Markers | ✓ Link | 77.9 | 41.3 | 67.5 | | | | VirtualMarker | 2023-03-21 |
PostoMETRO: Pose Token Enhanced Mesh Transformer for Robust 3D Human Mesh Recovery | | 78.0 | 40.8 | 68.4 | | | | PostoMETRO (ResNet-50) | 2024-03-19 |
SEFD: Learning to Distill Complex Pose and Occlusion | ✓ Link | 78.36 | 43.79 | 64.75 | | | | SEFD_GT | 2023-01-01 |
Learning Unorthogonalized Matrices for Rotation Estimation | | 79.2 | 42.0 | 67.6 | | | | PROM (CLIFF) | 2023-12-01 |
MotionBERT: A Unified Perspective on Learning Human Motion Representations | ✓ Link | 79.4 | 40.6 | 68.8 | | | | MotionBERT-HybrIK | 2022-10-12 |
MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction | ✓ Link | 79.4 | 42.8 | 65.9 | | | | MPT | 2022-11-24 |
BioPose: Biomechanically-accurate 3D Pose Estimation from Monocular Videos | | 79.8 | 39.5 | 69.0 | | | | BioPose | 2025-01-14 |
Hulk: A Universal Knowledge Translator for Human-Centric Tasks | ✓ Link | 79.8 | 39.9 | 67 | | | | Hulk(ViT-B) | 2023-12-04 |
BoPR: Body-aware Part Regressor for Human Shape and Pose Estimation | ✓ Link | 80.8 | 42.5 | 65.4 | | | | BoPR (HR-W48) | 2023-03-21 |
ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos | ✓ Link | 81.4 | 46.5 | 67.7 | 6.5 | | | ARTS (ResNet-50, L=16) | 2024-10-21
RemoCap: Disentangled Representation Learning for Motion Capture | | 81.9 | 44.1 | 72.7 | | | | RemoCap | 2024-05-21 |
Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation | ✓ Link | 82 | 40.4 | 65.5 | | | | DynaBOA (w/ 2D GT) | 2021-11-07 |
SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | ✓ Link | 82 | 43.4 | 73.7 | | | | SMPLer-L | 2024-04-23 |
Humans in 4D: Reconstructing and Tracking Humans with Transformers | ✓ Link | 82.2 | 44.4 | 69.8 | | | | HMR 2.0 | 2023-05-31 |
HybrIK-X: Hybrid Analytical-Neural Inverse Kinematics for Whole-body Mesh Recovery | ✓ Link | 82.3 | 41.8 | 71.6 | | | | HybrIK (HRNet-W48) | 2023-04-12 |
Deformable Mesh Transformer for 3D Human Mesh Recovery | ✓ Link | 82.6 | 44.3 | 72.9 | | | | DeFormer | 2023-01-01 |
3D Human Pose and Shape Estimation via HybrIK-Transformer | ✓ Link | 83.6 | 42.3 | 71.6 | | | | HybrIK-Transformer (HrNet-48) | 2023-02-09 |
PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers | | 84 | 43.5 | 73.1 | | | | PSVT | 2023-03-16 |
IKOL: Inverse kinematics optimization layer for 3D human pose and shape estimation via Gauss-Newton differentiation | ✓ Link | 84.1 | 44.5 | 71.1 | | | | IKOL | 2023-02-02 |
TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation | ✓ Link | 84.6 | 44.3 | 71 | | | | TokenHMR (SD + ITW + BL) | 2024-04-25 |
HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation | ✓ Link | 86.5 | 45.0 | 74.1 | | | | HybrIK | 2020-11-30 |
NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation | ✓ Link | 86.6 | 40.6 | 71.3 | | | | NIKI (Twist-and-Swing) | 2023-05-15 |
FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER | ✓ Link | 86.9 | 45.9 | 73.4 | | | | HeatER | 2022-05-30 |
Implicit 3D Human Mesh Recovery using Consistency with Pose and Shape from Unseen-view | | 87.1 | 45.4 | 74.3 | | | | ImpHMR | 2023-06-30 |
XFormer: Fast and Accurate Monocular 3D Body Capture | | 87.1 | 45.7 | 75 | | | | XFormer (HRNet) | 2023-05-18 |
POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery | ✓ Link | 87.4 | 44.8 | 75 | | | | POTTER | 2023-03-23 |
Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens | ✓ Link | 87.9 | 42 | 75.6 | 16.5 | | | INT-2 (ResNet-50) | 2023-03-01 |
MotionBERT: A Unified Perspective on Learning Human Motion Representations | ✓ Link | 88.1 | 47.2 | 76.9 | | | | MotionBERT (Finetune) | 2022-10-12 |
Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction | ✓ Link | 91.2 | 49.5 | 77.2 | | | | BOA (w/ 2D GT) | 2021-03-30 |
SEFD: Learning to Distill Complex Pose and Occlusion | ✓ Link | 92.60 | 49.39 | 77.37 | | | | SEFD | 2023-01-01 |
Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation | | 93.5 | 50.3 | 76.7 | 11 | | | DST-VIBE | 2021-10-22 |
Learning Local Recurrent Models for Human Mesh Recovery | | 93.6 | 51.2 | 81.7 | 15.6 | | | LMR | 2021-07-27 |
Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation | ✓ Link | 96.3 | 50.6 | 80.7 | 6.6 | | | GLoT | 2023-03-26 |
KAMA: 3D Keypoint Aware Body Mesh Articulation | | 97.0 | 51.1 | | | | | KAMA | 2021-04-27 |
TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments | ✓ Link | 97.3 | 37.8 | 79.1 | | | | TRACE | 2023-06-05 |
TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation | ✓ Link | 98.1 | 51.5 | 79.9 | 8.9 | | | TAPE (T=16) | 2023-04-29 |
VIBE: Video Inference for Human Body Pose and Shape Estimation | ✓ Link | 99.1 | 51.9 | 82.9 | 23.4 | | 72.43 | VIBE | 2019-12-11 |
Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video | | 99.7 | 52.1 | 84.3 | 7.4 | 4.45 | 39.63 | MPS-Net (T=16) | 2022-03-16 |
Cyclic Test-Time Adaptation on Monocular Video for 3D Human Mesh Reconstruction | ✓ Link | 99.9 | 51.1 | 84.4 | | | | CycleAdapt (w/o 2D GT) | 2023-08-12 |
Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation | ✓ Link | 100.3 | 52.3 | 84.6 | 11.4 | | | TePose (T=6) | 2022-07-25 |
Self-Attentive 3D Human Pose and Shape Estimation from Videos | | 100.6 | 50.4 | 85.8 | 77.9 | | | Self-Attentive | 2021-03-26 |
Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos | | 101.2 | 52.4 | 85.2 | 6.9 | | | STR | 2022-10-07 |
DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos | | 101.2 | 53.3 | 85.9 | 6.6 | | | DDT | 2023-03-23 |
3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models | | 102.1 | 52.7 | 87.3 | | | | CLIFF | 2024-03-17 |
Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video | ✓ Link | 102.9 | 52.7 | 86.5 | 7.1 | | | TCMR (T=16 w/o H3.6M) | 2020-11-17 |
Leveraging MoCap Data for Human Mesh Recovery | ✓ Link | 103.8 | 52.9 | 89.4 | 8.3 | | | MoCap-SPIN + PoseBERT | 2021-10-18 |
A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose | ✓ Link | 106.2 | 58.9 | 88.5 | | | | GTRS | 2021-11-24 |
MUG: Multi-human Graph Network for 3D Mesh Reconstruction from 2D Pose | | 106.2 | 60.5 | 87 | | | | MUG | 2022-05-25 |
Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes | ✓ Link | 108.5 | 55.8 | 85.8 | | | | 3DCrowdNet | 2021-04-15 |
PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos | | 108.6 | 66.9 | 87.8 | | | | PC-HMR | 2021-03-16 |
Occluded Human Body Capture with Self-Supervised Spatial-Temporal Motion Prior | ✓ Link | 110.1 | 51.7 | 83.7 | | | | CHOMP | 2022-07-12 |
Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video | ✓ Link | 111.5 | 55.8 | 95 | 7 | | | TCMR (T=16 w/o 3DPW) | 2020-11-17 |
Decomposed Human Motion Prior for Video Pose Estimation via Adversarial Training | | 112.6 | 51.4 | 89.4 | 17.1 | | | Wenshuo et al. | 2023-05-30
Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop | ✓ Link | 116.4 | 59.2 | 96.9 | | | | SPIN | 2019-09-27 |
Body Meshes as Points | ✓ Link | 119.3 | 63.8 | 104.1 | | | | BMP | 2021-05-06 |
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation | ✓ Link | | 40.3 | 69.7 | | | | ZeDO (S=1,J=17) | 2023-07-07 |
3DHR-Co: A Collaborative Test-time Refinement Framework for In-the-Wild 3D Human-Body Reconstruction Task | | | 42.11 | 63.72 | | | | 3DHR-Co (w/ 2D GT) | 2023-10-02 |
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation | ✓ Link | | 42.6 | 80.9 | | | | ZeDO (Cross Dataset) | 2023-07-07 |
One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer | ✓ Link | | 45.1 | 74.7 | | | | OSX | 2023-03-28 |
IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation | | | 46 | | | | | IVT (f=5) | 2022-08-06 |
DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation | ✓ Link | | 46.4 | 75.5 | | | | DeciWatch-PARE | 2022-03-16 |
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer | ✓ Link | | 49.3 | 80.6 | | | | TCFormer | 2022-04-19 |
MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation | ✓ Link | | 49.7 | 68.8 | | | | MeTRAbs | 2020-07-12 |
Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows | | | 49.8 | 80.2 | | | | FS+WS+OPT(KA+BA+S, 16 frames) | 2020-03-23 |
THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers | | | 51.5 | 74.8 | | | | THUNDR | 2021-06-17 |
Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation | ✓ Link | | 51.6 | | | | | EFT | 2020-04-07 |
UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning | | | 51.6 | | | | | UniHPE (GT) | 2023-11-24 |
Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation | | | 52.34 | 81.98 | | | | MION | 2021-12-24 |
SPEC: Seeing People in the Wild with an Estimated Camera | ✓ Link | | 53.2 | | | | | SPEC | 2021-10-01 |
HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation | ✓ Link | | 53.4 | 83.9 | | | | HuManiFlow | 2023-05-11 |
Hierarchical Kinematic Probability Distributions for 3D Human Shape and Pose Estimation from Images in the Wild | ✓ Link | | 53.6 | 84.9 | | | | Hierarchical Probabilistic Humans | 2021-10-03 |
BlanketGen - A synthetic blanket occlusion augmentation pipeline for MoCap datasets | | | 53.96 | | | | | BlanketGen | 2022-10-21 |
Accurate 3D Hand Pose Estimation for Whole-Body 3D Human Mesh Estimation | ✓ Link | | 54.4 | | | | | Hand4Whole | 2020-11-23 |
Learning 3D Human Shape and Pose from Dense Body Parts | ✓ Link | | 54.8 | 85.5 | | | | DaNet-DensePose2SMPL | 2019-12-31 |
Probabilistic Modeling for Human Mesh Recovery | ✓ Link | | 55.1 | | | | | ProHMR + fitting | 2021-08-26 |
On Self-Contact and Human Pose | ✓ Link | | 55.5 | 84.9 | | | | TUCH | 2021-04-07 |
Neural Descent for Visual 3D Human Pose and Shape | | | 57.5 | 81.4 | | | | HUND (FS+SS) | 2020-08-16 |
LASOR: Learning Accurate 3D Human Pose and Shape Via Synthetic Occlusion-Aware Data and Neural Mesh Rendering | ✓ Link | | 57.9 | | | | | LASOR | 2021-08-01 |
3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning | ✓ Link | | 58.98 | 96.36 | | | | RSC-Net | 2020-07-27 |
Beyond Weak Perspective for Monocular 3D Human Pose Estimation | | | 59.7 | 83.2 | | | | BeyondWeak | 2020-09-14 |
THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers | | | 59.9 | 86.8 | | | | THUNDR (WS) | 2021-06-17 |
Probabilistic Modeling for Human Mesh Recovery | ✓ Link | | 59.9 | | | | | Biggs [3] | 2021-08-26 |
FrankMocap: A Monocular 3D Whole-Body Pose Estimation System via Regression and Integration | ✓ Link | | 60 | 94.3 | | | | FrankMocap | 2021-08-13 |
Monocular Expressive Body Regression through Body-Driven Attention | ✓ Link | | 60.7 | 93.4 | | | | ExPose | 2020-08-20 |
Probabilistic 3D Human Shape and Pose Estimation from Multiple Unconstrained Images in the Wild | | | 61 | 90.9 | | | | Prob3DHumans | 2021-03-19 |
Dual networks based 3D Multi-Person Pose Estimation from Monocular Video | ✓ Link | | 61.7 | | | | | Dual network | 2022-05-02 |
Accurate 3D Body Shape Regression using Metric and Semantic Attributes | ✓ Link | | 62.6 | 95.2 | | | | SHAPY (SMPL-X) | 2022-06-14 |
PoseNet3D: Learning Temporally Consistent 3D Human Pose via Knowledge Distillation | ✓ Link | | 63.2 | | | | | PoseNet3D | 2020-03-07 |
Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos | ✓ Link | | 64.2 | | | | | GnTCN | 2020-12-22 |
Probabilistic Modeling for Human Mesh Recovery | ✓ Link | | 65 | | | | | ProHMR | 2021-08-26 |
UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning | | | 65.7 | | | | | UniHPE-w48 | 2023-11-24 |
Heuristic Weakly Supervised 3D Human Pose Estimation | ✓ Link | | 66.1 | | | | | HW-HuP | 2021-05-23 |
Synthetic Training for Accurate 3D Human Pose and Shape Estimation in the Wild | ✓ Link | | 66.8 | | | | | STRAPS | 2020-09-21 |
PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision | ✓ Link | | 69.5 | 115 | | | | PoseTriplet | 2022-03-29 |
3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training | | | 71.8 | | | | | Spatio-Temporal Network (T=128) | 2020-04-07 |
Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation | | | 72.1 | | | | | Non-Local Latent Relation Distillation | 2022-04-05 |
Learning 3D Human Dynamics from Video | ✓ Link | | 72.6 | 116.5 | 15.2 | | | HMMR (T=20) | 2018-12-04 |
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation | ✓ Link | | 73.2 | | | | | HR-Net+ST-GCN+PoseAug | 2021-05-06 |
Inference Stage Optimization for Cross-scenario 3D Human Pose Estimation | | | 75.8 | | | | | ISO | 2020-07-04 |
PONet: Robust 3D Human Pose Estimation via Learning Orientations Only | | | 76.2 | | | | | PONet | 2021-12-21 |
Multi-View Matching (MVM): Facilitating Multi-Person 3D Pose Estimation Learning with Action-Frozen People Video | | | 78.2 | | | | | MVM | 2020-04-11 |
Learning 3D Human Pose from Structure and Motion | ✓ Link | | 92.2 | | | | | TP-Net | 2017-11-25 |
Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image | ✓ Link | | 106.8 | | | | | SMPLify | 2016-07-27 |
A simple yet effective baseline for 3d human pose estimation | ✓ Link | | 157.0 | | | | | Simple-baseline | 2017-05-08 |
HybridCap: Inertia-aid Monocular Capture of Challenging Human Motions | | | | 72.1 | | | | HybridCap | 2022-03-17 |
Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos | ✓ Link | | | 74.6 | 8 | | | PARE + HANet (T=51) | 2022-11-29 |
SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation | ✓ Link | | | 75.2 | | | | SMPLer-X | 2023-09-29 |
Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos | ✓ Link | | | 77.1 | 6.8 | | | PARE + HANet (T=101) | 2022-11-29 |
MEEV: Body Mesh Estimation On Egocentric Video | ✓ Link | | | 81.74 | | | | MEEV | 2022-10-21 |
End-to-end Recovery of Human Shape and Pose | ✓ Link | | | 130.0 | 37.4 | | | HMR | 2017-12-18 |
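
The table lists error figures but does not define how they are computed. Assuming the usual 3DPW conventions (joint or vertex positions in millimetres, PA-MPJPE measured after a per-frame similarity Procrustes alignment, and acceleration error taken from second finite differences over time), the following minimal NumPy sketch shows what these metrics measure. The function names `mpjpe`, `procrustes_align`, `pa_mpjpe` and `accel_error` are hypothetical. Official evaluation scripts may differ in their joint sets, in root (pelvis) alignment before MPJPE, and in how frames are sampled, none of which is modelled here. Applying `mpjpe` to mesh vertices instead of joints gives an MPVPE-style number.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding points, averaged over all frames and points.

    pred, gt: arrays of shape (..., N, 3) in the same units (assumed millimetres).
    Passing joints gives MPJPE; passing mesh vertices gives an MPVPE-style error.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Similarity-align one frame pred (N, 3) to gt (N, 3) using optimal scale, rotation and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # SVD of the cross-covariance matrix (orthogonal Procrustes / Umeyama)
    u, s, vt = np.linalg.svd(p.T @ g)
    # Flip the last axis if needed so that r is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    corr = np.diag([1.0, 1.0, d])
    r = vt.T @ corr @ u.T
    scale = np.trace(np.diag(s) @ corr) / (p ** 2).sum()
    return scale * p @ r.T + mu_g


def pa_mpjpe(pred, gt):
    """PA-MPJPE for a sequence of shape (T, N, 3), with each frame aligned independently."""
    aligned = np.stack([procrustes_align(p, g) for p, g in zip(pred, gt)])
    return mpjpe(aligned, gt)


def accel_error(pred, gt):
    """Mean norm of the difference between predicted and true accelerations.

    Accelerations are second finite differences over frames, so the units are
    position units per frame^2. pred, gt have shape (T, N, 3) with T >= 3.
    """
    acc_pred = pred[:-2] - 2 * pred[1:-1] + pred[2:]
    acc_gt = gt[:-2] - 2 * gt[1:-1] + gt[2:]
    return float(np.linalg.norm(acc_pred - acc_gt, axis=-1).mean())


# Usage: a prediction that differs from the ground truth only by a similarity transform
rng = np.random.default_rng(0)
gt = rng.normal(size=(4, 14, 3)) * 100.0  # 4 frames, 14 joints, millimetres
angle = 0.3
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle), np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
pred = 1.1 * gt @ rot.T + np.array([10.0, -5.0, 3.0])
print(mpjpe(pred, gt))     # scale, rotation and translation all add to the error
print(pa_mpjpe(pred, gt))  # alignment removes the similarity transform, so this is near zero
print(accel_error(pred, gt))
```

This example shows why PA-MPJPE is typically much lower than MPJPE in the rows above. Procrustes alignment removes any global scale, rotation and translation mismatch, so PA-MPJPE isolates errors in articulated pose. Acceleration error ignores constant offsets and constant-velocity drift, which is why it is mainly reported for video-based methods.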