TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation | ✓ Link | 15 | 87.7 | 99.0 | | | | TCPFormer (T=81) | 2025-01-03 |
MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network | ✓ Link | 16.2 | 85.3 | 98.2 | | | | MotionAGFormer-L (T=81) | 2023-10-25 |
KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | ✓ Link | 16.7 | 85.9 | 98.9 | | | | KTPFormer | 2024-03-31 |
MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network | ✓ Link | 17.1 | 84.5 | 98.3 | | | | MotionAGFormer-S (T=81) | 2023-10-25 |
TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation | ✓ Link | 17.8 | 86.5 | 98.7 | | | | TCPFormer (T=27) | 2025-01-03 |
MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network | ✓ Link | 18.2 | 84.2 | 98.3 | | | | MotionAGFormer-B (T=81) | 2023-10-25 |
MotionAGFormer: Enhancing 3D Human Pose Estimation with a Transformer-GCNFormer Network | ✓ Link | 19.2 | 83.5 | 98.2 | | | | MotionAGFormer-XS (T=27) | 2023-10-25 |
3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention | ✓ Link | 23.1 | 83.9 | 98.7 | | | | STCFormer (T=81) | 2023-01-01 |
GLA-GCN: Global-local Adaptive Graph Convolutional Network for 3D Human Pose Estimation from Monocular Video | ✓ Link | 27.76 | 79.12 | 98.53 | | | | GLA-GCN (T=81) | 2023-07-12 |
PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation | ✓ Link | 27.8 | 78.8 | 97.9 | | | | PoseFormerV2 (T=81) | 2023-03-30 |
(Fusionformer): Exploiting the Joint Motion Synergy with Fusion Network Based On Transformer for 3D Human Pose Estimation | | 28.2 | 70 | 97.9 | | | | Fusionformer (f=9) | 2022-10-08 |
HSTFormer: Hierarchical Spatial-Temporal Transformers for 3D Human Pose Estimation | | 28.3 | 78.6 | 98 | | | | HSTFormer (T=81) | 2023-01-18 |
DiffPose: Toward More Reliable 3D Pose Estimation | ✓ Link | 29.1 | 75.9 | 98 | | | | DiffPose | 2022-11-30 |
Diffusion-Based 3D Human Pose Estimation with Multi-Hypothesis Aggregation | ✓ Link | 29.7 | 78.2 | 97.7 | | | | D3DP (N=243, H=20, K=20, J-Agg) | 2023-03-21 |
P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation | ✓ Link | 32.2 | 75.8 | 97.9 | | | | P-STMO (N=81) | 2022-03-15 |
Learnable human mesh triangulation for 3D human pose and shape estimation | | 33.7 | 77.09 | 99.37 | | | | LMT R152 384x384 | 2022-08-24 |
Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation | ✓ Link | 40.5 | 74.2 | 98.8 | | | | RTPCA | 2023-09-04 |
Conditional Directed Graph Convolution for 3D Human Pose Estimation | ✓ Link | 42.5 | 69.5 | 97.9 | | | | U-CondDGConv | 2021-07-16 |
Learnable human mesh triangulation for 3D human pose and shape estimation | | 45.87 | 71.57 | 96.59 | | | | (R50-224) LMT | 2022-08-24 |
Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization | ✓ Link | 46.6 | | | | | | Ray3D (T=9 CPN H36M+HEva+3DHP) | 2022-03-22 |
ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention | ✓ Link | 53.6 | 69.8 | 96.4 | | | | ConvFormer | 2023-04-04 |
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video | ✓ Link | 54.9 | 66.5 | 94.4 | | | | MixSTE (T=27) | 2022-03-02 |
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation | ✓ Link | 55.2 | 65.6 | 93 | | | | ZeDO (S=50) | 2023-07-07 |
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video | ✓ Link | 57.9 | 63.8 | 94.2 | | | | MixSTE (T=1) | 2022-03-02 |
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation | ✓ Link | 58 | 63.3 | 93.8 | | | | MHFormer | 2021-11-24 |
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation | ✓ Link | 61.3 | 62.5 | 92.1 | | | | PoseDA | 2023-03-29 |
PLIKS: A Pseudo-Linear Inverse Kinematic Solver for 3D Human Body Estimation | ✓ Link | 67.6 | 54.1 | | 93.9 | | | PLIKS (HR48) | 2022-11-21 |
Motion Guided 3D Pose Estimation from Videos | ✓ Link | 68.1 | 62.1 | 86.9 | | | | UGCN | 2020-04-29 |
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation | ✓ Link | 71.1 | 57.9 | 89.2 | | | | PoseAug (+Extra2D) | 2021-05-06 |
ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos | ✓ Link | 71.8 | | | | 53 | 7.4 | ARTS (ResNet50 L=16) | 2024-10-21 |
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation | ✓ Link | 73 | 57.3 | 88.6 | | | | VPose+PoseAug | 2021-05-06 |
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation | ✓ Link | 73.2 | | | | | | HR-Net+VPose+PoseAug | 2021-05-06 |
Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos | | 76 | 53.8 | | | | | DG-Net (T=4) | 2021-09-15 |
CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation | ✓ Link | 76.3 | 57.5 | 89.1 | | | | CrossFormer | 2022-03-24 |
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation | ✓ Link | 76.6 | | | | | | HR-Net+ST-GCN+PoseAug | 2021-05-06 |
3D Human Pose Estimation with Spatial and Temporal Transformers | ✓ Link | 77.1 | 56.4 | 88.6 | | | | PoseFormer (9 frames) | 2021-03-18 |
Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition | ✓ Link | 78.8 | 54 | 87.9 | | | | Anatomy3D (T=81) | 2020-02-24 |
PoseGU: 3D Human Pose Estimation with Novel Human Pose Generator and Unbiased Learning | | 79.1 | 55.1 | 86.3 | | | | PoseGU | 2022-07-07 |
Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition | ✓ Link | 79.1 | 53.8 | 87.8 | | | | Anatomy3D (T=243) | 2020-02-24 |
PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision | ✓ Link | 79.5 | 53.1 | 89.1 | | | | PoseTriplet | 2022-03-29 |
Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation | ✓ Link | 79.8 | 51.4 | 83.6 | | | | Trajectory Space Factorization (F=25) | 2019-08-22 |
Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation | ✓ Link | 83.6 | | | | 56.2 | | MAED | 2021-09-06 |
Temporal-Aware Refinement for Video-based Human Pose and Shape Recovery | | 85.9 | | | | 60.5 | 9.2 | TAR (N=9) | 2023-11-16 |
3D Human Pose and Shape Estimation via HybrIK-Transformer | ✓ Link | 86.2 | 48.9 | 88.6 | | | | HybrIK-Transformer (HRNet-48) | 2023-02-09 |
IKOL: Inverse kinematics optimization layer for 3D human pose and shape estimation via Gauss-Newton differentiation | ✓ Link | 88.8 | 48.1 | 87.9 | | | | IKOL | 2023-02-02 |
HybrIK-X: Hybrid Analytical-Neural Inverse Kinematics for Whole-body Mesh Recovery | ✓ Link | 91 | 47.3 | 87.1 | | | | HybrIK (HRNet-W48) | 2023-04-12 |
HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation | ✓ Link | 91.0 | 46.9 | 87.5 | | | | HybrIK | 2020-11-30 |
3D Human Pose Estimation with 2D Marginal Heatmaps | ✓ Link | 91.3 | 47 | 85.4 | | | | MargiPose (multi-crop) | 2018-06-05 |
RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation | ✓ Link | 92.5 | 54.8 | 81.8 | | | | RepNet (H36M) | 2019-02-26 |
Learning Temporal 3D Human Pose Estimation with Pseudo-Labels | ✓ Link | 93.0 | 50.1 | 81.0 | | | | Multi-view Temporal self-supervised | 2021-10-14 |
Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation | | 93.4 | | | | 62.2 | 11.9 | DST-VIBE | 2021-10-22 |
Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation | ✓ Link | 93.9 | | | | 61.5 | 7.9 | GLoT | 2023-03-26 |
TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation | ✓ Link | 94 | | | | 56.7 | 12.4 | TAPE (T=16) | 2023-04-29 |
Self-Attentive 3D Human Pose and Shape Estimation from Videos | | 94.3 | | 90.1 | | 60.7 | | Self-Attentive | 2021-03-26 |
Learning Local Recurrent Models for Human Mesh Recovery | | 94.6 | | | | 62.4 | | LMR | 2021-07-27 |
Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos | | 95.3 | | | | 61.6 | 8.4 | STR | 2022-10-07 |
Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation | ✓ Link | 96.2 | | | | 63.1 | 16.7 | TePose (T=6 3DPW) | 2022-07-25 |
VIBE: Video Inference for Human Body Pose and Shape Estimation | ✓ Link | 96.6 | | 89.3 | | 64.6 | | VIBE | 2019-12-11 |
Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video | | 96.7 | | | | 62.8 | 9.6 | MPS-Net (T=16) | 2022-03-16 |
SSP-Net: Scalable Sequential Pyramid Networks for Real-Time 3D Human Pose Regression | | 96.8 | 44.3 | 83.2 | | | | SSP-Net | 2020-09-04 |
DC-GNet: Deep Mesh Relation Capturing Graph Convolution Network for 3D Human Shape Reconstruction | | 97.2 | 40.7 | | | 62.5 | | DC-GNet | 2021-08-27 |
Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video | ✓ Link | 97.3 | | | | 63.5 | 8.5 | TCMR (T=16 w/o H3.6M) | 2020-11-17 |
Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video | ✓ Link | 97.4 | | | | 62.8 | 8 | TCMR (T=16 w/o 3DPW) | 2020-11-17 |
Leveraging MoCap Data for Human Mesh Recovery | ✓ Link | 97.4 | | | | 63.3 | 8.7 | MoCap-SPIN + PoseBERT | 2021-10-18 |
RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation | ✓ Link | 97.8 | 58.5 | 82.5 | | | | RepNet (3DHP) | 2019-02-26 |
DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos | | 97.8 | | | | 65.4 | 8.2 | DDT | 2023-03-23 |
XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera | ✓ Link | 98.4 | 45.3 | 82.8 | | | | XNect (SelecSLS) | 2019-07-01 |
Cascaded deep monocular 3D human pose estimation with evolutionary training data | ✓ Link | 99.7 | 46.1 | 81.2 | | | | EvoSkeleton | 2020-06-14 |
Out-of-Domain Human Mesh Reconstruction via Dynamic Bilevel Online Adaptation | ✓ Link | 101.5 | 43.1 | 79.5 | | 66.1 | | DynaBOA | 2021-11-07 |
3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning | ✓ Link | 103.36 | | | | 70.01 | | RSC-Net | 2020-07-27 |
CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild | ✓ Link | 104 | | 77 | | | | CanonPose | 2020-11-30 |
Learning to Regress Bodies from Images using Differentiable Semantic Rendering | ✓ Link | 104.7 | | | | 66.7 | | DSR | 2021-10-07 |
Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop | ✓ Link | 105.2 | 37.1 | 76.4 | | | | SPIN | 2019-09-27 |
Self-Supervised Learning of 3D Human Pose using Multi-view Geometry | ✓ Link | 108.99 | | 77.5 | | | | EpipolarPose (fully-supervised) | 2019-03-06 |
XFormer: Fast and Accurate Monocular 3D Body Capture | | 109.8 | | | | 64.5 | | XFormer (HRNet) | 2023-05-18 |
Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild | | 110.8 | | 80.2 | | | | GeoRep (fully-supervised) | 2020-03-17 |
Consensus-based Optimization for 3D Human Pose Estimation in Camera Coordinates | ✓ Link | 112.1 | 42.1 | 80.6 | | | | Pose Consensus (monocular) | 2019-11-21 |
PONet: Robust 3D Human Pose Estimation via Learning Orientations Only | | 115.0 | 40.6 | 76.1 | | | | PONet | 2021-12-21 |
Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision | | 117.6 | 39.3 | 75.7 | | | | Mehta | 2016-11-29 |
Single-Shot Multi-Person 3D Pose Estimation From Monocular RGB | ✓ Link | 122.2 | 37.8 | 75.2 | | | | Single-Shot Multi-Person | 2017-12-09 |
End-to-end Recovery of Human Shape and Pose | ✓ Link | 124.2 | 36.5 | 72.9 | | 89.8 | | HMR | 2017-12-18 |
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera | ✓ Link | 124.7 | 40.4 | 76.6 | | | | VNect (Augm.) | 2017-05-03 |
3D Human Pose Estimation via Explicit Compositional Depth Maps | | | 62.4 | 93.2 | | | | Explicit Compositional Depth Maps | 2020-02-08 |
Double-chain Constraints for 3D Human Pose Estimation in Images and Videos | ✓ Link | | 55.9 | 87.5 | | | | DC-GCT | 2023-08-10 |
Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop | ✓ Link | | 55.6 | 92.5 | | 67.5 | | SPIN (Rigid Alignment) | 2019-09-27 |
HTNet: Human Topology Aware Network for 3D Human Pose Estimation | ✓ Link | | 54.1 | 86.7 | | | | HTNet | 2023-02-20 |
Modulated Graph Convolutional Network for 3D Human Pose Estimation | ✓ Link | | 53.7 | 86.1 | | | | Modulated-GCN | 2021-01-01 |
Regular Splitting Graph Network for 3D Human Pose Estimation | ✓ Link | | 53.2 | 85.6 | | | | Regular Splitting Graph Network | 2023-05-09 |
Hierarchical Graph Networks for 3D Human Pose Estimation | ✓ Link | | 52.1 | 85.2 | | | | HGN | 2021-11-23 |
"Teaching Independent Parts Separately" (TIPSy-GAN) : Improving Accuracy and Stability in Unsupervised Adversarial 2D to 3D Pose Estimation | | | 48.8 | 78 | | | | TIPSy-GAN (GT) | 2022-05-12 |
Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation | | | 46.2 | 82.1 | | | | Skeletal GNN | 2021-08-16 |
Graph Stacked Hourglass Networks for 3D Human Pose Estimation | ✓ Link | | 45.8 | 80.1 | | | | Graph Stacked Hourglass Network | 2021-03-30 |
PoseLifter: Absolute 3D human pose lifting network from a single noisy 2D human pose | ✓ Link | | 45.1 | 84 | | | | PoseLifter | 2019-10-26 |
SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach | ✓ Link | | 43.8 | 77.6 | | | | SRNET | 2020-07-18 |
Context Modeling in 3D Human Pose Estimation: A Unified Perspective | ✓ Link | | 42.7 | 80.5 | | | | ContextPose | 2021-03-29 |
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera | ✓ Link | | 41.6 | 79.4 | | | | VNect (ResNet 50 GT) | 2017-05-03 |
Monocular 3D Human Pose Estimation In The Wild Using Improved CNN Supervision | | | 40.8 | 64.7 | | | | Mehta | 2016-11-29 |
HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation | | | 38 | 75.3 | | | | HEMlets Pose | 2019-10-26 |
Unsupervised 3D Pose Estimation with Geometric Self-Supervision | | | 36.3 | 71.1 | | | | 2D-3D Lifting Network | 2019-04-09 |
Ordinal Depth Supervision for 3D Human Pose Estimation | ✓ Link | | 35.3 | 71.9 | | | | Ordinal Depth Supervision | 2018-05-10 |
Generalizing Monocular 3D Human Pose Estimation in the Wild | ✓ Link | | 33.8 | 71.2 | | | | Stereoscopic View Synthesis Subnetwork | 2019-04-11 |
OriNet: A Fully Convolutional Network for 3D Human Pose Estimation | ✓ Link | | 32.1 | 64.6 | | | | OriNet | 2018-11-12 |
3D Human Pose Estimation in the Wild by Adversarial Learning | | | 32.0 | 69.0 | | | | Adversarial Learning | 2018-03-26 |
Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows | ✓ Link | | | 84.3 | | | | Probabilistic Monocular | 2021-07-29 |
3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training | | | | 84.1 | | | | Spatio-Temporal Network (T=128) | 2020-04-07 |
Weakly Supervised Generative Network for Multiple 3D Human Pose Hypotheses | ✓ Link | | | 79.3 | | | | WSGAN | 2020-08-13 |
Generating Multiple Hypotheses for 3D Human Pose Estimation with Mixture Density Network | ✓ Link | | | 67.9 | | | | MDM | 2019-04-11 |
Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation | ✓ Link | | | | | 67.5 | | EFT | 2020-04-07 |