Learnable human mesh triangulation for 3D human pose and shape estimation | | 17.59 | No | Multi-View | | | 11.33 | 23.7 | LMT R152 384x384 | 2022-08-24 |
Geometry-Biased Transformer for Robust Multi-View 3D Human Pose Reconstruction | | 26.0 | No | Multi-View | | | | | Geometry-Biased Transformer (HRNet) | 2023-12-28 |
Epipolar Transformers | ✓ Link | 26.9 | No | Multi-View | | | | | Epipolar Transformer+R50 256×256+RPSM | 2020-05-10 |
Adaptive Multi-view and Temporal Fusing Transformer for 3D Human Pose Estimation | | 28.5 | No | Multi-View | | | | | MTF-Transformer (M=0.4, T=7) | 2021-10-11 |
Generalizable Human Pose Triangulation | | 29.1 | No | Multi-View | | | | | Generalizable Human Pose Triangulation | 2021-10-01 |
Adaptive Multi-view and Temporal Fusing Transformer for 3D Human Pose Estimation | | 29.4 | No | Multi-View | | | | | MTF-Transformer (M=0.4, T=1) | 2021-10-11 |
Real-Time Multi-View 3D Human Pose Estimation using Semantic Feedback to Smart Edge Sensors | ✓ Link | 29.8 | No | Multi-View | | | | | SmartEdgeSensor | 2021-06-28 |
Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation | | 30.2 | No | Multi-View | | | | | LWCDR | 2020-04-05 |
Learnable human mesh triangulation for 3D human pose and shape estimation | | 30.56 | No | Multi-View | | | 14.61 | 42.28 | LMT R50 224x224 | 2022-08-24 |
FLEX: Extrinsic Parameters-free Multi-view 3D Human Motion Reconstruction | ✓ Link | 30.9 | No | Multi-View | | | | | FLEX | 2021-05-05 |
Cross View Fusion for 3D Human Pose Estimation | ✓ Link | 31.17 | No | Multi-View | | | | | Fusion-RPSM (t=10) | 2019-09-03 |
KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | ✓ Link | 33.0 | No | Monocular | 26.2 | | | | KTPFormer (T=243) | 2024-03-31 |
Differentiable Dynamics for Articulated 3d Human Motion Reconstruction | | 33.4 | No | Monocular | 21.9 | | | | DiffPhy (W=480) | 2022-05-24 |
PoseRN: A 2D pose refinement network for bias-free multi-view 3D human pose estimation | | 38.4 | No | Multi-View | | | | | PoseRN | 2021-07-07 |
SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation | ✓ Link | 38.9 | No | Monocular | 29.9 | | | | SoloPose | 2023-12-15 |
Consensus-based Optimization for 3D Human Pose Estimation in Camera Coordinates | ✓ Link | 39 | No | Multi-View | | | | | Pose Consensus (multi-view, GT calib.) | 2019-11-21 |
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video | ✓ Link | 39.8 | No | Monocular | | | | | MixSTE (HRNet, T=243) | 2022-03-02 |
3D Human Pose Estimation using Spatio-Temporal Networks with Explicit Occlusion Training | | 40.1 | No | Monocular | 30.7 | | | | Spatio-Temporal Network (T=128) | 2020-04-07 |
IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation | | 40.2 | No | Monocular | | | | | IVT (f=5) | 2022-08-06 |
Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos | ✓ Link | 40.9 | No | Monocular | 30.4 | | | | GnTCN | 2020-12-22 |
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video | ✓ Link | 40.9 | No | Monocular | | | | | MixSTE (CPN, T=243) | 2022-03-02 |
Conditional Directed Graph Convolution for 3D Human Pose Estimation | ✓ Link | 41.1 | No | Monocular | | | | | U-CondDGConv | 2021-07-16 |
P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation | ✓ Link | 42.1 | No | Monocular | 34.4 | | | | P-STMO (N=243) | 2022-03-15 |
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video | ✓ Link | 42.4 | No | Monocular | | | | | MixSTE (CPN, T=81) | 2022-03-02 |
Motion Guided 3D Pose Estimation from Videos | ✓ Link | 42.6 | No | Monocular | | | | | UGCN (HR-Net) | 2020-04-29 |
Occlusion-Aware Networks for 3D Human Pose Estimation in Video | | 42.9 | No | Monocular | | | | | Occlusion-Aware Networks | 2019-10-01 |
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation | ✓ Link | 43 | No | Monocular | | | | | MHFormer | 2021-11-24 |
ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention | ✓ Link | 43.2 | No | Monocular | | | | | ConvFormer (T=243, CPN) | 2023-04-04 |
Context Modeling in 3D Human Pose Estimation: A Unified Perspective | ✓ Link | 43.4 | No | Monocular | | | | | ContextPose | 2021-03-29 |
CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation | ✓ Link | 43.7 | No | Monocular | | | | | CrossFormer (T=81) | 2022-03-24 |
Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation | ✓ Link | 43.7 | No | Monocular | | | | | StridedTransformer (T=351) | 2021-03-26 |
Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation | ✓ Link | 44 | No | Monocular | | | | | StridedTransformer (T=243) | 2021-03-26 |
Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition | ✓ Link | 44.1 | No | Monocular | | | | | Anatomy3D | 2020-02-24 |
P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation | ✓ Link | 44.1 | No | Monocular | | | | | P-STMO-S (N=81) | 2022-03-15 |
3D Human Pose Estimation with Spatial and Temporal Transformers | ✓ Link | 44.3 | No | Monocular | | | | | PoseFormer (f=81) | 2021-03-18 |
Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation | ✓ Link | 44.3 | No | Monocular | | | | | RIE (T=243 CPN) | 2021-07-29 |
Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows | ✓ Link | 44.3 | No | Monocular | | | | | Probabilistic Monocular (T=200) | 2021-07-29 |
Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images | | 44.4 | No | Multi-View | | | | | Shape-aware SMPL | 2019-08-26 |
TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking | | 44.6 | No | Monocular | | | | | TesseTrack (Monocular) | 2021-06-16 |
SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach | ✓ Link | 44.8 | No | Monocular | | | | | SRNet (T=243) | 2020-07-18 |
Enhanced 3D Human Pose Estimation from Videos by using Attention-Based Neural Network with Dilated Convolutions | | 44.8 | No | Monocular | | | | | Attention (T=243 CPN) | 2021-03-04 |
Motion Projection Consistency Based 3D Human Pose Estimation with Virtual Bones from Monocular Videos | | 44.8 | No | Monocular | | | | | Virtual Bones (T=243 CPN) | 2021-06-28 |
Consensus-based Optimization for 3D Human Pose Estimation in Camera Coordinates | ✓ Link | 45 | No | Multi-View | | | | | Pose Consensus (multi-view, est. calib.) | 2019-11-21 |
HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation | | 45.1 | No | Monocular | | | | | HEMlets Pose | 2019-10-26 |
Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction | ✓ Link | 45.1 | No | Multi-View | | | | | Attention3DHumanPose (T=243 CPN) | 2020-06-01 |
Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation | ✓ Link | 45.4 | No | Monocular | | | | | StridedTransformer (T=81) | 2021-03-26 |
Double-chain Constraints for 3D Human Pose Estimation in Images and Videos | ✓ Link | 46.1 | No | Monocular | | | | | DC-GCT (T=1) | 2023-08-10 |
Trajectory Space Factorization for Deep Video-Based 3D Human Pose Estimation | ✓ Link | 46.6 | No | Monocular | | | | | Trajectory Space Factorization (50 frames) | 2019-08-22 |
3D human pose estimation in video with temporal convolutions and semi-supervised training | ✓ Link | 46.8 | No | Monocular | 36.5 | | | | VideoPose3D (T=243) | 2018-11-28 |
Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation | ✓ Link | 46.9 | No | Monocular | | | | | StridedTransformer (T=27) | 2021-03-26 |
Motion Projection Consistency Based 3D Human Pose Estimation with Virtual Bones from Monocular Videos | | 47.4 | No | Monocular | | | | | Virtual Bones (T=9 CPN) | 2021-06-28 |
Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation | | 47.9 | No | Monocular | | | | | Skeletal GNN | 2021-08-16 |
GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation | ✓ Link | 48 | No | Monocular | | | | | GraphMLP | 2022-06-13 |
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks | ✓ Link | 48.8 | No | Monocular | | | | | STRGCN (T=7) | 2019-10-01 |
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks | ✓ Link | 49.1 | No | Monocular | | | | | STRGCN (T=3) | 2019-10-01 |
Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation | ✓ Link | 49.2 | No | Monocular | | | | | DiffPyramid (CPN) | 2025-06-03 |
Dual networks based 3D Multi-Person Pose Estimation from Monocular Video | ✓ Link | 49.31 | No | Monocular | | | | | Dual network | 2022-05-02 |
Modulated Graph Convolutional Network for 3D Human Pose Estimation | ✓ Link | 49.4 | No | Monocular | | | | | Modulated-GCN | 2021-01-01 |
Adaptive Multi-view and Temporal Fusing Transformer for 3D Human Pose Estimation | | 49.4 | No | Monocular | | | | | MTF-Transformer (M=0.4, T=7, N=1) | 2021-10-11 |
Learning Pyramid-structured Long-range Dependencies for 3D Human Pose Estimation | ✓ Link | 49.5 | No | Monocular | | | | | PGFormer (CPN) | 2025-06-03 |
Generating Multiple Hypotheses for 3D Human Pose Estimation with Mixture Density Network | ✓ Link | 49.6 | No | Multi-View | | | | | MDN (Multi-View) | 2019-04-11 |
Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization | ✓ Link | 49.7 | No | Monocular | | | | | Ray3D (T=9 CPN) | 2022-03-22 |
SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach | ✓ Link | 49.9 | No | Monocular | | | | | SRNet (T=1) | 2020-07-18 |
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation | ✓ Link | 50.2 | No | Monocular | | | | | HR-Net+VPose+PoseAug | 2021-05-06 |
Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation | ✓ Link | 50.5 | No | Monocular | | | | | Jointformer (CPN) | 2022-08-07 |
Learning Temporal 3D Human Pose Estimation with Pseudo-Labels | ✓ Link | 50.6 | No | Multi-View | | | | | Multi-view Temporal self-supervised | 2021-10-14 |
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks | ✓ Link | 50.6 | No | Monocular | | | | | STRGCN (T=1) | 2019-10-01 |
Adaptive Multi-view and Temporal Fusing Transformer for 3D Human Pose Estimation | | 50.7 | No | Monocular | | | | | MTF-Transformer (M=0.4, T=1, N=1) | 2021-10-11 |
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation | ✓ Link | 50.8 | No | Monocular | | | | | HR-Net+ST-GCN+PoseAug | 2021-05-06 |
Cascaded deep monocular 3D human pose estimation with evolutionary training data | ✓ Link | 50.9 | No | Monocular | | | | | TAG-Net | 2020-06-14 |
Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation | ✓ Link | 51.1 | No | Monocular | 43.4 | | | | LoCO | 2020-04-01 |
3D human pose estimation in video with temporal convolutions and semi-supervised training | ✓ Link | 51.8 | No | Monocular | 40 | | | | VideoPose3D (T=1) | 2018-11-28 |
Graph Stacked Hourglass Networks for 3D Human Pose Estimation | ✓ Link | 51.9 | No | Monocular | | | | | Graph Stacked Hourglass Network (CPN) | 2021-03-30 |
Consensus-based Optimization for 3D Human Pose Estimation in Camera Coordinates | ✓ Link | 52 | No | Monocular | | | | | Pose Consensus (monocular) | 2019-11-21 |
3D Human Pose Estimation Using Möbius Graph Convolutional Networks | | 52.1 | No | Monocular | | | | | MöbiusGCN | 2022-03-20 |
PoseLifter: Absolute 3D human pose lifting network from a single noisy 2D human pose | ✓ Link | 52.5 | No | Monocular | 39.1 | | | | PoseLifter | 2019-10-26 |
Generating Multiple Hypotheses for 3D Human Pose Estimation with Mixture Density Network | ✓ Link | 52.7 | No | Monocular | 42.6 | | | | MDN | 2019-04-11 |
Optimizing Network Structure for 3D Human Pose Estimation | | 52.7 | No | Monocular | | | | | ONS LCN | 2019-10-01 |
Semantic Graph Convolutional Networks for 3D Human Pose Regression | ✓ Link | 57.6 | No | Monocular | | | | | SemGCN | 2019-04-06 |
Generalizing Monocular 3D Human Pose Estimation in the Wild | ✓ Link | 58 | No | Multi-View | | | | | Stereoscopic View Synthesis Subnetwork | 2019-04-11 |
Exploiting temporal information for 3D pose estimation | ✓ Link | 58.5 | No | Monocular | | | | | Sequence-to-sequence network | 2017-11-23 |
TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation | ✓ Link | 60 | No | Monocular | 39.5 | 6.5 | | | TAPE (T=16) | 2023-04-29 |
Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows | ✓ Link | 61.8 | No | Monocular | | | | | Probabilistic Monocular (T=1) | 2021-07-29 |
Self-Supervised 3D Human Pose Estimation with Multiple-View Geometry | ✓ Link | 62.0 | No | Multi-View | | | | | 2D-3D Lifting self-supervised | 2021-08-17 |
A simple yet effective baseline for 3d human pose estimation | ✓ Link | 62.9 | No | Monocular | | | | | SIM (SH detections FT) (MA) | 2017-05-08 |
VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation | ✓ Link | 64.3 | No | Multi-View | | | | | VoxelKeypointFusion (transfer) | 2024-10-24 |
VIBE: Video Inference for Human Body Pose and Shape Estimation | ✓ Link | 65.6 | No | Monocular | 41.4 | | | | VIBE | 2019-12-11 |
CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the Wild | ✓ Link | 74.3 | No | Multi-View | | | | | CanonPose | 2020-11-30 |