Paper | Code | AP | AP Easy | AP Medium | AP Hard | FPS | Model | Date |
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation | ✓ Link | 83.8 | 88.8 | 84.7 | 77.2 | 52.4 | RTMO-l | 2023-12-12 |
Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity | ✓ Link | 78.5 | 83.9 | 79.0 | 72.3 | | BUCTD-W48 (w/cond. input from PETR, and generative sampling) | 2023-06-13 |
I^2R-Net: Intra- and Inter-Human Relation Network for Multi-Person Pose Estimation | ✓ Link | 77.4 | 83.8 | 78.1 | 69.3 | | I²R-Net (1st stage: HRFormer-B) | 2022-06-22 |
Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation | ✓ Link | 76.6 | 83.0 | 77.3 | 68.3 | | ED-Pose (Swin-L) | 2023-02-03 |
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | ✓ Link | 75.1 | 81.3 | 75.7 | 68.1 | | DETRPose-X | 2025-06-16 |
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | ✓ Link | 73.3 | 79.5 | 74.0 | 66.1 | | DETRPose-L | 2025-06-16 |
HRFormer: High-Resolution Transformer for Dense Prediction | ✓ Link | 72.4 | 80.0 | 73.5 | 62.4 | | HRFormer-B | 2021-10-18 |
BAPose: Bottom-Up Pose Estimation with Disentangled Waterfall Representations | ✓ Link | 72.2 | 79.9 | 73.4 | 61.3 | | BAPose (W32) | 2021-12-20 |
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | ✓ Link | 72.0 | 78.6 | 72.6 | 64.5 | | DETRPose-M | 2025-06-16 |
TransPose: Keypoint Localization via Transformer | ✓ Link | 71.8 | 79.5 | 72.9 | 62.2 | | TransPose-H | 2020-12-28 |
Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation | | 71.5 | | 72.2 | | | SCIO (HRNet-48) | 2022-07-06 |
ScaleNAS: One-Shot Learning of Scale-Aware Representations for Visual Recognition | | 71.3 | | | | | HigherHRNet (ScaleNet_P4) | 2020-11-30 |
Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation | ✓ Link | 70.0 | 78.1 | 71.1 | 59.4 | | MIPNet (HRNet-W48) | 2021-01-27 |
The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation | ✓ Link | 69.4 | 76.6 | 70.0 | 61.5 | | CenterGroup | 2021-10-11 |
HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation | ✓ Link | 67.6 | 75.8 | 68.1 | 58.9 | | HigherHRNet (HR-Net-48) | 2019-08-27 |
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | ✓ Link | 67.4 | 74.7 | 68.1 | 59.3 | | DETRPose-S | 2025-06-16 |
CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark | ✓ Link | 66.0 | 75.5 | 66.3 | 57.4 | 10.1 | Joint-candidate SPPE + | 2018-12-02 |
Human Pose Estimation for Real-World Crowded Scenarios | ✓ Link | 65.5 | 75.2 | 66.6 | 53.1 | | OccNet | 2019-07-16 |
Greedy Offset-Guided Keypoint Grouping for Human Pose Estimation | ✓ Link | 65.2 | 73.8 | 66.2 | 54.8 | 14.7 (21.4) | Hourglass-104 | 2021-07-07 |
Single-Stage Multi-Person Pose Machines | ✓ Link | 63.7 | 70.3 | 64.5 | 55.7 | | SPM | 2019-08-24 |
RMPE: Regional Multi-person Pose Estimation | ✓ Link | 61.0 | 71.2 | 61.4 | 51.1 | | AlphaPose | 2016-12-01 |
Simple Baselines for Human Pose Estimation and Tracking | ✓ Link | 60.8 | 71.4 | 61.2 | 51.2 | | Simple baseline | 2018-04-17 |
Monocular, One-stage, Regression of Multiple 3D People | ✓ Link | 58.6 | | | | | ROMP+CAR | 2020-08-27 |
Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation | ✓ Link | 58.3 | | | | | LitePose-S | 2022-05-03 |
Mask R-CNN | ✓ Link | 57.2 | 69.4 | 57.9 | 45.8 | | Mask R-CNN | 2017-03-20 |
DETRPose: Real-time end-to-end transformer model for multi-person pose estimation | ✓ Link | 56.0 | 65.0 | 56.6 | 46.6 | | DETRPose-N | 2025-06-16 |
Monocular, One-stage, Regression of Multiple 3D People | ✓ Link | 55.6 | | | | | ROMP | 2020-08-27 |
OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields | ✓ Link | | 62.7 | 58.7 | 32.3 | | OpenPose | 2018-12-18 |