Paper | Code | AP | AP50 | AP75 | APL | APM | AR | Model | Date |
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation | ✓ Link | 81.1 | 95.0 | 88.2 | 86.0 | 77.8 | 85.6 | ViTPose (ViTAE-G, ensemble) | 2022-04-26 |
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation | ✓ Link | 80.9 | 94.8 | 88.1 | 85.9 | 77.5 | 85.4 | ViTPose (ViTAE-G) | 2022-04-26 |
Polarized Self-Attention: Towards High-quality Pixel-wise Regression | ✓ Link | 79.5 | 93.6 | 85.9 | 84.3 | 76.3 | 81.9 | UDP-Pose-PSA(384x288) | 2021-07-02 |
PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation | ✓ Link | 79.5 | 91.9 | 85.8 | 86.5 | 75.9 | 84.5 | PoseBH-H | 2025-05-23 |
Learning Delicate Local Representations for Multi-Person Pose Estimation | ✓ Link | 79.2 | 94.4 | 87.1 | 83.8 | 76.1 | 84.1 | 4xRSN-50 (ensemble) | 2020-03-09 |
Towards High Performance Human Keypoint Detection | ✓ Link | 78.9 | 93.8 | 86.0 | 84.5 | 75.0 | 83.6 | CCM+ | 2020-02-03 |
Polarized Self-Attention: Towards High-quality Pixel-wise Regression | ✓ Link | 78.9 | 93.6 | 85.8 | 83.6 | 76.1 | 81.4 | UDP-Pose-PSA(256x192) | 2021-07-02 |
Learning Delicate Local Representations for Multi-Person Pose Estimation | ✓ Link | 78.6 | 94.3 | 86.6 | 83.3 | 75.5 | 83.8 | 4xRSN-50 | 2020-03-09 |
Human Pose as Compositional Tokens | ✓ Link | 78.3 | 9 | 85.9 | | | | PCT (256x256) | 2023-03-21 |
Distribution-Aware Coordinate Representation for Human Pose Estimation | ✓ Link | 77.4 | 92.6 | 84.6 | 83.7 | 73.6 | 82.3 | HRNet-W48+DARK | 2019-10-14 |
Revealing the Dark Secrets of Masked Image Modeling | ✓ Link | 77.2 | | | | | | SwinV2-L 1K-MIM | 2022-05-26 |
Deep High-Resolution Representation Learning for Human Pose Estimation | ✓ Link | 77.0 | 92.7 | 84.5 | 83.1 | 73.4 | 82.0 | HRNet-W48 + extra data | 2019-02-25 |
EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation using Accelerated Neuroevolution with Weight Transfer | ✓ Link | 76.8 | 92.5 | 84.3 | 82.5 | 73.5 | 81.7 | EvoPose2D-L | 2020-11-17 |
Revealing the Dark Secrets of Masked Image Modeling | ✓ Link | 76.7 | | | | | | SwinV2-B 1K-MIM | 2022-05-26 |
The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation | ✓ Link | 76.5 | 92.7 | 84.0 | 82.4 | 73.0 | 81.6 | HRNet-W48+UDP | 2019-11-18 |
OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation | ✓ Link | 76.4 | 92.6 | 83.7 | 82.6 | 72.6 | 81.2 | OmniPose (WASPv2) | 2021-03-18 |
HRFormer: High-Resolution Transformer for Dense Prediction | ✓ Link | 76.2 | 92.7 | 83.8 | 82.3 | 72.5 | 81.2 | HRFormer-B | 2021-10-18 |
Rethinking on Multi-Stage Networks for Human Pose Estimation | ✓ Link | 76.1 | 93.4 | 83.8 | 81.5 | 72.3 | 81.6 | MSPN | 2019-01-01 |
Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation | ✓ Link | 75.7 | 92.4 | 83.3 | 81.2 | 71.4 | 80.5 | MIPNet | 2021-01-27 |
Deep Multi-Task Networks For Occluded Pedestrian Pose Estimation | | 75.7 | 90.3 | 76.3 | 79.5 | 80.7 | | PPE (ResNeXt-101) | 2022-06-15 |
TransPose: Keypoint Localization via Transformer | ✓ Link | 75.0 | 92.2 | 82.3 | 81.1 | 71.3 | | TransPose-H-A6 | 2020-12-28 |
PoseFix: Model-agnostic General Human Pose Refinement Network | ✓ Link | 74.7 | 91.2 | 81.9 | 81.2 | 71.1 | 79.9 | PoseFix | 2018-12-10 |
DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation | | 74.6 | 91.9 | 82.1 | 80.6 | 71.3 | 79.9 | DPIT-L | 2022-09-02 |
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search | ✓ Link | 73.9 | 91.7 | 82.0 | 79.5 | 70.5 | 80.4 | S-ViPNAS-HRNetW32 | 2021-05-21 |
Simple Baselines for Human Pose Estimation and Tracking | ✓ Link | 73.7 | 91.9 | 81.1 | 80.0 | 70.3 | 79.0 | Flow-based (ResNet-152) | 2018-04-17 |
Cascaded Pyramid Network for Multi-Person Pose Estimation | ✓ Link | 73.0 | 91.7 | 80.9 | 78.1 | | 79.0 | CPN+ [6, 9] | 2017-11-20 |
RMPE: Regional Multi-person Pose Estimation | ✓ Link | 72.3 | 89.2 | 79.1 | 78.6 | 68.0 | | RMPE++ | 2016-12-01 |
TFPose: Direct Human Pose Estimation with Transformers | | 72.2 | 90.9 | 80.1 | 78.8 | 69.1 | | TFPose (ND=6 ResNet-50) | 2021-03-29 |
Cascaded Pyramid Network for Multi-Person Pose Estimation | ✓ Link | 72.1 | 91.4 | 80.0 | 77.2 | | 78.5 | CPN | 2017-11-20 |
 | | 70.8 | | | | | | LOGO-CAP (Ours) HRNet-W48 | |
Dite-HRNet: Dynamic Lightweight High-Resolution Network for Human Pose Estimation | ✓ Link | 70.6 | 90.8 | 78.2 | 76.1 | 67.4 | 76.4 | Dite-HRNet-30 | 2022-04-22 |
Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation | ✓ Link | 70.3 | 91.2 | 77.8 | 76.8 | 66.3 | 77.7 | KAPAO-L | 2021-11-16 |
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search | ✓ Link | 70.3 | 90.7 | 78.8 | 75.5 | 67.3 | 77.3 | S-ViPNAS-Res50 | 2021-05-21 |
Lite-HRNet: A Lightweight High-Resolution Network | ✓ Link | 69.7 | 90.7 | 77.5 | 75.0 | 66.9 | 75.4 | Lite-HRNet-30 | 2021-04-13 |
Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation | ✓ Link | 68.8 | 90.5 | 76.5 | 76.0 | 64.3 | 76.3 | KAPAO-M | 2021-11-16 |
Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation | ✓ Link | 68.1 | | | 70.5 | 66.8 | 88.2 | Simple Pose | 2019-11-24 |
Lite-HRNet: A Lightweight High-Resolution Network | ✓ Link | 66.9 | 89.4 | 74.4 | 72.2 | 64.0 | 72.6 | Lite-HRNet-18 | 2021-04-13 |
Towards Accurate Multi-person Pose Estimation in the Wild | | 64.9 | 85.5 | 71.3 | 70.0 | | 69.7 | G-RMI | 2017-01-06 |
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | ✓ Link | 64.4 | 85.7 | 70.7 | 69.8 | 61.8 | | Faster R-CNN (ImageNet+300M) | 2017-07-10 |
OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields | ✓ Link | 64.2 | 86.2 | 70.1 | 68.8 | 61.0 | | OpenPose | 2018-12-18 |
Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation | ✓ Link | 63.8 | 88.4 | 70.4 | 71.7 | 58.6 | 71.2 | KAPAO-S | 2021-11-16 |
DirectPose: Direct End-to-End Multi-Person Pose Estimation | ✓ Link | 63.3 | 86.7 | 69.4 | 71.2 | 57.8 | | DirectPose (ResNet-101) | 2019-11-18 |
Mask R-CNN | ✓ Link | 63.1 | 87.3 | 68.7 | 71.4 | | | Mask-RCNN | 2017-03-20 |
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields | ✓ Link | 61.8 | 84.9 | 67.5 | 68.2 | 57.1 | 66.5 | CMU-Pose | 2016-11-24 |
RMPE: Regional Multi-person Pose Estimation | ✓ Link | 61.8 | 83.7 | 69.8 | 67.6 | 58.6 | | RMPE | 2016-12-01 |
YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss | ✓ Link | | 90.3 | | | | | yolopose | 2022-04-14 |
On the Calibration of Human Pose Estimation | | - | - | - | - | - | - | CCNet (ViTPose) | 2023-11-28 |