A Dense-Sparse Complementary Network for Human Action Recognition based on RGB and Skeleton Modalities | ✓ Link | 96.7 | 95.6 | DSCNet (RGB + Pose) | 2023-12-28 |
Revisiting Skeleton-based Action Recognition | ✓ Link | 96.4 | 95.3 | PoseC3D (RGB + Pose) | 2021-04-28 |
Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living | ✓ Link | 96.1 | 95.1 | π-ViT (RGB + Pose) | 2023-11-30 |
MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos | ✓ Link | 94.4 | 92.9 | MMNet (RGB + Pose) | 2022-05-26 |
Explore Human Parsing Modality for Action Recognition | ✓ Link | 92.8 | 91.1 | EPP-Net (Parsing + Pose) | 2024-01-04 |
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition | | 92.7 | 90.3 | STAR-Transformer (RGB + Pose) | 2022-10-14 |
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition | ✓ Link | 92.4 | 94.3 | EPAM-Net | 2024-08-10 |
Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living | ✓ Link | 91.9 | 92.9 | π-ViT (RGB only) | 2023-11-30 |
Integrating Human Parsing and Pose Network for Human Action Recognition | ✓ Link | 91.7 | 90.0 | IPP-Net (Parsing + Pose) | 2023-07-16 |
Cross-Modal Learning with 3D Deformable Attention for Action Recognition | | 91.4 | 90.5 | 3DA (RGB + Pose) | 2022-12-12 |
Joint-Partition Group Attention for skeleton-based action recognition | ✓ Link | 91.4 | 89.4 | JPFormer (Pose) | 2024-07-30 |
DSTSA-GCN: Advancing Skeleton-Based Gesture Recognition with Semantic-Aware Spatio-Temporal Topology Modeling | ✓ Link | 90.97 | 89.12 | DSTSA-GCN | 2025-01-21 |
VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living | ✓ Link | 90.7 | 92.5 | VPN++ (RGB + Pose) | 2021-05-17 |
DVANet: Disentangling View and Action Features for Multi-View Action Recognition | ✓ Link | 90.4 | 91.6 | DVANet (RGB only) | 2023-12-10 |
Multi-View Action Recognition Using Contrastive Learning | ✓ Link | 87.5 | 85.6 | ViewCon (RGB) | 2023-01-03 |
VPN: Learning Video-Pose Embedding for Activities of Daily Living | ✓ Link | 86.3 | 87.8 | VPN (RGB + Pose) | 2020-07-06 |
Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatial-Temporal Graph Convolutional Network for Action Recognition | | 78.3 | 79.2 | ST-GCN + AS-GCN w/DH-TCN | 2019-12-20 |
Gimme Signals: Discriminative signal encoding for multimodal activity recognition | ✓ Link | 70.8 | 71.59 | Gimme Signals (AIS) | 2020-03-13 |
Skeleton Image Representation for 3D Action Recognition based on Tree Structure and Reference Joints | ✓ Link | 67.9 | 62.8 | TSRJI | 2019-09-11 |
SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition | ✓ Link | 66.9 | 67.7 | SkeleMotion + Yang et al. (skeleton only) | 2019-07-30 |
Recognizing Human Actions as the Evolution of Pose Estimation Maps | | 64.6 | 66.9 | Body Pose Evolution Map | 2018-06-01 |