A Dense-Sparse Complementary Network for Human Action Recognition based on RGB and Skeleton Modalities | ✓ Link | 97.4 | 99.4 | DSCNet (RGB + Pose) | 2023-12-28 |
Revisiting Skeleton-based Action Recognition | ✓ Link | 97.0 | 99.6 | PoseC3D (RGB + Pose) | 2021-04-28 |
Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living | ✓ Link | 96.3 | 99.0 | π-ViT (RGB + Pose) | 2023-11-30 |
A Unified Multimodal De- and Re-coupling Framework for RGB-D Motion Recognition | ✓ Link | 96.2 | 98.0 | UMDR (RGB-D) | 2022-11-16 |
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition | ✓ Link | 96.1 | 99.0 | EPAM-Net | 2024-08-10 |
MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos | ✓ Link | 96.0 | 98.8 | MMNet (RGB + Pose) | 2022-05-26 |
Hierarchical Action Classification with Network Pruning | | 95.66 | 98.79 | Hierarchical Action Classification (RGB + Pose) | 2020-07-30 |
VPN: Learning Video-Pose Embedding for Activities of Daily Living | ✓ Link | 95.5 | 98.0 | VPN (RGB + Pose) | 2020-07-06 |
Explore Human Parsing Modality for Action Recognition | ✓ Link | 94.7 | 97.7 | EPP-Net (Parsing + Pose) | 2024-01-04 |
Cross-Modal Learning with 3D Deformable Attention for Action Recognition | | 94.3 | 97.9 | 3DA (RGB + Pose) | 2022-12-12 |
Action Machine: Rethinking Action Recognition in Trimmed Videos | | 94.3 | 97.2 | Action Machine (RGB only) | 2018-12-14 |
Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living | ✓ Link | 94.0 | 97.9 | π-ViT (RGB only) | 2023-11-30 |
Integrating Human Parsing and Pose Network for Human Action Recognition | ✓ Link | 93.8 | 97.1 | IPP-Net (Parsing + Pose) | 2023-07-16 |
Multi-View Action Recognition Using Contrastive Learning | ✓ Link | 93.7 | 98.9 | ViewCon (RGB + Pose) | 2023-01-03 |
DVANet: Disentangling View and Action Features for Multi-View Action Recognition | ✓ Link | 93.4 | 98.1 | DVANet (RGB only) | 2023-12-10 |
Joint-Partition Group Attention for skeleton-based action recognition | ✓ Link | 93.2 | 96.9 | JPFormer | 2024-07-30 |
DSTSA-GCN: Advancing Skeleton-Based Gesture Recognition with Semantic-Aware Spatio-Temporal Topology Modeling | ✓ Link | 92.78 | 97.03 | DSTSA-GCN | 2025-01-21 |
Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition | ✓ Link | 92.5 | 97.4 | TSMF (RGB + Pose) | 2021-05-18 |
MSAF: Multimodal Split Attention Fusion | ✓ Link | 92.24 | | MSAF (RGB + Pose) | 2020-12-13 |
STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition | | 92.0 | 96.5 | STAR-Transformer (RGB + Pose) | 2022-10-14 |
MMTM: Multimodal Transfer Module for CNN Fusion | ✓ Link | 91.99 | | MMTM (RGB + Pose) | 2019-11-20 |
Infrared and 3D skeleton feature fusion for RGB-D action recognition | ✓ Link | 91.8 | 94.9 | FUSION (IR + Pose) | 2020-02-28 |
Recognizing Human Actions as the Evolution of Pose Estimation Maps | | 91.7 | 95.2 | PoseMap (RGB + Pose) | 2018-06-01 |
B2C-AFM: Bi-Directional Co-Temporal and Cross-Spatial Attention Fusion Model for Human Action Recognition | ✓ Link | 91.7 | | B2C-AFM (RGB + Pose) | 2023-08-30 |
Part-based Graph Convolutional Network for Action Recognition | ✓ Link | 87.5 | 93.2 | PB-GCN (Skeleton only) | 2018-09-13 |
Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points | ✓ Link | 86.6 | 93.2 | Glimpse Clouds (RGB only) | 2018-02-22 |
SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition | ✓ Link | 76.5 | 84.7 | SkeleMotion + Yang et al. (Skeleton only) | 2019-07-30 |
Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos | | 74.9 | | DSSCA-SSLM (RGB only) | 2016-03-23 |