End-to-End Spatio-Temporal Action Localisation with Video Transformers | | 90.3 | | 88.0 | 71.8 | STAR/L | 2023-04-24 |
Scaling Open-Vocabulary Action Detection | ✓ Link | 88.5 | | | | SiA | 2025-04-04 |
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization | ✓ Link | 87.3 | 86.1 | 78.6 | 53.1 | YOWO + LFB | 2019-11-15 |
Holistic Interaction Transformer Network for Action Detection | ✓ Link | 84.8 | | 88.8 | 74.3 | HIT | 2022-10-23 |
You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization | ✓ Link | 80.4 | 82.5 | 75.8 | 48.8 | YOWO | 2019-11-15 |
Actions as Moving Points | ✓ Link | 77.8 | | 81.8 | 53.9 | MOC | 2020-01-14 |
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions | ✓ Link | 76.3 | | | 59.9 | Faster-RCNN + two-stream I3D conv | 2017-05-23 |
STEP: Spatio-Temporal Progressive Learning for Video Action Detection | ✓ Link | 75 | 83.1 | 76.6 | | STEP | 2019-04-19 |
Stable Mean Teacher for Semi-supervised Video Action Detection | ✓ Link | 73.9 | | | 76.3 | Stable Mean Teacher (I3D) | 2024-12-10 |
Hierarchical Self-Attention Network for Action Localization in Videos | | 73.71 | | 80.42 | 49.50 | HISAN (VGG-16) | 2019-10-01 |
TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection | | 72.1 | | 77.5 | 52.9 | TACNet | 2019-05-31 |
End-to-End Semi-Supervised Learning for Video Action Detection | ✓ Link | 69.9 | | | 72.1 | E2E-SSL (I3D) | 2022-03-08 |
Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos | ✓ Link | 41.37 | 51.3 | 47.1 | | T-CNN | 2017-03-30 |
Multi-region two-stream R-CNN for action detection | | 39.94 | | | | TS R-CNN | 2016-09-17 |
Multi-region two-stream R-CNN for action detection | | 39.63 | | | | MR-TS R-CNN | 2016-09-17 |
Hierarchical Self-Attention Network for Action Localization in Videos | | | | 82.30 | 51.47 | HISAN (ResNet-101 + FPN) | 2019-10-01 |
Dance with Flow: Two-in-One Stream Action Detection | ✓ Link | | | 78.48 | 50.30 | Two-in-one Two Stream | 2019-04-01 |
Dance with Flow: Two-in-One Stream Action Detection | ✓ Link | | | 75.48 | 48.31 | Two-in-one | 2019-04-01 |
Finding Action Tubes with a Sparse-to-Dense Framework | | | | | 54 | DTS | 2020-08-30 |