End-to-End Spatio-Temporal Action Localisation with Video Transformers | | 41.7 | | | STAR/L | 2023-04-24 |
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization | ✓ Link | 30.0 | | | ACAR-Net, SlowFast R-101 (Kinetics-400 pretraining) | 2020-06-14 |
Pose And Joint-Aware Action Recognition | ✓ Link | 28.4 | | | JMRN + SlowFast-R101-NL | 2020-10-16 |
SlowFast Networks for Video Recognition | ✓ Link | 28.3 | | | SlowFast++ (Kinetics-600 pretraining, NL) | 2018-12-10 |
Long-Term Feature Banks for Detailed Video Understanding | ✓ Link | 27.7 | | | LFB (Kinetics-400 pretraining) | 2018-12-12 |
Video Action Transformer Network | | 27.6 | 39.6 | 19.3 | I3D Tx HighRes | 2018-12-06 |
SlowFast Networks for Video Recognition | ✓ Link | 27.3 | | | SlowFast (Kinetics-600 pretraining, NL) | 2018-12-10 |
SlowFast Networks for Video Recognition | ✓ Link | 26.8 | | | SlowFast (Kinetics-600 pretraining) | 2018-12-10 |
SlowFast Networks for Video Recognition | ✓ Link | 26.3 | | | SlowFast (Kinetics-400 pretraining) | 2018-12-10 |
Video Action Transformer Network | | 23.4 | 6.5 | 16.2 | I3D I3D | 2018-12-06 |
D3D: Distilled 3D Networks for Video Action Recognition | ✓ Link | 23.0 | | | D3D (ResNet RPN, Kinetics-400 pretraining) | 2018-12-19 |
A Better Baseline for AVA | | 22.8 | | | I3D w/ RPN + JFT (Kinetics-400 pretraining) | 2018-07-26 |
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions | ✓ Link | 22.0 | | | S3D-G w/ ResNet RPN (Kinetics-400 pretraining) | 2017-05-23 |
A Better Baseline for AVA | | 21.9 | | | I3D w/ RPN (Kinetics-400 pretraining) | 2018-07-26 |
Actor-Centric Relation Network | ✓ Link | 17.4 | | | ACRN | 2018-07-28 |