OpenCodePapers

action-classification-on-charades

Video: Action Classification
Leaderboard
| Paper | Code | mAP | FLOPs (G) x views | Model Name | Release Date |
|---|---|---|---|---|---|
| TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | ✓ Link | 66.3 | | TokenLearner | 2021-06-21 |
| Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning | ✓ Link | 66.2 | | TubeViT-L | 2022-12-06 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ Link | 63.2 | | MoViNet-A6 | 2021-03-21 |
| Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors | | 62.29 | | DEEP-HAL with ODF+SDF (AssembleNet++) | 2020-01-14 |
| AssembleNet++: Assembling Modality Representations via Attention Connections | ✓ Link | 59.8 | | AssembleNet++ 50 | 2020-08-18 |
| AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | ✓ Link | 58.6 | | AssembleNet | 2019-05-30 |
| AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | ✓ Link | 58.6 | | AssembleNet-101 | 2019-05-30 |
| VicTR: Video-conditioned Text Representations for Activity Recognition | | 57.6 | | VicTR (ViT-L/14) | 2023-04-05 |
| AssembleNet++: Assembling Modality Representations via Attention Connections | ✓ Link | 54.98 | | AssembleNet++ 50 without object | 2020-08-18 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | ✓ Link | 50.7 | | BIKE | 2022-12-31 |
| Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors | | 50.16 | | DEEP-HAL with ODF+SDF (I3D) | 2020-01-14 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ Link | 48.5 | | MoViNet-A4 | 2021-03-21 |
| Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | | 47.8 | | AdaFocus (weak supervision, MViT-B-24, 32x3) | 2023-11-28 |
| Multiscale Vision Transformers | ✓ Link | 47.7 | | MViT-B-24, 32x3 (Kinetics-600 pretraining) | 2021-04-22 |
| VidTr: Video Transformer Without Convolutions | | 47.3 | | En-VidTr-L | 2021-04-23 |
| Multiscale Vision Transformers | ✓ Link | 47.1 | | MViT-B, 32x3 (Kinetics-600 pretraining) | 2021-04-22 |
| Multiscale Vision Transformers | ✓ Link | 46.3 | | MViT-B-24, 32x3 (Kinetics-400 pretraining) | 2021-04-22 |
| SlowFast Networks for Video Recognition | ✓ Link | 45.2 | | SlowFast (Kinetics-600 pretraining, NL) | 2018-12-10 |
| Multiscale Vision Transformers | ✓ Link | 44.3 | | MViT-B, 32x3 (Kinetics-400 pretraining) | 2021-04-22 |
| ActionCLIP: A New Paradigm for Video Action Recognition | ✓ Link | 44.3 | | ActionCLIP (ViT-B/16) | 2021-09-17 |
| Multiscale Vision Transformers | ✓ Link | 43.9 | | MViT-B, 16x4 (Kinetics-600 pretraining) | 2021-04-22 |
| VidTr: Video Transformer Without Convolutions | | 43.5 | | VidTr-L | 2021-04-23 |
| Pose And Joint-Aware Action Recognition | ✓ Link | 43.23 | | JMRN + R101-NL-LFB | 2020-10-16 |
| Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs | | 43.1 | | HAF+BoW/FV/OFF halluc. +MSK×8/PN | 2019-06-13 |
| Long-Term Feature Banks for Detailed Video Understanding | ✓ Link | 42.5 | | LFB | 2018-12-12 |
| SlowFast Networks for Video Recognition | ✓ Link | 42.5 | | SlowFast (Kinetics-400 pretraining, NL) | 2018-12-10 |
| SlowFast Networks for Video Recognition | ✓ Link | 42.1 | | SlowFast (Kinetics-600 pretraining) | 2018-12-10 |
| Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | | 41.4 | | AdaFocus (weak supervision, MViT-B-K400-pretrain, 16x4) | 2023-11-28 |
| Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | | 41.2 | | AdaFocus (weak supervision, X3D-L, 32x3) | 2023-11-28 |
| Timeception for Complex Action Recognition | ✓ Link | 41.1 | | Timeception (R3D) | 2018-12-04 |
| PA3D: Pose-Action 3D Machine for Video Recognition | | 41 | | PA3D + (GCN + I3D + NL I3D) | 2019-06-01 |
| PoTion: Pose MoTion Representation for Action Recognition | | 40.8 | | PoTion + (GCN + I3D + NL I3D) | 2018-06-01 |
| Multiscale Vision Transformers | ✓ Link | 40 | | MViT-B, 16x4 (Kinetics-400 pretraining) | 2021-04-22 |
| Videos as Space-Time Region Graphs | | 39.7 | | STRG | 2018-06-05 |
| Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | | 39.3 | | AdaFocus (weak supervision, Slowfast-R50, 16x8) | 2023-11-28 |
| Revisiting spatio-temporal layouts for compositional action recognition | ✓ Link | 38.5 | | STLT + I3D | 2021-11-02 |
| Evolving Space-Time Neural Architectures for Videos | | 38.1 | | EvaNet | 2018-11-26 |
| Timeception for Complex Action Recognition | ✓ Link | 37.2 | | Timeception (I3D) | 2018-12-04 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ Link | 32.9 | | I3D | 2017-05-22 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ Link | 32.5 | | MoViNet-A2 | 2021-03-21 |
| Timeception for Complex Action Recognition | ✓ Link | 31.6 | | Timeception (R2D) | 2018-12-04 |
| Temporal Relational Reasoning in Videos | ✓ Link | 25.2 | | MultiScale TRN | 2017-11-22 |
| Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | ✓ Link | 25.2 | 6.9x1 | Co Slow_64 | 2021-05-31 |
| Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | ✓ Link | 24.1 | 54.9x1 | Slow-8×8 | 2021-05-31 |
| Asynchronous Temporal Fields for Action Recognition | ✓ Link | 22.4 | | Asyn-TF | 2016-12-19 |
| Compressed Video Action Recognition | ✓ Link | 21.9 | | CoViAR | 2017-12-02 |
| Continual 3D Convolutional Neural Networks for Real-time Processing of Videos | ✓ Link | 21.5 | 6.9x1 | Co Slow_8 | 2021-05-31 |
| Two-Stream Convolutional Networks for Action Recognition in Videos | ✓ Link | 18.6 | | 2-Strm | 2014-06-09 |
| Pose And Joint-Aware Action Recognition | ✓ Link | 16.2 | | JMRN (Pose only) | 2020-10-16 |