OpenCodePapers

Action Recognition in Videos on UCF101

Action Recognition
Leaderboard
| Paper | Code | 3-fold Accuracy / Accuracy / Accuracy 20% Test | Model Name | Release Date |
| --- | --- | --- | --- | --- |
| Enhancing Video Transformers for Action Understanding with VLM-aided Training | | 99.7 | FTP-UniFormerV2-L/14 | 2024-03-24 |
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | ✓ | 99.6 | VideoMAE V2-g | 2023-03-29 |
| OmniVec: Learning robust representations with cross modal sharing | | 99.6 | OmniVec | 2023-11-07 |
| OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | | 99.6 | OmniVec2 | 2024-01-01 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | ✓ | 98.8 | BIKE | 2022-12-31 |
| SMART Frame Selection for Action Recognition | | 98.64 | SMART | 2020-12-19 |
| Omni-sourced Webly-supervised Learning for Video Recognition | ✓ | 98.6 | OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow) | 2020-03-29 |
| PERF-Net: Pose Empowered RGB-Flow Net | | 98.6 | PERF-Net (multi-distilled S3D) | 2020-09-28 |
| ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video | ✓ | 98.6 | ZeroI2V ViT-L/14 | 2023-10-02 |
| Learning Spatio-Temporal Representation with Local and Global Diffusion | | 98.2 | LGD-3D Two-stream | 2019-06-13 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | ✓ | 98.2 | Text4Vis | 2022-07-04 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 98.0 | Two-Stream I3D (Imagenet+Kinetics pre-training) | 2017-05-22 |
| MARS: Motion-Augmented RGB Stream for Action Recognition | ✓ | 97.8 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | 2019-06-01 |
| Large Scale Holistic Video Understanding | ✓ | 97.8 | HATNet (32 frames) | 2019-04-25 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 97.8 | Two-Stream I3D (Kinetics pre-training) | 2017-05-22 |
| Bubblenet: A Disperse Recurrent Structure To Recognize Activities | | 97.62 | BubbleNET | 2020-10-30 |
| D3D: Distilled 3D Networks for Video Action Recognition | ✓ | 97.6 | D3D + D3D | 2018-12-19 |
| Busy-Quiet Video Disentangling for Video Classification | ✓ | 97.6 | BQN | 2021-03-29 |
| Cooperative Cross-Stream Network for Discriminative Action Representation | | 97.4 | CCS + TSN (ImageNet+Kinetics pretrained) | 2019-08-27 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 97.3 | R[2+1]D-TwoStream (Kinetics pretrained) | 2017-11-30 |
| Contextual Action Cues from Camera Sensor for Multi-Stream Action Recognition | | 97.2 | Multi-stream I3D | 2019-03-20 |
| CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | | 97.2 | CA2ST(B/16) | 2025-03-30 |
| Hidden Two-Stream Convolutional Networks for Action Recognition | ✓ | 97.1 | Hidden Two-Stream | 2017-04-02 |
| D3D: Distilled 3D Networks for Video Action Recognition | ✓ | 97.1 | D3D (Kinetics-600 pretraining) | 2018-12-19 |
| Asymmetric Masked Distillation for Pre-Training Small Foundation Models | | 97.1 | AMD(ViT-B/16) | 2023-11-06 |
| D3D: Distilled 3D Networks for Video Action Recognition | ✓ | 97 | D3D (Kinetics-400 pretraining) | 2018-12-19 |
| Learning Spatio-Temporal Representation with Local and Global Diffusion | | 97 | LGD-3D RGB | 2019-06-13 |
| An Image is Worth 16x16 Words, What is a Video Worth? | ✓ | 97 | STAM-32 (ImageNet/Kinetics pretraining) | 2021-03-25 |
| FASTER Recurrent Networks for Efficient Video Classification | | 96.9 | FASTER32 | 2019-06-10 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 96.8 | R[2+1]D-RGB (Kinetics pretrained) | 2017-11-30 |
| Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | ✓ | 96.8 | S3D-G (ImageNet, Kinetics-400 pretrained) | 2017-12-13 |
| Learning Spatio-Temporal Representation with Local and Global Diffusion | | 96.8 | LGD-3D Flow | 2019-06-13 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 96.7 | Flow-I3D (Imagenet+Kinetics pre-training) | 2017-05-22 |
| VidTr: Video Transformer Without Convolutions | | 96.7 | VidTr-L | 2021-04-23 |
| Two-Stream Video Classification with Cross-Modality Attention | | 96.5 | CMA iter1-S | 2019-08-01 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 96.5 | Flow-I3D (Kinetics pre-training) | 2017-05-22 |
| DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | | 96.5 | I3D RGB + DMC-Net (I3D) | 2019-01-11 |
| $A^2$-Nets: Double Attention Networks | | 96.4 | A2-Net (ResNet-50) | 2018-10-27 |
| Multi-Fiber Networks for Video Recognition | | 96.0 | MF-Net, RGB only (ImageNet+Kinetics pretrained) | 2018-07-30 |
| Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition | ✓ | 96 | Optical Flow Guided Feature | 2017-11-29 |
| MARS: Motion-Augmented RGB Stream for Action Recognition | ✓ | 95.8 | MARS+RGB+Flow (16 frames) | 2019-06-01 |
| Attention Distillation for Learning Video Representations | | 95.7 | Prob-Distill | 2019-04-05 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 95.6 | RGB-I3D (Imagenet+Kinetics pre-training) | 2017-05-22 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 95.5 | R[2+1]D-Flow (Kinetics pretrained) | 2017-11-30 |
| End-to-End Learning of Motion Representation for Video Understanding | ✓ | 95.4 | TVNet+IDT | 2018-04-02 |
| Learning spatio-temporal representations with temporal squeeze pooling | | 95.2 | TesNet (ImageNet pretrained) | 2020-02-11 |
| I3D-LSTM: A New Model for Human Action Recognition | ✓ | 95.1 | I3D-LSTM | 2019-08-09 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 95.1 | RGB-I3D (Kinetics pre-training) | 2017-05-22 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 95 | R[2+1]D-TwoStream (Sports-1M pretrained) | 2017-11-30 |
| LIGAR: Lightweight General-purpose Action Recognition | ✓ | 94.85 | X3D MobileNet-V3 LGD-GC | 2021-08-30 |
| Spatiotemporal Residual Networks for Video Action Recognition | ✓ | 94.6 | ST-ResNet + IDT | 2016-11-07 |
| Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? | ✓ | 94.5 | ResNeXt-101 (64f) | 2017-11-27 |
| R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition | | 94.5 | R-STAN-101 | 2019-06-19 |
| Temporal-Spatial Mapping for Action Recognition | | 94.3 | TSN+TSM | 2018-09-11 |
| Appearance-and-Relation Networks for Video Classification | ✓ | 94.3 | ARTNet w/ TSN | 2017-11-24 |
| Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | ✓ | 94.2 | Temporal Segment Networks | 2016-08-02 |
| TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition | ✓ | 94.1 | TS-LSTM | 2017-03-30 |
| Self-supervised Video Transformer | ✓ | 93.7 | SVT | 2021-12-02 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 93.6 | R[2+1]D-RGB (Sports-1M pretrained) | 2017-11-30 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 93.4 | Two-stream I3D | 2017-05-22 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 93.3 | R[2+1]D-Flow (Sports-1M pretrained) | 2017-11-30 |
| VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | ✓ | 92.7 | VIMPAC | 2021-06-21 |
| Convolutional Two-Stream Network Fusion for Video Action Recognition | ✓ | 92.5 | S:VGG-16, T:VGG-16 (ImageNet pretrain) | 2016-04-22 |
| DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | | 92.3 | DMC-Net (I3D) | 2019-01-11 |
| Dance with Flow: Two-in-One Stream Action Detection | ✓ | 92 | two-in-one two stream | 2019-04-01 |
| Long-term Temporal Convolutions for Action Recognition | ✓ | 91.7 | LTC | 2016-04-15 |
| R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition | | 91.5 | R-STAN-50 | 2019-06-19 |
| Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors | ✓ | 91.5 | TDD + IDT | 2015-05-19 |
| Towards Good Practices for Very Deep Two-Stream ConvNets | ✓ | 91.4 | Very deep two-stream ConvNet | 2015-07-08 |
| Efficient Action Recognition Using Confidence Distillation | | 91.2 | 3D ResNeXt-101 + Confidence Distillation | 2021-09-05 |
| Multi-region two-stream R-CNN for action detection | | 91.1 | MR Two-Stream R-CNN | 2016-09-17 |
| Dynamic Image Networks for Action Recognition | ✓ | 89.1 | Dynamic Image Networks + IDT | 2016-06-01 |
| Beyond Short Snippets: Deep Networks for Video Classification | ✓ | 88.6 | Two-stream+LSTM | 2015-03-31 |
| Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks | ✓ | 88.6 | P3D (ImageNet + Sports1M) | 2017-11-28 |
| Two-Stream Convolutional Networks for Action Recognition in Videos | ✓ | 88.0 | Two-Stream (ImageNet pretrained) | 2014-06-09 |
| Real-time Action Recognition with Enhanced Motion Vector CNNs | ✓ | 86.4 | MV-CNN | 2016-04-26 |
| Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer | ✓ | 86.1 | Dynamics 2 for DenseNet-201 Transformer | 2023-02-17 |
| DistInit: Learning Video Representations Without a Single Labeled Video | | 85.8 | R(2+1)D-18 (DistInit pretraining) | 2019-01-26 |
| ConvNet Architecture Search for Spatiotemporal Feature Learning | ✓ | 85.8 | Res3D | 2017-08-16 |
| ActionFlowNet: Learning Motion Representation for Action Recognition | | 83.9 | ActionFlowNet | 2016-12-09 |
| Learning Spatiotemporal Features with 3D Convolutional Networks | ✓ | 82.3 | C3D | 2014-12-02 |
| HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN | ✓ | 79.83 | HalluciNet (ResNet-50) | 2019-12-10 |
| VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples | ✓ | 78.7 | R[2+1]D (VideoMoCo) | 2021-03-10 |
| VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples | ✓ | 74.1 | 3D-ResNet-18 (VideoMoCo) | 2021-03-10 |
| Large-Scale Video Classification with Convolutional Neural Networks | ✓ | 65.4 | Slow Fusion + Finetune top 3 layers | 2014-06-23 |
| MLGCN: Multi-Laplacian Graph Convolutional Networks for Human Action Recognition | | 63.27 | MLGCN | 2019-09-11 |
| Towards Universal Representation for Unseen Action Recognition | | 42.5 | CD-UAR | 2018-03-22 |
| | | 35.2 | SL | |
| PoTion: Pose MoTion Representation for Action Recognition | | 29.3 | I3D + PoTion | 2018-06-01 |
| Federated Self-supervised Learning for Video Understanding | ✓ | 73.16 | R3D-18 | 2022-07-05 |
| Adaptive frame selection in two dimensional convolutional neural network action recognition | ✓ | 98.05 | ResNet50 | 2022-12-28 |