Action Recognition in Videos on UCF101

Leaderboard

Paper	Code	3-fold Accuracy	Accuracy	Accuracy (20% Test)	Model Name	Release Date
Enhancing Video Transformers for Action Understanding with VLM-aided Training		99.7			FTP-UniFormerV2-L/14	2024-03-24
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking	✓ Link	99.6			VideoMAE V2-g	2023-03-29
OmniVec: Learning robust representations with cross modal sharing		99.6			OmniVec	2023-11-07
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning		99.6			OmniVec2	2024-01-01
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models	✓ Link	98.8			BIKE	2022-12-31
SMART Frame Selection for Action Recognition		98.64			SMART	2020-12-19
Omni-sourced Webly-supervised Learning for Video Recognition	✓ Link	98.6			OmniSource (SlowOnly-8x8-R101-RGB + I3D-Flow)	2020-03-29
PERF-Net: Pose Empowered RGB-Flow Net		98.6			PERF-Net (multi-distilled S3D)	2020-09-28
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video	✓ Link	98.6			ZeroI2V ViT-L/14	2023-10-02
Learning Spatio-Temporal Representation with Local and Global Diffusion		98.2			LGD-3D Two-stream	2019-06-13
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition	✓ Link	98.2			Text4Vis	2022-07-04
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	98.0			Two-Stream I3D (Imagenet+Kinetics pre-training)	2017-05-22
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	97.8			Two-Stream I3D (Kinetics pre-training)	2017-05-22
Large Scale Holistic Video Understanding	✓ Link	97.8			HATNet (32 frames)	2019-04-25
MARS: Motion-Augmented RGB Stream for Action Recognition	✓ Link	97.8			MARS+RGB+Flow (64 frames, Kinetics pretrained)	2019-06-01
Bubblenet: A Disperse Recurrent Structure To Recognize Activities		97.62			BubbleNET	2020-10-30
D3D: Distilled 3D Networks for Video Action Recognition	✓ Link	97.6			D3D + D3D	2018-12-19
Busy-Quiet Video Disentangling for Video Classification	✓ Link	97.6			BQN	2021-03-29
Cooperative Cross-Stream Network for Discriminative Action Representation		97.4			CCS + TSN (ImageNet+Kinetics pretrained)	2019-08-27
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	97.3			R[2+1]D-TwoStream (Kinetics pretrained)	2017-11-30
Contextual Action Cues from Camera Sensor for Multi-Stream Action Recognition		97.2			Multi-stream I3D	2019-03-20
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition		97.2			CA2ST(B/16)	2025-03-30
Hidden Two-Stream Convolutional Networks for Action Recognition	✓ Link	97.1			Hidden Two-Stream	2017-04-02
D3D: Distilled 3D Networks for Video Action Recognition	✓ Link	97.1			D3D (Kinetics-600 pretraining)	2018-12-19
Asymmetric Masked Distillation for Pre-Training Small Foundation Models		97.1			AMD(ViT-B/16)	2023-11-06
D3D: Distilled 3D Networks for Video Action Recognition	✓ Link	97			D3D (Kinetics-400 pretraining)	2018-12-19
Learning Spatio-Temporal Representation with Local and Global Diffusion		97			LGD-3D RGB	2019-06-13
An Image is Worth 16x16 Words, What is a Video Worth?	✓ Link	97			STAM-32 (ImageNet/Kinetics pretraining)	2021-03-25
FASTER Recurrent Networks for Efficient Video Classification		96.9			FASTER32	2019-06-10
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	96.8			R[2+1]D-RGB (Kinetics pretrained)	2017-11-30
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification	✓ Link	96.8			S3D-G (ImageNet, Kinetics-400 pretrained)	2017-12-13
Learning Spatio-Temporal Representation with Local and Global Diffusion		96.8			LGD-3D Flow	2019-06-13
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	96.7			Flow-I3D (Imagenet+Kinetics pre-training)	2017-05-22
VidTr: Video Transformer Without Convolutions		96.7			VidTr-L	2021-04-23
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	96.5			Flow-I3D (Kinetics pre-training)	2017-05-22
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition		96.5			I3D RGB + DMC-Net (I3D)	2019-01-11
Two-Stream Video Classification with Cross-Modality Attention		96.5			CMA iter1-S	2019-08-01
$A^2$-Nets: Double Attention Networks		96.4			A2-Net (ResNet-50)	2018-10-27
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition	✓ Link	96			Optical Flow Guided Feature	2017-11-29
Multi-Fiber Networks for Video Recognition		96.0			MF-Net, RGB only (ImageNet+Kinetics pretrained)	2018-07-30
MARS: Motion-Augmented RGB Stream for Action Recognition	✓ Link	95.8			MARS+RGB+Flow (16 frames)	2019-06-01
Attention Distillation for Learning Video Representations		95.7			Prob-Distill	2019-04-05
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	95.6			RGB-I3D (Imagenet+Kinetics pre-training)	2017-05-22
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	95.5			R[2+1]D-Flow (Kinetics pretrained)	2017-11-30
End-to-End Learning of Motion Representation for Video Understanding	✓ Link	95.4			TVNet+IDT	2018-04-02
Learning spatio-temporal representations with temporal squeeze pooling		95.2			TesNet (ImageNet pretrained)	2020-02-11
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	95.1			RGB-I3D (Kinetics pre-training)	2017-05-22
I3D-LSTM: A New Model for Human Action Recognition	✓ Link	95.1			I3D-LSTM	2019-08-09
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	95			R[2+1]D-TwoStream (Sports-1M pretrained)	2017-11-30
LIGAR: Lightweight General-purpose Action Recognition	✓ Link	94.85			X3D MobileNet-V3 LGD-GC	2021-08-30
Spatiotemporal Residual Networks for Video Action Recognition	✓ Link	94.6			ST-ResNet + IDT	2016-11-07
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?	✓ Link	94.5			ResNeXt-101 (64f)	2017-11-27
R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition		94.5			R-STAN-101	2019-06-19
Appearance-and-Relation Networks for Video Classification	✓ Link	94.3			ARTNet w/ TSN	2017-11-24
Temporal-Spatial Mapping for Action Recognition		94.3			TSN+TSM	2018-09-11
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition	✓ Link	94.2			Temporal Segment Networks	2016-08-02
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition	✓ Link	94.1			TS-LSTM	2017-03-30
Self-supervised Video Transformer	✓ Link	93.7			SVT	2021-12-02
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	93.6			R[2+1]D-RGB (Sports-1M pretrained)	2017-11-30
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	93.4			Two-stream I3D	2017-05-22
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	93.3			R[2+1]D-Flow (Sports-1M pretrained)	2017-11-30
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning	✓ Link	92.7			VIMPAC	2021-06-21
Convolutional Two-Stream Network Fusion for Video Action Recognition	✓ Link	92.5			S:VGG-16, T:VGG-16 (ImageNet pretrain)	2016-04-22
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition		92.3			DMC-Net (I3D)	2019-01-11
Dance with Flow: Two-in-One Stream Action Detection	✓ Link	92			two-in-one two stream	2019-04-01
Long-term Temporal Convolutions for Action Recognition	✓ Link	91.7			LTC	2016-04-15
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors	✓ Link	91.5			TDD + IDT	2015-05-19
R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition		91.5			R-STAN-50	2019-06-19
Towards Good Practices for Very Deep Two-Stream ConvNets	✓ Link	91.4			Very deep two-stream ConvNet	2015-07-08
Efficient Action Recognition Using Confidence Distillation		91.2			3D ResNeXt-101 + Confidence Distillation	2021-09-05
Multi-region two-stream R-CNN for action detection		91.1			MR Two-Stream R-CNN	2016-09-17
Dynamic Image Networks for Action Recognition	✓ Link	89.1			Dynamic Image Networks + IDT	2016-06-01
Beyond Short Snippets: Deep Networks for Video Classification	✓ Link	88.6			Two-stream+LSTM	2015-03-31
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks	✓ Link	88.6			P3D (ImageNet + Sports1M)	2017-11-28
Two-Stream Convolutional Networks for Action Recognition in Videos	✓ Link	88.0			Two-Stream (ImageNet pretrained)	2014-06-09
Real-time Action Recognition with Enhanced Motion Vector CNNs	✓ Link	86.4			MV-CNN	2016-04-26
Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer	✓ Link	86.1			Dynamics 2 for DenseNet-201 Transformer	2023-02-17
ConvNet Architecture Search for Spatiotemporal Feature Learning	✓ Link	85.8			Res3D	2017-08-16
DistInit: Learning Video Representations Without a Single Labeled Video		85.8			R(2+1)D-18 (DistInit pretraining)	2019-01-26
ActionFlowNet: Learning Motion Representation for Action Recognition		83.9			ActionFlowNet	2016-12-09
Learning Spatiotemporal Features with 3D Convolutional Networks	✓ Link	82.3			C3D	2014-12-02
HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN	✓ Link	79.83			HalluciNet (ResNet-50)	2019-12-10
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples	✓ Link	78.7			R[2+1]D (VideoMoCo)	2021-03-10
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples	✓ Link	74.1			3D-ResNet-18 (VideoMoCo)	2021-03-10
Large-Scale Video Classification with Convolutional Neural Networks	✓ Link	65.4			Slow Fusion + Finetune top 3 layers	2014-06-23
MLGCN: Multi-Laplacian Graph Convolutional Networks for Human Action Recognition		63.27			MLGCN	2019-09-11
Towards Universal Representation for Unseen Action Recognition		42.5			CD-UAR	2018-03-22
		35.2			SL
PoTion: Pose MoTion Representation for Action Recognition		29.3			I3D + PoTion	2018-06-01
Federated Self-supervised Learning for Video Understanding	✓ Link		73.16		R3D-18	2022-07-05
Adaptive frame selection in two dimensional convolutional neural network action recognition	✓ Link			98.05	ResNet50	2022-12-28
