Action Recognition in Videos on HMDB-51


Leaderboard

Paper	Code	Average accuracy of 3 splits (%)	Model name	Release date
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking	✓ Link	88.7	VideoMAE V2-g	2023-03-29
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification	✓ Link	88.6	DejaVid	2025-01-01
Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors		87.56	DEEP-HAL with ODF+SDF(I3D)	2020-01-14
High-order Tensor Pooling with Attention for Action Recognition		87.21	TO+MaxExp+IDT	2021-10-11
Tensor Representations for Action Recognition	✓ Link	86.11	SCK⊕(I3D)+IDT	2020-12-28
High-order Tensor Pooling with Attention for Action Recognition		85.70	SO+MaxExp+IDT	2021-10-11
Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition	✓ Link	85.10	R2+1D-BERT	2020-08-03
Pose And Joint-Aware Action Recognition	✓ Link	84.53	Ours + ResNext101 BERT	2020-10-16
SMART Frame Selection for Action Recognition		84.36	SMART	2020-12-19
Omni-sourced Webly-supervised Learning for Video Recognition	✓ Link	83.8	OmniSource (SlowOnly-8x8-R101-RGB + I3D Flow)	2020-03-29
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video	✓ Link	83.4	ZeroI2V ViT-L/14	2023-10-02
PERF-Net: Pose Empowered RGB-Flow Net		83.2	PERF-Net (distilled S3D-G)	2020-09-28
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models	✓ Link	83.1	BIKE	2022-12-31
Bubblenet: A Disperse Recurrent Structure To Recognize Activities		82.60	BubbleNET	2020-10-30
Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs		82.48	HAF+BoW/FV halluc	2019-06-13
Cooperative Cross-Stream Network for Discriminative Action Representation		81.9	CCS + TSN (ImageNet+Kinetics pretrained)	2019-08-27
Representation Flow for Action Recognition	✓ Link	81.1	RepFlow-50 ([2+1]D CNN, FcF, Non-local block)	2018-10-02
Contextual Action Cues from Camera Sensor for Multi-Stream Action Recognition		80.92	Multi-stream I3D	2019-03-20
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	80.9	Two-stream I3D	2017-05-22
MARS: Motion-Augmented RGB Stream for Action Recognition	✓ Link	80.9	MARS+RGB+Flow (64 frames, Kinetics pretrained)	2019-06-01
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	80.7	Two-Stream I3D (Imagenet+Kinetics pre-training)	2017-05-22
D3D: Distilled 3D Networks for Video Action Recognition	✓ Link	80.5	D3D + D3D	2018-12-19
Learning Spatio-Temporal Representation with Local and Global Diffusion		80.5	LGD-3D Two-stream	2019-06-13
Asymmetric Masked Distillation for Pre-Training Small Foundation Models		79.6	AMD(ViT-B/16)	2023-11-06
D3D: Distilled 3D Networks for Video Action Recognition	✓ Link	79.3	D3D (Kinetics-600 pretraining)	2018-12-19
Learning Spatio-Temporal Representation with Local and Global Diffusion		78.9	LGD-3D Flow	2019-06-13
Hidden Two-Stream Convolutional Networks for Action Recognition	✓ Link	78.7	Hidden Two-Stream	2017-04-02
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	78.7	R[2+1]D-TwoStream (Kinetics pretrained)	2017-11-30
D3D: Distilled 3D Networks for Video Action Recognition	✓ Link	78.7	D3D (Kinetics-400 pretraining)	2018-12-19
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition		77.8	I3D RGB + DMC-Net (I3D)	2019-01-11
Busy-Quiet Video Disentangling for Video Classification	✓ Link	77.6	BQN	2021-03-29
MotionSqueeze: Neural Motion Feature Learning for Video Understanding	✓ Link	77.4	MSNet-R50 (16 frames, ImageNet pretrained)	2020-07-20
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	77.3	Flow-I3D (Kinetics pre-training)	2017-05-22
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	77.1	Flow-I3D (Imagenet+Kinetics pre-training)	2017-05-22
Large Scale Holistic Video Understanding	✓ Link	76.5	HATNet (32 frames)	2019-04-25
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	76.4	R[2+1]D-Flow (Kinetics pretrained)	2017-11-30
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification	✓ Link	75.9	S3D-G (ImageNet, Kinetics-400 pretrained)	2017-12-13
FASTER Recurrent Networks for Efficient Video Classification		75.7	FASTER32 (Kinetics pretrain)	2019-06-10
Learning Spatio-Temporal Representation with Local and Global Diffusion		75.7	LGD-3D RGB	2019-06-13
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	74.8	RGB-I3D (Imagenet+Kinetics pre-training)	2017-05-22
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	74.5	R[2+1]D-RGB (Kinetics pretrained)	2017-11-30
VidTr: Video Transformer Without Convolutions		74.4	VidTr-L	2021-04-23
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	74.3	RGB-I3D (Kinetics pre-training)	2017-05-22
Contrastive Video Representation Learning via Adversarial Perturbations		74.3	ADL+ResNet+IDT	2018-07-24
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition	✓ Link	74.2	Optical Flow Guided Feature	2017-11-29
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	72.7	R[2+1]D-TwoStream (Sports1M pretrained)	2017-11-30
End-to-End Learning of Motion Representation for Video Understanding	✓ Link	72.6	TVNet+IDT	2018-04-02
Spatiotemporal Multiplier Networks for Video Action Recognition	✓ Link	72.2	STM Network+IDT	2017-07-01
Attention Distillation for Learning Video Representations		72.0	Prob-Distill	2019-04-05
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition		71.8	DMC-Net (I3D)	2019-01-11
Learning spatio-temporal representations with temporal squeeze pooling		71.5	TesNet (ImageNet pretrained)	2020-02-11
Hierarchical Feature Aggregation Networks for Video Action Recognition		71.13	HF-ECOLite (ImageNet+Kinetics pretrain)	2019-05-29
Appearance-and-Relation Networks for Video Classification	✓ Link	70.9	ARTNet w/ TSN	2017-11-24
Spatiotemporal Residual Networks for Video Action Recognition	✓ Link	70.3	ST-ResNet + IDT	2016-11-07
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	70.1	R[2+1]D-Flow (Sports1M pretrained)	2017-11-30
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition	✓ Link	69.4	Temporal Segment Networks	2016-08-02
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition	✓ Link	69	TS-LSTM	2017-03-30
Self-supervised Video Transformer	✓ Link	67.2	SVT	2021-12-02
A Closer Look at Spatiotemporal Convolutions for Action Recognition	✓ Link	66.6	R[2+1]D-RGB (Sports1M pretrained)	2017-11-30
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors	✓ Link	65.9	TDD + IDT	2015-05-19
VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning	✓ Link	65.9	VIMPAC	2021-06-21
Convolutional Two-Stream Network Fusion for Video Action Recognition	✓ Link	65.4	S:VGG-16, T:VGG-16 (ImageNet pretrained)	2016-04-22
Dynamic Image Networks for Action Recognition	✓ Link	65.2	Dynamic Image Networks + IDT	2016-06-01
Long-term Temporal Convolutions for Action Recognition	✓ Link	64.8	LTC	2016-04-15
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition		62.8	DMC-Net (ResNet-18)	2019-01-11
R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition		62.8	R-STAN-50	2019-06-19
SUSiNet: See, Understand and Summarize it		62.7	SUSiNet (multi, Kinetics pretrained)	2018-12-03
Two-Stream Convolutional Networks for Action Recognition in Videos	✓ Link	59.4	Two-Stream (ImageNet pretrained)	2014-06-09
ActionFlowNet: Learning Motion Representation for Action Recognition		56.4	ActionFlowNet	2016-12-09
R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition		55.16	R-STAN-152	2019-06-19
ConvNet Architecture Search for Spatiotemporal Feature Learning	✓ Link	54.9	Res3D	2017-08-16
DistInit: Learning Video Representations Without a Single Labeled Video		54.8	R(2+1)D-18 (DistInit pretraining)	2019-01-26
Pose And Joint-Aware Action Recognition	✓ Link	54.2	JRMN	2020-10-16
Towards Universal Representation for Unseen Action Recognition		51.8	CD-UAR	2018-03-22
Learning Spatiotemporal Features with 3D Convolutional Networks	✓ Link	51.6	C3D	2014-12-02
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples	✓ Link	49.2	R[2+1]D (VideoMoCo)	2021-03-10
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples	✓ Link	43.6	3D-ResNet-18 (VideoMoCo)	2021-03-10
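
The metric column above is the classification accuracy averaged over the three standard HMDB-51 train/test splits. A minimal sketch of that averaging follows, assuming each split yields one accuracy in percent. The function name and the sample numbers are hypothetical and are not taken from any listed paper.

```python
from statistics import mean


def average_over_splits(split_accuracies):
    """Return the mean accuracy (%) over the three evaluation splits,
    rounded to two decimals."""
    if len(split_accuracies) != 3:
        raise ValueError("expected exactly one accuracy per split (3 splits)")
    return round(mean(split_accuracies), 2)


# Hypothetical per-split accuracies, for illustration only.
example_splits = [80.0, 81.5, 79.0]
print(average_over_splits(example_splits))
```

Entries in the table are reported at differing precision (for example 69 versus 87.56), so the rounding to two decimals here is only one possible convention.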

OpenCodePapers
