OpenCodePapers

Action Recognition in Videos on HMDB-51

Action Recognition
Leaderboard
| Paper | Code | Average accuracy of 3 splits (%) | Model name | Release date |
|---|---|---|---|---|
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | ✓ | 88.7 | VideoMAE V2-g | 2023-03-29 |
| DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification | ✓ | 88.6 | DejaVid | 2025-01-01 |
| Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors | | 87.56 | DEEP-HAL with ODF+SDF(I3D) | 2020-01-14 |
| High-order Tensor Pooling with Attention for Action Recognition | | 87.21 | TO+MaxExp+IDT | 2021-10-11 |
| Tensor Representations for Action Recognition | ✓ | 86.11 | SCK⊕(I3D)+IDT | 2020-12-28 |
| High-order Tensor Pooling with Attention for Action Recognition | | 85.70 | SO+MaxExp+IDT | 2021-10-11 |
| Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition | ✓ | 85.10 | R2+1D-BERT | 2020-08-03 |
| Pose And Joint-Aware Action Recognition | ✓ | 84.53 | Ours + ResNext101 BERT | 2020-10-16 |
| SMART Frame Selection for Action Recognition | | 84.36 | SMART | 2020-12-19 |
| Omni-sourced Webly-supervised Learning for Video Recognition | ✓ | 83.8 | OmniSource (SlowOnly-8x8-R101-RGB + I3D Flow) | 2020-03-29 |
| ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video | ✓ | 83.4 | ZeroI2V ViT-L/14 | 2023-10-02 |
| PERF-Net: Pose Empowered RGB-Flow Net | | 83.2 | PERF-Net (distilled S3D-G) | 2020-09-28 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | ✓ | 83.1 | BIKE | 2022-12-31 |
| Bubblenet: A Disperse Recurrent Structure To Recognize Activities | | 82.60 | BubbleNET | 2020-10-30 |
| Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs | | 82.48 | HAF+BoW/FV halluc | 2019-06-13 |
| Cooperative Cross-Stream Network for Discriminative Action Representation | | 81.9 | CCS + TSN (ImageNet+Kinetics pretrained) | 2019-08-27 |
| Representation Flow for Action Recognition | ✓ | 81.1 | RepFlow-50 ([2+1]D CNN, FcF, Non-local block) | 2018-10-02 |
| Contextual Action Cues from Camera Sensor for Multi-Stream Action Recognition | | 80.92 | Multi-stream I3D | 2019-03-20 |
| MARS: Motion-Augmented RGB Stream for Action Recognition | ✓ | 80.9 | MARS+RGB+Flow (64 frames, Kinetics pretrained) | 2019-06-01 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 80.9 | Two-stream I3D | 2017-05-22 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 80.7 | Two-Stream I3D (Imagenet+Kinetics pre-training) | 2017-05-22 |
| Learning Spatio-Temporal Representation with Local and Global Diffusion | | 80.5 | LGD-3D Two-stream | 2019-06-13 |
| D3D: Distilled 3D Networks for Video Action Recognition | ✓ | 80.5 | D3D + D3D | 2018-12-19 |
| Asymmetric Masked Distillation for Pre-Training Small Foundation Models | | 79.6 | AMD(ViT-B/16) | 2023-11-06 |
| D3D: Distilled 3D Networks for Video Action Recognition | ✓ | 79.3 | D3D (Kinetics-600 pretraining) | 2018-12-19 |
| Learning Spatio-Temporal Representation with Local and Global Diffusion | | 78.9 | LGD-3D Flow | 2019-06-13 |
| Hidden Two-Stream Convolutional Networks for Action Recognition | ✓ | 78.7 | Hidden Two-Stream | 2017-04-02 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 78.7 | R[2+1]D-TwoStream (Kinetics pretrained) | 2017-11-30 |
| D3D: Distilled 3D Networks for Video Action Recognition | ✓ | 78.7 | D3D (Kinetics-400 pretraining) | 2018-12-19 |
| DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | | 77.8 | I3D RGB + DMC-Net (I3D) | 2019-01-11 |
| Busy-Quiet Video Disentangling for Video Classification | ✓ | 77.6 | BQN | 2021-03-29 |
| MotionSqueeze: Neural Motion Feature Learning for Video Understanding | ✓ | 77.4 | MSNet-R50 (16 frames, ImageNet pretrained) | 2020-07-20 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 77.3 | Flow-I3D (Kinetics pre-training) | 2017-05-22 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 77.1 | Flow-I3D (Imagenet+Kinetics pre-training) | 2017-05-22 |
| Large Scale Holistic Video Understanding | ✓ | 76.5 | HATNet (32 frames) | 2019-04-25 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 76.4 | R[2+1]D-Flow (Kinetics pretrained) | 2017-11-30 |
| Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | ✓ | 75.9 | S3D-G (ImageNet, Kinetics-400 pretrained) | 2017-12-13 |
| FASTER Recurrent Networks for Efficient Video Classification | | 75.7 | FASTER32 (Kinetics pretrain) | 2019-06-10 |
| Learning Spatio-Temporal Representation with Local and Global Diffusion | | 75.7 | LGD-3D RGB | 2019-06-13 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 74.8 | RGB-I3D (Imagenet+Kinetics pre-training) | 2017-05-22 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 74.5 | R[2+1]D-RGB (Kinetics pretrained) | 2017-11-30 |
| VidTr: Video Transformer Without Convolutions | | 74.4 | VidTr-L | 2021-04-23 |
| Contrastive Video Representation Learning via Adversarial Perturbations | | 74.3 | ADL+ResNet+IDT | 2018-07-24 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 74.3 | RGB-I3D (Kinetics pre-training) | 2017-05-22 |
| Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition | ✓ | 74.2 | Optical Flow Guided Feature | 2017-11-29 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 72.7 | R[2+1]D-TwoStream (Sports1M pretrained) | 2017-11-30 |
| End-to-End Learning of Motion Representation for Video Understanding | ✓ | 72.6 | TVNet+IDT | 2018-04-02 |
| Spatiotemporal Multiplier Networks for Video Action Recognition | ✓ | 72.2 | STM Network+IDT | 2017-07-01 |
| Attention Distillation for Learning Video Representations | | 72.0 | Prob-Distill | 2019-04-05 |
| DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | | 71.8 | DMC-Net (I3D) | 2019-01-11 |
| Learning spatio-temporal representations with temporal squeeze pooling | | 71.5 | TesNet (ImageNet pretrained) | 2020-02-11 |
| Hierarchical Feature Aggregation Networks for Video Action Recognition | | 71.13 | HF-ECOLite (ImageNet+Kinetics pretrain) | 2019-05-29 |
| Appearance-and-Relation Networks for Video Classification | ✓ | 70.9 | ARTNet w/ TSN | 2017-11-24 |
| Spatiotemporal Residual Networks for Video Action Recognition | ✓ | 70.3 | ST-ResNet + IDT | 2016-11-07 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 70.1 | R[2+1]D-Flow (Sports1M pretrained) | 2017-11-30 |
| Temporal Segment Networks: Towards Good Practices for Deep Action Recognition | ✓ | 69.4 | Temporal Segment Networks | 2016-08-02 |
| TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition | ✓ | 69 | TS-LSTM | 2017-03-30 |
| Self-supervised Video Transformer | ✓ | 67.2 | SVT | 2021-12-02 |
| A Closer Look at Spatiotemporal Convolutions for Action Recognition | ✓ | 66.6 | R[2+1]D-RGB (Sports1M pretrained) | 2017-11-30 |
| Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors | ✓ | 65.9 | TDD + IDT | 2015-05-19 |
| VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | ✓ | 65.9 | VIMPAC | 2021-06-21 |
| Convolutional Two-Stream Network Fusion for Video Action Recognition | ✓ | 65.4 | S:VGG-16, T:VGG-16 (ImageNet pretrained) | 2016-04-22 |
| Dynamic Image Networks for Action Recognition | ✓ | 65.2 | Dynamic Image Networks + IDT | 2016-06-01 |
| Long-term Temporal Convolutions for Action Recognition | ✓ | 64.8 | LTC | 2016-04-15 |
| R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition | | 62.8 | R-STAN-50 | 2019-06-19 |
| DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | | 62.8 | DMC-Net (ResNet-18) | 2019-01-11 |
| SUSiNet: See, Understand and Summarize it | | 62.7 | SUSiNet (multi, Kinetics pretrained) | 2018-12-03 |
| Two-Stream Convolutional Networks for Action Recognition in Videos | ✓ | 59.4 | Two-Stream (ImageNet pretrained) | 2014-06-09 |
| ActionFlowNet: Learning Motion Representation for Action Recognition | | 56.4 | ActionFlowNet | 2016-12-09 |
| R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition | | 55.16 | R-STAN-152 | 2019-06-19 |
| ConvNet Architecture Search for Spatiotemporal Feature Learning | ✓ | 54.9 | Res3D | 2017-08-16 |
| DistInit: Learning Video Representations Without a Single Labeled Video | | 54.8 | R(2+1)D-18 (DistInit pretraining) | 2019-01-26 |
| Pose And Joint-Aware Action Recognition | ✓ | 54.2 | JRMN | 2020-10-16 |
| Towards Universal Representation for Unseen Action Recognition | | 51.8 | CD-UAR | 2018-03-22 |
| Learning Spatiotemporal Features with 3D Convolutional Networks | ✓ | 51.6 | C3D | 2014-12-02 |
| VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples | ✓ | 49.2 | R[2+1]D (VideoMoCo) | 2021-03-10 |
| VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples | ✓ | 43.6 | 3D-ResNet-18 (VideoMoCo) | 2021-03-10 |
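
The metric reported in this leaderboard, average accuracy of 3 splits, can be illustrated with a small sketch. It assumes that the three splits are the three standard HMDB-51 train/test splits and that accuracy means top-1 classification accuracy on each split's test set, with the final score being the unweighted mean of the three values. The function names `split_accuracy` and `average_over_splits` and the toy predictions are hypothetical, made up for illustration; they are not taken from any listed paper or from this site.

```python
from statistics import mean


def split_accuracy(predictions, labels):
    """Top-1 accuracy, in percent, for the test set of one split."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("a split needs at least one test clip")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def average_over_splits(per_split_accuracies):
    """Unweighted mean of the per-split accuracies (assumed: exactly 3 splits)."""
    if len(per_split_accuracies) != 3:
        raise ValueError("expected accuracies for exactly 3 splits")
    return mean(per_split_accuracies)


# Hypothetical toy data: (predicted labels, ground-truth labels) per split.
toy_splits = [
    (["run", "jump", "kick", "wave"], ["run", "jump", "kick", "clap"]),
    (["run", "jump", "kick", "wave"], ["run", "jump", "kick", "wave"]),
    (["run", "run", "clap", "wave"], ["run", "jump", "kick", "wave"]),
]

per_split = [split_accuracy(preds, truth) for preds, truth in toy_splits]
average = average_over_splits(per_split)
print(per_split, average)  # [75.0, 100.0, 50.0] 75.0
```

Averaging per-split accuracies, rather than pooling all test clips into one set, weights each split equally even if their test sets differ in size.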