OpenCodePapers

action-recognition-on-diving-48

Action Recognition
Results over time: accuracy of each entry plotted against its release date (see the plotting sketch after the leaderboard table).
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
| --- | --- | --- | --- | --- |
| Extending Video Masked Autoencoders to 128 frames | | 94.9 | LVMAE | 2024-11-20 |
| Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | ✓ Link | 90.8 | Video-FocalNet-B | 2023-07-13 |
| AIM: Adapting Image Models for Efficient Video Action Recognition | ✓ Link | 90.6 | AIM (CLIP ViT-L/14, 32x224) | 2023-02-06 |
| Dual-path Adaptation from Image to Video Transformers | ✓ Link | 88.7 | DUALPATH | 2023-03-17 |
| TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | | 88.3 | TFCNet | 2022-03-11 |
| Learning Correlation Structures for Vision Transformers | | 88.3 | StructVit-B-4-1 | 2024-04-05 |
| Object-Region Video Transformers | ✓ Link | 88.0 | ORViT TimeSformer | 2021-10-13 |
| Group Contextualization for Video Recognition | ✓ Link | 87.6 | GC-TDN | 2022-03-18 |
| BEVT: BERT Pretraining of Video Transformers | ✓ Link | 86.7 | BEVT | 2021-12-02 |
| Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | ✓ Link | 86.0 | PSB | 2022-07-27 |
| VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | ✓ Link | 85.5 | VIMPAC | 2021-06-21 |
| Relational Self-Attention: What's Missing in Attention for Video Understanding | ✓ Link | 84.2 | RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 2021-11-02 |
| Temporal Query Networks for Fine-grained Video Understanding | | 81.8 | TQN | 2021-04-19 |
| PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition | ✓ Link | 81.3 | PMI Sampler | 2023-04-14 |
| Is Space-Time Attention All You Need for Video Understanding? | ✓ Link | 81.0 | TimeSformer-L | 2021-02-09 |
| Is Space-Time Attention All You Need for Video Understanding? | ✓ Link | 78.0 | TimeSformer-HR | 2021-02-09 |
| SlowFast Networks for Video Recognition | ✓ Link | 77.6 | SlowFast | 2018-12-10 |
| Is Space-Time Attention All You Need for Video Understanding? | ✓ Link | 75.0 | TimeSformer | 2021-02-09 |
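The static table drops the "Results over time" view from the interactive chart. Below is a minimal sketch that reproduces that view from the leaderboard data above, assuming matplotlib is available; the data tuples are copied from the table, and the figure styling is illustrative rather than part of the original page.

```python
# Sketch: plot Diving-48 leaderboard accuracy against model release date.
# Data is transcribed from the leaderboard table above.
from datetime import date
import matplotlib.pyplot as plt

# (release date, accuracy %, model name)
entries = [
    (date(2024, 11, 20), 94.9, "LVMAE"),
    (date(2023, 7, 13), 90.8, "Video-FocalNet-B"),
    (date(2023, 2, 6), 90.6, "AIM (CLIP ViT-L/14)"),
    (date(2023, 3, 17), 88.7, "DUALPATH"),
    (date(2022, 3, 11), 88.3, "TFCNet"),
    (date(2024, 4, 5), 88.3, "StructVit-B-4-1"),
    (date(2021, 10, 13), 88.0, "ORViT TimeSformer"),
    (date(2022, 3, 18), 87.6, "GC-TDN"),
    (date(2021, 12, 2), 86.7, "BEVT"),
    (date(2022, 7, 27), 86.0, "PSB"),
    (date(2021, 6, 21), 85.5, "VIMPAC"),
    (date(2021, 11, 2), 84.2, "RSANet-R50"),
    (date(2021, 4, 19), 81.8, "TQN"),
    (date(2023, 4, 14), 81.3, "PMI Sampler"),
    (date(2021, 2, 9), 81.0, "TimeSformer-L"),
    (date(2021, 2, 9), 78.0, "TimeSformer-HR"),
    (date(2018, 12, 10), 77.6, "SlowFast"),
    (date(2021, 2, 9), 75.0, "TimeSformer"),
]

dates, accuracies, names = zip(*entries)
fig, ax = plt.subplots(figsize=(9, 5))
ax.scatter(dates, accuracies)
# Label each point with its model name, mirroring the hover tooltips.
for d, acc, name in entries:
    ax.annotate(name, (d, acc), fontsize=7, xytext=(3, 3), textcoords="offset points")
ax.set_xlabel("Release date")
ax.set_ylabel("Accuracy (%)")
ax.set_title("Action recognition on Diving-48: results over time")
fig.tight_layout()
plt.show()
```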