Action Recognition on Diving-48
Leaderboard
| Paper | Code | Accuracy (%) | Model | Release Date |
|---|---|---|---|---|
| Extending Video Masked Autoencoders to 128 frames | – | 94.9 | LVMAE | 2024-11-20 |
| Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | ✓ | 90.8 | Video-FocalNet-B | 2023-07-13 |
| AIM: Adapting Image Models for Efficient Video Action Recognition | ✓ | 90.6 | AIM (CLIP ViT-L/14, 32x224) | 2023-02-06 |
| Dual-path Adaptation from Image to Video Transformers | ✓ | 88.7 | DUALPATH | 2023-03-17 |
| TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | – | 88.3 | TFCNet | 2022-03-11 |
| Learning Correlation Structures for Vision Transformers | – | 88.3 | StructViT-B-4-1 | 2024-04-05 |
| Object-Region Video Transformers | ✓ | 88.0 | ORViT TimeSformer | 2021-10-13 |
| Group Contextualization for Video Recognition | ✓ | 87.6 | GC-TDN | 2022-03-18 |
| BEVT: BERT Pretraining of Video Transformers | ✓ | 86.7 | BEVT | 2021-12-02 |
| Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | ✓ | 86.0 | PSB | 2022-07-27 |
| VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | ✓ | 85.5 | VIMPAC | 2021-06-21 |
| Relational Self-Attention: What's Missing in Attention for Video Understanding | ✓ | 84.2 | RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 2021-11-02 |
| Temporal Query Networks for Fine-grained Video Understanding | – | 81.8 | TQN | 2021-04-19 |
| PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition | ✓ | 81.3 | PMI Sampler | 2023-04-14 |
| Is Space-Time Attention All You Need for Video Understanding? | ✓ | 81.0 | TimeSformer-L | 2021-02-09 |
| Is Space-Time Attention All You Need for Video Understanding? | ✓ | 78.0 | TimeSformer-HR | 2021-02-09 |
| SlowFast Networks for Video Recognition | ✓ | 77.6 | SlowFast | 2018-12-10 |
| Is Space-Time Attention All You Need for Video Understanding? | ✓ | 75.0 | TimeSformer | 2021-02-09 |