OpenCodePapers

action-recognition-on-diving-48

Action Recognition
Results over time: accuracy of each entry plotted against its release date (see the plotting sketch after the leaderboard table).
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
| --- | --- | --- | --- | --- |
| Extending Video Masked Autoencoders to 128 frames | | 94.9 | LVMAE | 2024-11-20 |
| Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | ✓ Link | 90.8 | Video-FocalNet-B | 2023-07-13 |
| AIM: Adapting Image Models for Efficient Video Action Recognition | ✓ Link | 90.6 | AIM (CLIP ViT-L/14, 32x224) | 2023-02-06 |
| Dual-path Adaptation from Image to Video Transformers | ✓ Link | 88.7 | DUALPATH | 2023-03-17 |
| TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | | 88.3 | TFCNet | 2022-03-11 |
| Learning Correlation Structures for Vision Transformers | | 88.3 | StructVit-B-4-1 | 2024-04-05 |
| Object-Region Video Transformers | ✓ Link | 88.0 | ORViT TimeSformer | 2021-10-13 |
| Group Contextualization for Video Recognition | ✓ Link | 87.6 | GC-TDN | 2022-03-18 |
| BEVT: BERT Pretraining of Video Transformers | ✓ Link | 86.7 | BEVT | 2021-12-02 |
| Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | ✓ Link | 86.0 | PSB | 2022-07-27 |
| VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | ✓ Link | 85.5 | VIMPAC | 2021-06-21 |
| Relational Self-Attention: What's Missing in Attention for Video Understanding | ✓ Link | 84.2 | RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 2021-11-02 |
| Temporal Query Networks for Fine-grained Video Understanding | | 81.8 | TQN | 2021-04-19 |
| PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition | ✓ Link | 81.3 | PMI Sampler | 2023-04-14 |
| Is Space-Time Attention All You Need for Video Understanding? | ✓ Link | 81.0 | TimeSformer-L | 2021-02-09 |
| Is Space-Time Attention All You Need for Video Understanding? | ✓ Link | 78.0 | TimeSformer-HR | 2021-02-09 |
| SlowFast Networks for Video Recognition | ✓ Link | 77.6 | SlowFast | 2018-12-10 |
| Is Space-Time Attention All You Need for Video Understanding? | ✓ Link | 75.0 | TimeSformer | 2021-02-09 |
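The static table drops the "Results over time" view from the interactive chart. Below is a minimal sketch that reproduces that view from the leaderboard data above, assuming matplotlib is available; the data tuples are copied from the table, and the figure styling is illustrative rather than part of the original page.

```python
# Sketch: plot Diving-48 leaderboard accuracy against model release date.
# Data is transcribed from the leaderboard table above.
from datetime import date
import matplotlib.pyplot as plt

# (release date, accuracy %, model name)
entries = [
    (date(2024, 11, 20), 94.9, "LVMAE"),
    (date(2023, 7, 13), 90.8, "Video-FocalNet-B"),
    (date(2023, 2, 6), 90.6, "AIM (CLIP ViT-L/14)"),
    (date(2023, 3, 17), 88.7, "DUALPATH"),
    (date(2022, 3, 11), 88.3, "TFCNet"),
    (date(2024, 4, 5), 88.3, "StructVit-B-4-1"),
    (date(2021, 10, 13), 88.0, "ORViT TimeSformer"),
    (date(2022, 3, 18), 87.6, "GC-TDN"),
    (date(2021, 12, 2), 86.7, "BEVT"),
    (date(2022, 7, 27), 86.0, "PSB"),
    (date(2021, 6, 21), 85.5, "VIMPAC"),
    (date(2021, 11, 2), 84.2, "RSANet-R50"),
    (date(2021, 4, 19), 81.8, "TQN"),
    (date(2023, 4, 14), 81.3, "PMI Sampler"),
    (date(2021, 2, 9), 81.0, "TimeSformer-L"),
    (date(2021, 2, 9), 78.0, "TimeSformer-HR"),
    (date(2018, 12, 10), 77.6, "SlowFast"),
    (date(2021, 2, 9), 75.0, "TimeSformer"),
]

dates, accuracies, names = zip(*entries)
fig, ax = plt.subplots(figsize=(9, 5))
ax.scatter(dates, accuracies)
# Label each point with its model name, mirroring the hover tooltips.
for d, acc, name in entries:
    ax.annotate(name, (d, acc), fontsize=7, xytext=(3, 3), textcoords="offset points")
ax.set_xlabel("Release date")
ax.set_ylabel("Accuracy (%)")
ax.set_title("Action recognition on Diving-48: results over time")
fig.tight_layout()
plt.show()
```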