OpenCodePapers

Action Recognition on EPIC-KITCHENS-100

Action Recognition
Leaderboard
Paper | Code | Action@1 (%) | Verb@1 (%) | Noun@1 (%) | GFLOPs | Model name | Release date
LLaVAction: evaluating and training multi-modal large language models for action recognition | ✓ | 58.3 | 76 | 69 |  | LLaVAction | 2025-03-24
TIM: A Time Interval Machine for Audio-Visual Action Recognition | ✓ | 56.4 | 76.2 | 66.4 |  | TIM | 2024-04-08
Training a Large Video Model on a Single Machine in a Day | ✓ | 54.4 | 73.0 | 65.4 |  | Avion (ViT-L) | 2023-09-28
M&M Mix: A Multimodal Multiview Transformer Ensemble |  | 53.6 | 72.0 | 66.3 |  | M&M (WTS 60M) | 2022-06-20
Extending Video Masked Autoencoders to 128 frames |  | 52.1 | 75.0 | 61.8 |  | LVMAE | 2024-11-20
Temporally-Adaptive Models for Efficient Video Understanding | ✓ | 51.8 | 71.7 | 64.1 |  | TAdaFormer-L/14 | 2023-08-10
Learning Video Representations from Large Language Models | ✓ | 51 | 72 | 62.9 |  | LaViLa (TimeSformer-L) | 2022-12-08
Multiview Transformers for Video Recognition | ✓ | 50.5 | 69.9 | 63.9 |  | MTV-B (WTS 60M) | 2022-01-12
Omnivore: A Single Model for Many Visual Modalities | ✓ | 49.9 | 69.5 | 61.7 |  | OMNIVORE (Swin-B, finetuned) | 2022-01-20
CAST: Cross-Attention in Space and Time for Video Action Recognition | ✓ | 49.3 | 72.5 | 60.9 |  | CAST (ViT-B/16) | 2023-11-30
Temporally-Adaptive Models for Efficient Video Understanding | ✓ | 48.9 | 71.0 | 60.2 |  | TAdaConvNeXtV2-S | 2023-08-10
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | ✓ | 48.4 | 71.4 | 60.3 |  | MeMViT-24 | 2022-01-20
Multiscale Multimodal Transformer for Multimodal Action Recognition |  | 47.8 | 70.1 | 61.0 |  | MMT | 2022-09-22
MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 47.7 | 72.2 | 57.3 | 117x1 | MoViNet-A6 | 2021-03-21
AVT: Audio-Video Transformer for Multimodal Action Recognition |  | 47.2 | 70.4 | 59.3 |  | AVT | 2022-09-22
Object-Region Video Transformers | ✓ | 45.7 | 68.4 | 58.7 |  | ORViT Mformer-L (ORViT blocks) | 2021-10-13
Technical Report: Temporal Aggregate Representations | ✓ | 45.26 | 66 | 53.35 |  | TempAgg | 2021-06-06
MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 44.5 | 69.1 | 55.1 | 74.9x1 | MoViNet-A5 | 2021-03-21
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | ✓ | 44.5 | 67.0 | 58.5 |  | Mformer-HR | 2021-06-09
Gate-Shift-Fuse for Video Action Recognition | ✓ | 44.48 | 69.06 | 53.18 |  | GSF | 2022-03-16
MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 44.4 | 68.8 | 56.2 | 42.2x1 | MoViNet-A4 | 2021-03-21
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | ✓ | 44.1 | 67.1 | 57.6 |  | Mformer-L | 2021-06-09
ViViT: A Video Vision Transformer | ✓ | 44.0 | 66.4 | 56.8 |  | ViViT-L/16x2 Fact. encoder | 2021-03-29
Attention Bottlenecks for Multimodal Fusion | ✓ | 43.4 | 64.8 | 58 |  | MBT | 2021-06-30
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | ✓ | 43.1 | 66.7 | 56.5 |  | Mformer | 2021-06-09
MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 41.2 | 67.1 | 52.3 | 7.59x1 | MoViNet-A2 | 2021-03-21
Rescaling Egocentric Vision | ✓ | 37.39 |  |  |  | TSM | 2020-06-23
Rescaling Egocentric Vision | ✓ | 36.81 |  |  |  | SlowFast | 2020-06-23
MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 36.8 | 64.8 | 47.4 | 1.74x1 | MoViNet-A0 | 2021-03-21
Rescaling Egocentric Vision | ✓ | 35.55 |  |  |  | TBN | 2020-06-23
Rescaling Egocentric Vision | ✓ | 35.28 |  |  |  | TRN | 2020-06-23
Rescaling Egocentric Vision | ✓ | 33.57 |  |  |  | TSN | 2020-06-23
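The Action@1, Verb@1 and Noun@1 columns are top-1 accuracies. Under the usual EPIC-KITCHENS-100 convention, an action is a (verb, noun) pair and counts as correct only when both parts are predicted correctly, which is why Action@1 sits below both Verb@1 and Noun@1 in every row that reports all three. The following is a minimal Python sketch of that computation, assuming each model emits a single top-1 (verb, noun) class pair per clip; the function name `top1_accuracies` and the integer class IDs are hypothetical and not taken from any official evaluation toolkit.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical representation: (verb_class_id, noun_class_id)
Pair = Tuple[int, int]


def top1_accuracies(preds: Sequence[Pair], labels: Sequence[Pair]) -> Dict[str, float]:
    """Return Verb@1, Noun@1 and Action@1 in percent.

    Assumes each prediction is the model's top-1 (verb, noun) pair and that
    an action counts as correct only when both verb and noun match the label.
    """
    if len(preds) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty prediction and label lists")
    n = len(labels)
    verb_hits = sum(p[0] == g[0] for p, g in zip(preds, labels))
    noun_hits = sum(p[1] == g[1] for p, g in zip(preds, labels))
    # A clip is an action hit only if the whole (verb, noun) pair is right.
    action_hits = sum(tuple(p) == tuple(g) for p, g in zip(preds, labels))
    return {
        "Verb@1": 100.0 * verb_hits / n,
        "Noun@1": 100.0 * noun_hits / n,
        "Action@1": 100.0 * action_hits / n,
    }


if __name__ == "__main__":
    preds = [(0, 5), (2, 3), (1, 1), (4, 7)]
    labels = [(0, 5), (2, 4), (1, 1), (3, 7)]
    print(top1_accuracies(preds, labels))
    # → {'Verb@1': 75.0, 'Noun@1': 75.0, 'Action@1': 50.0}
```

In this toy example, verb and noun are each right on three of four clips, but only two clips get both right, so Action@1 drops to 50%. Methods that score actions jointly (for example, by combining verb and noun probabilities) may pick a different top-1 pair than this independent per-part argmax.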