OpenCodePapers

Action Classification on Moments in Time

Video / Action Classification
Leaderboard
| Paper | Code | Top 1 Accuracy (%) | Top 5 Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|---|
| OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | - | 53.1 | - | OmniVec2 | 2024-01-01 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 50.9 | - | InternVideo2-1B | 2024-03-22 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | ✓ | 48.7 | 78.2 | UMT-L (ViT-L/16) | 2023-03-28 |
| UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | ✓ | 47.8 | 76.9 | UniFormerV2-L | 2022-09-22 |
| Multiview Transformers for Video Recognition | ✓ | 47.2 | 75.7 | MTV-H (WTS 60M) | 2022-01-12 |
| Co-training Transformer with Videos and Images Improves Action Recognition | - | 46.1 | 75.4 | CoVeR(JFT-3B) | 2021-12-14 |
| Co-training Transformer with Videos and Images Improves Action Recognition | - | 45.0 | 73.9 | CoVeR(JFT-300M) | 2021-12-14 |
| VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | ✓ | 41.1 | 67.7 | VATT-Large | 2021-04-22 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 40.2 | - | MoViNet-A6 | 2021-03-21 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 39.1 | - | MoViNet-A5 | 2021-03-21 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 37.9 | - | MoViNet-A4 | 2021-03-21 |
| Video Transformer Network | ✓ | 37.4 | 65.4 | VTN | 2021-02-01 |
| Attention Bottlenecks for Multimodal Fusion | ✓ | 37.3 | 61.2 | MBT (AV) | 2021-06-30 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 35.6 | - | MoViNet-A3 | 2021-03-21 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 34.3 | - | MoViNet-A2 | 2021-03-21 |
| AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures | ✓ | 34.27 | 62.71 | AssembleNet | 2019-05-30 |
| Learn to cycle: Time-consistent feature discovery for action recognition | ✓ | 33.56 | 58.49 | SRTG r3d-101 | 2020-06-15 |
| Collaborative Spatiotemporal Feature Learning for Video Action Recognition | ✓ | 32.4 | 60.0 | CoST (ResNet-101, 32 frames) | 2019-06-01 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 32.0 | - | MoViNet-A1 | 2021-03-21 |
| Evolving Space-Time Neural Architectures for Videos | - | 31.8 | - | EvaNet | 2018-11-26 |
| Learn to cycle: Time-consistent feature discovery for action recognition | ✓ | 31.60 | 56.80 | SRTG r(2+1)d-50 | 2020-06-15 |
| Learn to cycle: Time-consistent feature discovery for action recognition | ✓ | 30.72 | 55.65 | SRTG r3d-50 | 2020-06-15 |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | ✓ | 29.51 | 56.06 | I3D | 2017-05-22 |
| Learn to cycle: Time-consistent feature discovery for action recognition | ✓ | 28.97 | 54.18 | SRTG r(2+1)d-34 | 2020-06-15 |
| Learn to cycle: Time-consistent feature discovery for action recognition | ✓ | 28.55 | 52.35 | SRTG r3d-34 | 2020-06-15 |
| Temporal Relational Reasoning in Videos | ✓ | 28.27 | 53.87 | TRN-Multiscale | 2017-11-22 |
| MoViNets: Mobile Video Networks for Efficient Video Recognition | ✓ | 27.5 | - | MoViNet-A0 | 2021-03-21 |
| ViViT: A Video Vision Transformer | ✓ | - | 64.9 | ViViT-L/16x2 | 2021-03-29 |
| Temporal Segment Networks for Action Recognition in Videos | ✓ | - | 50.10 | TSN-2Stream | 2017-05-08 |