Action Classification on Moments in Time

Leaderboard

Paper	Code	Top-1 Accuracy (%)	Top-5 Accuracy (%)	Model Name	Release Date
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning		53.1		OmniVec2	2024-01-01
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	50.9		InternVideo2-1B	2024-03-22
Unmasked Teacher: Towards Training-Efficient Video Foundation Models	✓ Link	48.7	78.2	UMT-L (ViT-L/16)	2023-03-28
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer	✓ Link	47.8	76.9	UniFormerV2-L	2022-09-22
Multiview Transformers for Video Recognition	✓ Link	47.2	75.7	MTV-H (WTS 60M)	2022-01-12
Co-training Transformer with Videos and Images Improves Action Recognition		46.1	75.4	CoVeR(JFT-3B)	2021-12-14
Co-training Transformer with Videos and Images Improves Action Recognition		45.0	73.9	CoVeR(JFT-300M)	2021-12-14
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text	✓ Link	41.1	67.7	VATT-Large	2021-04-22
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	40.2		MoViNet-A6	2021-03-21
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	39.1		MoViNet-A5	2021-03-21
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	37.9		MoViNet-A4	2021-03-21
Video Transformer Network	✓ Link	37.4	65.4	VTN	2021-02-01
Attention Bottlenecks for Multimodal Fusion	✓ Link	37.3	61.2	MBT (AV)	2021-06-30
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	35.6		MoViNet-A3	2021-03-21
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	34.3		MoViNet-A2	2021-03-21
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures	✓ Link	34.27	62.71	AssembleNet	2019-05-30
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	33.56	58.49	SRTG r3d-101	2020-06-15
Collaborative Spatiotemporal Feature Learning for Video Action Recognition	✓ Link	32.4	60.0	CoST (ResNet-101, 32 frames)	2019-06-01
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	32.0		MoViNet-A1	2021-03-21
Evolving Space-Time Neural Architectures for Videos		31.8		EvaNet	2018-11-26
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	31.60	56.80	SRTG r(2+1)d-50	2020-06-15
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	30.72	55.65	SRTG r3d-50	2020-06-15
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset	✓ Link	29.51	56.06	I3D	2017-05-22
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	28.97	54.18	SRTG r(2+1)d-34	2020-06-15
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	28.55	52.35	SRTG r3d-34	2020-06-15
Temporal Relational Reasoning in Videos	✓ Link	28.27	53.87	TRN-Multiscale	2017-11-22
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	27.5		MoViNet-A0	2021-03-21
ViViT: A Video Vision Transformer	✓ Link		64.9	ViViT-L/16x2	2021-03-29
Temporal Segment Networks for Action Recognition in Videos	✓ Link		50.10	TSN-2Stream	2017-05-08
