Action Classification on Kinetics-700

Leaderboard

Paper	Code	Top-1 Accuracy (%)	Top-5 Accuracy (%)	Model Name	Release Date
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	85.9		InternVideo2-6B	2024-03-22
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	85.4		InternVideo2-1B	2024-03-22
InternVideo: General Video Foundation Models via Generative and Discriminative Learning	✓ Link	84.0		InternVideo-T	2022-12-06
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning	✓ Link	83.8	96.6	TubeViT-L	2022-12-06
Unmasked Teacher: Towards Training-Efficient Video Foundation Models	✓ Link	83.6	96.7	UMT-L (ViT-L/16)	2023-03-28
Multiview Transformers for Video Recognition	✓ Link	83.4	96.2	MTV-H (WTS 60M)	2022-01-12
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale	✓ Link	82.9		EVA	2022-11-14
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer	✓ Link	82.7	96.2	UniFormerV2-L	2022-09-22
CoCa: Contrastive Captioners are Image-Text Foundation Models	✓ Link	82.7		CoCa (finetuned)	2022-05-04
CoCa: Contrastive Captioners are Image-Text Foundation Models	✓ Link	81.1		CoCa (frozen)	2022-05-04
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles	✓ Link	81.1		Hiera-H (no extra data)	2023-06-01
Masked Feature Prediction for Self-Supervised Visual Pre-Training	✓ Link	80.4	95.7	MaskFeat (no extra data, MViT-L)	2021-12-16
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video	✓ Link	80.4	94.9	mPLUG-2	2023-02-01
AIM: Adapting Image Models for Efficient Video Action Recognition	✓ Link	80.4		AIM (CLIP ViT-L/14, 32x224)	2023-02-06
Co-training Transformer with Videos and Images Improves Action Recognition		79.8	94.9	CoVeR (JFT-3B)	2021-12-14
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection	✓ Link	79.4	94.9	MViTv2-L (ImageNet-21k pretrain)	2021-12-02
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection	✓ Link	79.4		MoViNet-A6	2021-12-02
Co-training Transformer with Videos and Images Improves Action Recognition		78.5	94.2	CoVeR (JFT-300M)	2021-12-14
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection	✓ Link	76.6	93.2	MViTv2-B	2021-12-02
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	72.3		MoViNet-A6	2021-03-21
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	71.7		MoViNet-A5	2021-03-21
VidTr: Video Transformer Without Convolutions		70.8	89.4	En-VidTr-L	2021-04-23
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	70.7		MoViNet-A4	2021-03-21
VidTr: Video Transformer Without Convolutions		70.2	89.0	VidTr-L	2021-04-23
VidTr: Video Transformer Without Convolutions		69.5	88.3	VidTr-M	2021-04-23
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	68.0		MoViNet-A3	2021-03-21
VidTr: Video Transformer Without Convolutions		67.3	87.7	VidTr-S	2021-04-23
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	66.7		MoViNet-A2	2021-03-21
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	63.5		MoViNet-A1	2021-03-21
MoViNets: Mobile Video Networks for Efficient Video Recognition	✓ Link	58.5		MoViNet-A0	2021-03-21
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	56.46	76.82	SRTG r3d-101	2020-06-15
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	54.17	74.62	SRTG r(2+1)d-50	2020-06-15
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	53.52	74.17	SRTG r3d-50	2020-06-15
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision	✓ Link	51.9		SEER (RegNet10B)	2022-02-16
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	49.43	73.23	SRTG r(2+1)d-34	2020-06-15
Learn to cycle: Time-consistent feature discovery for action recognition	✓ Link	49.15	72.68	SRTG r3d-34	2020-06-15
