self-supervised-action-recognition-on-ucf101


Leaderboard

Paper	Code	3-fold Accuracy (%)	Pre-Training Dataset	Frozen Backbone	Split-1 Top-1 Accuracy (%)	Model Name	Release Date
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking	✓ Link	99.6				VideoMAE V2-g	2023-03-29
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning	✓ Link	97.5	Kinetics400	false		MVD (ViT-B)	2022-12-08
A Large-Scale Analysis on Self-Supervised Video Representation Learning		97.3	Kinetics400	false		SSL-KD (R21D-18)	2023-06-09
Masked Motion Encoding for Self-Supervised Video Representation Learning	✓ Link	96.5	Kinetics400	false		M3Video	2022-10-12
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning	✓ Link	96.3	Kinetics400	false		pBYOL	2021-04-29
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training	✓ Link	96.1	Kinetics400	false		VideoMAE	2022-03-23
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning	✓ Link	95.3	Kinetics400	false		SCE (R3D-50)	2022-12-21
Self-Supervised MultiModal Versatile Networks	✓ Link	95.2	AudioSet + HowTo100M	false		MMV TSM-50x2	2020-06-29
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning	✓ Link	94.1	Kinetics400			XKD (ViT-B/112/16)	2022-11-25
Spatiotemporal Contrastive Video Representation Learning	✓ Link	93.9	Kinetics600	false		CVRL (R3D-152 2x; K600)	2020-08-09
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning	✓ Link	93.7	Kinetics400	false		RSPNet	2020-10-27
Spatiotemporal Contrastive Video Representation Learning	✓ Link	93.4	Kinetics600	false		CVRL (R3D-50; K600)	2020-08-09
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens	✓ Link	93.4	no extra data	false		VideoMS (ViT-B)	2022-11-19
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning	✓ Link	93.4				XKD-Modality-Agnostic (ViT-B/112/16)	2022-11-25
Broaden Your Views for Self-Supervised Video Learning	✓ Link	93.1		false		BraVe:V-FA (TSM-50x2)	2021-03-30
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity	✓ Link	92.4	AudioSet	false		CrissCross (AudioSet)	2021-11-09
Spatiotemporal Contrastive Video Representation Learning	✓ Link	92.2	Kinetics400	false		CVRL (R3D-50; K400)	2020-08-09
Audio-Visual Instance Discrimination with Cross-Modal Agreement	✓ Link	91.5	AudioSet (Audio+Video)	false		AVID+CMA (Modified R2+1D-18 on AudioSet)	2020-04-27
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity	✓ Link	91.5	Kinetics400	false		CrissCross (Kinetics400)	2021-11-09
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training	✓ Link	91.3	no extra data	false		VideoMAE (no extra data)	2022-03-23
Audio-Visual Instance Discrimination with Cross-Modal Agreement	✓ Link	91.0	AudioSet (Audio+Video)	false		AVID (Modified R2+1D-18 on AudioSet)	2020-04-27
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting	✓ Link	90.5	UCF101	false		ViCC (S3D; R+F)	2021-06-18
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting	✓ Link	88.8	UCF101	false		ViCC (S3D; RGB)	2021-06-18
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting	✓ Link	88.8	UCF101	false		ViCC (R2+1D; R+F)	2021-06-18
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity	✓ Link	88.3	Kinetics-Sound	false		CrissCross (Kinetics-Sound)	2021-11-09
Audio-Visual Instance Discrimination with Cross-Modal Agreement	✓ Link	87.5	Kinetics400 (Audio+Video)	false		AVID+CMA (Modified R2+1D-18 on Kinetics)	2020-04-27
Audio-Visual Instance Discrimination with Cross-Modal Agreement	✓ Link	86.9	Kinetics400 (Audio+Video)	false		AVID (Modified R2+1D-18 on Kinetics)	2020-04-27
Self-Supervised Video Representation Learning with Meta-Contrastive Network		85.4				MCN (R3D-18; RGB)	2021-08-19
Self-Supervised Video Representation Learning with Meta-Contrastive Network		84.8				MCN (R2+1D; RGB)	2021-08-19
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting	✓ Link	82.8	UCF101	false		ViCC (R2+1D; RGB)	2021-06-18
TCLR: Temporal Contrastive Learning for Video Representation	✓ Link	82.4	UCF101	false		TCLR (R3D-18)	2021-01-20
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning	✓ Link	82.3	UCF101	false		PCL (ResNet-18)	2020-10-29
Video Representation Learning by Dense Predictive Coding	✓ Link	75.7	Kinetics400	false		DPC (Modified 3D ResNet-34)	2019-09-10
Self-supervised Co-training for Video Representation Learning	✓ Link	74.5		false		CoCLR	2020-10-19
Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework	✓ Link	74.4	UCF101	false		IIC (R3D)	2020-08-06
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting	✓ Link	72.2	UCF101	true		ViCC (S3D; RGB)	2021-06-18
Temporally Coherent Embeddings for Self-Supervised Video Representation Learning	✓ Link	71.2	Kinetics400	false		TCE (ResNet-50)	2020-03-21
Temporally Coherent Embeddings for Self-Supervised Video Representation Learning	✓ Link	68.8	Kinetics400	false		TCE (ResNet-18, Split 1)	2020-03-21
Video Representation Learning by Dense Predictive Coding	✓ Link	68.2	Kinetics400	false		DPC (3D ResNet-18)	2019-09-10
Temporally Coherent Embeddings for Self-Supervised Video Representation Learning	✓ Link	68.2	UCF101	false		TCE (ResNet18, Split 1)	2020-03-21
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning	✓ Link	66.0	UCF101	false		VCP (R3D)	2020-01-02
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles		65.8	Kinetics400	false		3D Cubic Puzzles (3D ResNet-18)	2018-11-24
Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction		64.9	UCF101	false		Video Clip Ordering (R3D)	2019-06-01
Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking		64.4	UCF101	false		Skip-Clip (3D ResNet-18)	2019-10-28
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction		62.9	Kinetics400	false		3D RotNet (3D ResNet-18)	2018-11-28
Video Representation Learning by Dense Predictive Coding	✓ Link	60.6	UCF101	false		DPC (3D ResNet-18, Split 1)	2019-09-10
Self-Supervised Video Representation Learning With Odd-One-Out Networks		60.3	UCF101	false		O3N (AlexNet)	2016-11-21
Contrastive Multiview Coding	✓ Link	59.1	UCF101	false		Contrastive Multiview Coding (CaffeNet x2)	2019-06-13
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics	✓ Link	58.8	UCF101	false		Motion & Appearance (C3D)	2019-04-07
Learning and Using the Arrow of Time		55.3	UCF101	false		Arrow of Time (AlexNet)	2018-06-01
Generating Videos with Scene Dynamics		52.1	UCF101	false		VideoGan (C3D)	2016-09-08
Shuffle and Learn: Unsupervised Learning using Temporal Order Verification		50.9	UCF101	false		Shuffle and Learn (AlexNet)	2016-03-28
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos	✓ Link		UCF101	false	83.2	SLIC (R3D-18)	2022-06-25
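The table reports two metrics: 3-fold accuracy and split-1 top-1 accuracy. The sketch below shows how they relate, assuming the common UCF101 protocol in which 3-fold accuracy is the mean of the top-1 accuracies measured on the dataset's three official train/test splits, while the split-1 column reports only the first of those three numbers. The function names and the toy labels are hypothetical and are not taken from any listed paper's code. Each test item is treated as a single prediction; many papers first aggregate clip-level scores per video, which this sketch does not model.

```python
from statistics import mean
from typing import Sequence


def top1_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Percentage of test items whose top-scoring class equals the ground-truth label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def three_fold_accuracy(split_accuracies: Sequence[float]) -> float:
    """Mean top-1 accuracy over the three UCF101 train/test splits (assumed protocol)."""
    if len(split_accuracies) != 3:
        raise ValueError("UCF101 defines exactly three train/test splits")
    return mean(split_accuracies)


# Hypothetical (predicted, true) class indices for each split's test set.
splits = {
    1: ([0, 1, 2, 2], [0, 1, 2, 1]),
    2: ([3, 3, 4, 5], [3, 3, 4, 5]),
    3: ([1, 0, 0, 7], [1, 0, 2, 7]),
}
per_split = [top1_accuracy(preds, labels) for preds, labels in splits.values()]
print(per_split)                                   # [75.0, 100.0, 75.0]
print(round(three_fold_accuracy(per_split), 1))    # 83.3
```

Under this assumed protocol, a "split-1 Top-1 Accuracy" entry such as SLIC's corresponds to `per_split[0]` alone, so it is not directly comparable with the 3-fold averages in the other rows.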

OpenCodePapers