Self-Supervised Action Recognition on HMDB51

Leaderboard

Paper	Code	Top-1 Accuracy (%)	Pre-Training Dataset	Frozen	Model Name	Release Date
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning	✓ Link	79.7	Kinetics400	false	MVD (ViT-B)	2022-12-08
Masked Motion Encoding for Self-Supervised Video Representation Learning	✓ Link	78.0	Kinetics400	false	M3Video	2022-10-12
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning	✓ Link	75.0	Kinetics400	false	pBYOL	2021-04-29
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning	✓ Link	74.7	Kinetics400	false	SCE (R3D-50)	2022-12-21
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training	✓ Link	73.3	Kinetics400	false	VideoMAE	2022-03-23
Broaden Your Views for Self-Supervised Video Learning	✓ Link	70.5		false	BraVe:V-FA (TSM-50x2)	2021-03-30
Spatiotemporal Contrastive Video Representation Learning	✓ Link	69.9	Kinetics600	false	CVRL (R3D-152 2x; K600)	2020-08-09
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning	✓ Link	69			XKD (ViT-B/112/16)	2022-11-25
Self-Supervised Learning by Cross-Modal Audio-Video Clustering	✓ Link	68.9	IG-Kinetics	false	XDC	2019-11-28
Spatiotemporal Contrastive Video Representation Learning	✓ Link	68.0	Kinetics600	false	CVRL (R3D-50; K600)	2020-08-09
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity	✓ Link	66.8	AudioSet	false	CrissCross (AudioSet)	2021-11-09
Spatiotemporal Contrastive Video Representation Learning	✓ Link	66.7	Kinetics400	false	CVRL (R3D-50; K400)	2020-08-09
Self-Supervised Learning by Cross-Modal Audio-Video Clustering	✓ Link	66.5	IG-Random	false	XDC	2019-11-28
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning	✓ Link	65.9			XKD-Modality-Agnostic (ViT-B/112/16)	2022-11-25
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens	✓ Link	65.8	no extra data	false	VideoMS (ViT-B)	2022-11-19
Audio-Visual Instance Discrimination with Cross-Modal Agreement	✓ Link	64.7	AudioSet (Video+Audio)	false	AVID+CMA (Modified R2+1D-18 on Audioset)	2020-04-27
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning	✓ Link	64.7	Kinetics400	false	RSPNet	2020-10-27
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity	✓ Link	64.7	Kinetics400	false	CrissCross (Kinetics400)	2021-11-09
Evolving Losses for Unsupervised Video Representation Learning		64.5		false	ELo	2020-02-26
Audio-Visual Instance Discrimination with Cross-Modal Agreement	✓ Link	64.1	AudioSet (Video+Audio)	false	AVID (Modified R2+1D-18 on Audioset)	2020-04-27
Self-Supervised Learning by Cross-Modal Audio-Video Clustering	✓ Link	63.7	AudioSet	false	XDC	2019-11-28
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training	✓ Link	62.6	no extra data	false	VideoMAE(no extra data)	2022-03-23
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting	✓ Link	62.2	UCF101	false	ViCC (S3D; R+F)	2021-06-18
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting	✓ Link	61.5	UCF101	false	ViCC (R2+1D; R+F)	2021-06-18
Audio-Visual Instance Discrimination with Cross-Modal Agreement	✓ Link	60.8	Kinetics400 (Video+Audio)	false	AVID+CMA (Modified R2+1D-18 on Kinetics)	2020-04-27
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity	✓ Link	60.5	Kinetics-Sound	false	CrissCross (Kinetics-Sound)	2021-11-09
Audio-Visual Instance Discrimination with Cross-Modal Agreement	✓ Link	59.9	Kinetics400 (Video+Audio)	false	AVID (Modified R2+1D-18 on Kinetics)	2020-04-27
Self-Supervised Video Representation Learning with Meta-Contrastive Network		54.8	UCF101	false	MCN (R3D-18; RGB)	2021-08-19
Self-Supervised Video Representation Learning with Meta-Contrastive Network		54.5	UCF101	false	MCN (R2+1D; RGB)	2021-08-19
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos	✓ Link	54.5	UCF101	false	SLIC (R3D-18)	2022-06-25
TCLR: Temporal Contrastive Learning for Video Representation	✓ Link	52.9	UCF101	false	TCLR (R3D-18)	2021-01-20
Self-Supervised Learning by Cross-Modal Audio-Video Clustering	✓ Link	52.6	Kinetics400	false	XDC	2019-11-28
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting	✓ Link	52.4	UCF101	false	ViCC (R2+1D; RGB)	2021-06-18
Self-supervised Co-training for Video Representation Learning	✓ Link	46.1		false	CoCLR	2020-10-19
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning	✓ Link	43.2	UCF101	false	PCL (ResNet-18)	2020-10-29
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting	✓ Link	38.5	UCF101	true	ViCC (S3D; RGB)	2021-06-18
Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework	✓ Link	38.3	UCF101	false	IIC (R3D)	2020-08-06
Temporally Coherent Embeddings for Self-Supervised Video Representation Learning	✓ Link	36.6	Kinetics400	false	TCE (ResNet-50)	2020-03-21
Video Representation Learning by Dense Predictive Coding	✓ Link	35.7	Kinetics400	false	DPC (Modified 3D Resnet-34)	2019-09-10
Video Representation Learning by Dense Predictive Coding	✓ Link	34.5	Kinetics400	false	DPC (Modified 3D ResNet-18)	2019-09-10
Temporally Coherent Embeddings for Self-Supervised Video Representation Learning	✓ Link	34.2	Kinetics400	false	TCE (ResNet-18)	2020-03-21
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles		33.7	Kinetics400	false	3D Cubic Puzzles (3D ResNet-18)	2018-11-24
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction		33.7	Kinetics400	false	3D RotNet (3D ResNet-18)	2018-11-28
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning	✓ Link	31.5	UCF101	false	VCP (R3D)	2020-01-02
Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction		29.5	UCF101	false	Video Clip Ordering (R3D)	2019-06-01
Unsupervised Representation Learning by Sorting Sequences	✓ Link	23.8	UCF101	false	OPN (VGG-M-2048)	2017-08-03
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics	✓ Link	20.3	UCF101	false	Motion & Appearance (C3D)	2019-04-07
Shuffle and Learn: Unsupervised Learning using Temporal Order Verification		19.8	UCF101	false	Shuffle and Learn (AlexNet)	2016-03-28
