OpenCodePapers

Self-Supervised Action Recognition on UCF101

Leaderboard
| Paper | Code | 3-fold Accuracy (%) | Pre-Training Dataset | Frozen | Split-1 Top-1 Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|---|---|---|
| VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | ✓ | 99.6 | | | | VideoMAE V2-g | 2023-03-29 |
| Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | ✓ | 97.5 | Kinetics400 | false | | MVD (ViT-B) | 2022-12-08 |
| A Large-Scale Analysis on Self-Supervised Video Representation Learning | | 97.3 | Kinetics400 | false | | SSL-KD (R21D-18) | 2023-06-09 |
| Masked Motion Encoding for Self-Supervised Video Representation Learning | ✓ | 96.5 | Kinetics400 | false | | M3Video | 2022-10-12 |
| A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning | ✓ | 96.3 | Kinetics400 | false | | pBYOL | 2021-04-29 |
| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | ✓ | 96.1 | Kinetics400 | false | | VideoMAE | 2022-03-23 |
| Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning | ✓ | 95.3 | Kinetics400 | false | | SCE (R3D-50) | 2022-12-21 |
| Self-Supervised MultiModal Versatile Networks | ✓ | 95.2 | Audioset + Howto100M | false | | MMV TSM-50x2 | 2020-06-29 |
| XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning | ✓ | 94.1 | Kinetics400 | | | XKD (ViT-B/112/16) | 2022-11-25 |
| Spatiotemporal Contrastive Video Representation Learning | ✓ | 93.9 | Kinetics600 | false | | CVRL (R3D-152 2x; K600) | 2020-08-09 |
| RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning | ✓ | 93.7 | Kinetics400 | false | | RSPNet | 2020-10-27 |
| Spatiotemporal Contrastive Video Representation Learning | ✓ | 93.4 | Kinetics600 | false | | CVRL (R3D-50; K600) | 2020-08-09 |
| EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens | ✓ | 93.4 | no extra data | false | | VideoMS (ViT-B) | 2022-11-19 |
| XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning | ✓ | 93.4 | | | | XKD-Modality-Agnostic (ViT-B/112/16) | 2022-11-25 |
| Broaden Your Views for Self-Supervised Video Learning | ✓ | 93.1 | | false | | BraVe:V-FA (TSM-50x2) | 2021-03-30 |
| Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | ✓ | 92.4 | AudioSet | false | | CrissCross (AudioSet) | 2021-11-09 |
| Spatiotemporal Contrastive Video Representation Learning | ✓ | 92.2 | Kinetics400 | false | | CVRL (R3D-50; K400) | 2020-08-09 |
| Audio-Visual Instance Discrimination with Cross-Modal Agreement | ✓ | 91.5 | Audioset (Audio+Video) | false | | AVID+CMA (Modified R2+1D-18 on Audioset) | 2020-04-27 |
| Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | ✓ | 91.5 | Kinetics400 | false | | CrissCross (Kinetics400) | 2021-11-09 |
| VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | ✓ | 91.3 | no extra data | false | | VideoMAE(no extra data) | 2022-03-23 |
| Audio-Visual Instance Discrimination with Cross-Modal Agreement | ✓ | 91.0 | Audioset (Audio+Video) | false | | AVID (Modified R2+1D-18 on Audioset) | 2020-04-27 |
| Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | ✓ | 90.5 | UCF101 | false | | ViCC (S3D; R+F) | 2021-06-18 |
| Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | ✓ | 88.8 | UCF101 | false | | ViCC (S3D; RGB) | 2021-06-18 |
| Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | ✓ | 88.8 | UCF101 | false | | ViCC (R2+1D; R+F) | 2021-06-18 |
| Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | ✓ | 88.3 | Kinetics-Sound | false | | CrissCross (Kinetics-Sound) | 2021-11-09 |
| Audio-Visual Instance Discrimination with Cross-Modal Agreement | ✓ | 87.5 | Kinetics400 (Audio+Video) | false | | AVID+CMA (Modified R2+1D-18 on Kinetics) | 2020-04-27 |
| Audio-Visual Instance Discrimination with Cross-Modal Agreement | ✓ | 86.9 | Kinetics400 (Audio+Video) | false | | AVID (Modified R2+1D-18 on Kinetics) | 2020-04-27 |
| Self-Supervised Video Representation Learning with Meta-Contrastive Network | | 85.4 | | | | MCN (R3D-18; RGB) | 2021-08-19 |
| Self-Supervised Video Representation Learning with Meta-Contrastive Network | | 84.8 | | | | MCN (R2+1D; RGB) | 2021-08-19 |
| Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | ✓ | 82.8 | UCF101 | false | | ViCC (R2+1D; RGB) | 2021-06-18 |
| TCLR: Temporal Contrastive Learning for Video Representation | ✓ | 82.4 | UCF101 | false | | TCLR (R3D-18) | 2021-01-20 |
| Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning | ✓ | 82.3 | UCF101 | false | | PCL (ResNet-18) | 2020-10-29 |
| Video Representation Learning by Dense Predictive Coding | ✓ | 75.7 | Kinetics400 | false | | DPC (Modified 3D Resnet-34) | 2019-09-10 |
| Self-supervised Co-training for Video Representation Learning | ✓ | 74.5 | | false | | CoCLR | 2020-10-19 |
| Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework | ✓ | 74.4 | UCF101 | false | | IIC (R3D) | 2020-08-06 |
| Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | ✓ | 72.2 | UCF101 | true | | ViCC (S3D; RGB) | 2021-06-18 |
| Temporally Coherent Embeddings for Self-Supervised Video Representation Learning | ✓ | 71.2 | Kinetics400 | false | | TCE (ResNet-50) | 2020-03-21 |
| Temporally Coherent Embeddings for Self-Supervised Video Representation Learning | ✓ | 68.8 | Kinetics400 | false | | TCE (ResNet-18, Split 1) | 2020-03-21 |
| Video Representation Learning by Dense Predictive Coding | ✓ | 68.2 | Kinetics400 | false | | DPC (3D ResNet-18) | 2019-09-10 |
| Temporally Coherent Embeddings for Self-Supervised Video Representation Learning | ✓ | 68.2 | UCF101 | false | | TCE (ResNet18, Split 1) | 2020-03-21 |
| Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning | ✓ | 66 | UCF101 | false | | VCP (R3D) | 2020-01-02 |
| Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles | | 65.8 | Kinetics400 | false | | 3D Cubic Puzzles (3D ResNet-18) | 2018-11-24 |
| Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction | | 64.9 | UCF101 | false | | Video Clip Ordering (R3D) | 2019-06-01 |
| Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking | | 64.4 | UCF101 | false | | Skip-Clip (3D ResNet-18) | 2019-10-28 |
| Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction | | 62.9 | Kinetics400 | false | | 3D RotNet (3D ResNet-18) | 2018-11-28 |
| Video Representation Learning by Dense Predictive Coding | ✓ | 60.6 | UCF101 | false | | DPC (3D ResNet-18, Split 1) | 2019-09-10 |
| Self-Supervised Video Representation Learning With Odd-One-Out Networks | | 60.3 | UCF101 | false | | O3N (AlexNet) | 2016-11-21 |
| Contrastive Multiview Coding | ✓ | 59.1 | UCF101 | false | | Contrastive Multiview Coding (CaffeNet x2) | 2019-06-13 |
| Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics | ✓ | 58.8 | UCF101 | false | | Motion & Appearance (C3D) | 2019-04-07 |
| Learning and Using the Arrow of Time | | 55.3 | UCF101 | false | | Arrow of Time (AlexNet) | 2018-06-01 |
| Generating Videos with Scene Dynamics | | 52.1 | UCF101 | false | | VideoGan (C3D) | 2016-09-08 |
| Shuffle and Learn: Unsupervised Learning using Temporal Order Verification | | 50.9 | UCF101 | false | | Shuffle and Learn (AlexNet) | 2016-03-28 |
| SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos | ✓ | | UCF101 | false | 83.2 | SLIC (R3D-18) | 2022-06-25 |