OpenCodePapers

Self-Supervised Action Recognition on HMDB51

Leaderboard
Paper | Code | Top-1 Accuracy (%) | Pre-Training Dataset | Frozen | Model Name | Release Date
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning | ✓ | 79.7 | Kinetics400 | false | MVD (ViT-B) | 2022-12-08
Masked Motion Encoding for Self-Supervised Video Representation Learning | ✓ | 78.0 | Kinetics400 | false | M3Video | 2022-10-12
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning | ✓ | 75.0 | Kinetics400 | false | pBYOL | 2021-04-29
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning | ✓ | 74.7 | Kinetics400 | false | SCE (R3D-50) | 2022-12-21
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | ✓ | 73.3 | Kinetics400 | false | VideoMAE | 2022-03-23
Broaden Your Views for Self-Supervised Video Learning | ✓ | 70.5 |  | false | BraVe:V-FA (TSM-50x2) | 2021-03-30
Spatiotemporal Contrastive Video Representation Learning | ✓ | 69.9 | Kinetics600 | false | CVRL (R3D-152 2x; K600) | 2020-08-09
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning | ✓ | 69 |  |  | XKD (ViT-B/112/16) | 2022-11-25
Self-Supervised Learning by Cross-Modal Audio-Video Clustering | ✓ | 68.9 | IG-Kinetics | false | XDC | 2019-11-28
Spatiotemporal Contrastive Video Representation Learning | ✓ | 68.0 | Kinetics600 | false | CVRL (R3D-50; K600) | 2020-08-09
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | ✓ | 66.8 | AudioSet | false | CrissCross (AudioSet) | 2021-11-09
Spatiotemporal Contrastive Video Representation Learning | ✓ | 66.7 | Kinetics400 | false | CVRL (R3D-50; K400) | 2020-08-09
Self-Supervised Learning by Cross-Modal Audio-Video Clustering | ✓ | 66.5 | IG-Random | false | XDC | 2019-11-28
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning | ✓ | 65.9 |  |  | XKD-Modality-Agnostic (ViT-B/112/16) | 2022-11-25
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens | ✓ | 65.8 | no extra data | false | VideoMS (ViT-B) | 2022-11-19
Audio-Visual Instance Discrimination with Cross-Modal Agreement | ✓ | 64.7 | Audioset (Video+Audio) | false | AVID+CMA (Modified R2+1D-18 on Audioset) | 2020-04-27
RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning | ✓ | 64.7 | Kinetics400 | false | RSPNet | 2020-10-27
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | ✓ | 64.7 | Kinetics400 | false | CrissCross (Kinetics400) | 2021-11-09
Evolving Losses for Unsupervised Video Representation Learning |  | 64.5 |  | false | ELo | 2020-02-26
Audio-Visual Instance Discrimination with Cross-Modal Agreement | ✓ | 64.1 | Audioset (Video+Audio) | false | AVID (Modified R2+1D-18 on Audioset) | 2020-04-27
Self-Supervised Learning by Cross-Modal Audio-Video Clustering | ✓ | 63.7 | AudioSet | false | XDC | 2019-11-28
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | ✓ | 62.6 | no extra data | false | VideoMAE(no extra data) | 2022-03-23
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | ✓ | 62.2 | UCF101 | false | ViCC (S3D; R+F) | 2021-06-18
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | ✓ | 61.5 | UCF101 | false | ViCC (R2+1D; R+F) | 2021-06-18
Audio-Visual Instance Discrimination with Cross-Modal Agreement | ✓ | 60.8 | Kinetics400 (Video+Audio) | false | AVID+CMA (Modified R2+1D-18 on Kinetics) | 2020-04-27
Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity | ✓ | 60.5 | Kinetics-Sound | false | CrissCross (Kinetics-Sound) | 2021-11-09
Audio-Visual Instance Discrimination with Cross-Modal Agreement | ✓ | 59.9 | Kinetics400 (Video+Audio) | false | AVID (Modified R2+1D-18 on Kinetics) | 2020-04-27
Self-Supervised Video Representation Learning with Meta-Contrastive Network |  | 54.8 | UCF101 | false | MCN (R3D-18; RGB) | 2021-08-19
Self-Supervised Video Representation Learning with Meta-Contrastive Network |  | 54.5 | UCF101 | false | MCN (R2+1D; RGB) | 2021-08-19
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos | ✓ | 54.5 | UCF101 | false | SLIC (R3D-18) | 2022-06-25
TCLR: Temporal Contrastive Learning for Video Representation | ✓ | 52.9 | UCF101 | false | TCLR (R3D-18) | 2021-01-20
Self-Supervised Learning by Cross-Modal Audio-Video Clustering | ✓ | 52.6 | Kinetics400 | false | XDC | 2019-11-28
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | ✓ | 52.4 | UCF101 | false | ViCC (R2+1D; RGB) | 2021-06-18
Self-supervised Co-training for Video Representation Learning | ✓ | 46.1 |  | false | CoCLR | 2020-10-19
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning | ✓ | 43.2 | UCF101 | false | PCL (ResNet-18) | 2020-10-29
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | ✓ | 38.5 | UCF101 | true | ViCC (S3D; RGB) | 2021-06-18
Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework | ✓ | 38.3 | UCF101 | false | IIC (R3D) | 2020-08-06
Temporally Coherent Embeddings for Self-Supervised Video Representation Learning | ✓ | 36.6 | Kinetics400 | false | TCE (ResNet-50) | 2020-03-21
Video Representation Learning by Dense Predictive Coding | ✓ | 35.7 | Kinetics400 | false | DPC (Modified 3D Resnet-34) | 2019-09-10
Video Representation Learning by Dense Predictive Coding | ✓ | 34.5 | Kinetics400 | false | DPC (Modified 3D ResNet-18) | 2019-09-10
Temporally Coherent Embeddings for Self-Supervised Video Representation Learning | ✓ | 34.2 | Kinetics400 | false | TCE (ResNet-18) | 2020-03-21
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction |  | 33.7 | Kinetics400 | false | 3D RotNet (3D ResNet-18) | 2018-11-28
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles |  | 33.7 | Kinetics400 | false | 3D Cubic Puzzles (3D ResNet-18) | 2018-11-24
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning | ✓ | 31.5 | UCF101 | false | VCP (R3D) | 2020-01-02
Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction |  | 29.5 | UCF101 | false | Video Clip Ordering (R3D) | 2019-06-01
Unsupervised Representation Learning by Sorting Sequences | ✓ | 23.8 | UCF101 | false | OPN (VGG-M-2048) | 2017-08-03
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics | ✓ | 20.3 | UCF101 | false | Motion & Appearance (C3D) | 2019-04-07
Shuffle and Learn: Unsupervised Learning using Temporal Order Verification |  | 19.8 | UCF101 | false | Shuffle and Learn (AlexNet) | 2016-03-28