Paper | Code | Top-1 Accuracy (%) | Params | Extra Training Data | | Model (T: teacher, S: student) | Paper Date |
--- | --- | --- | --- | --- | --- | --- | --- |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 86.43 | 87M | ✘ | | ScaleKD (T:BEiT-L S:ViT-B/14) | 2024-11-11 |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 85.53 | 87M | ✘ | | ScaleKD (T:Swin-L S:ViT-B/16) | 2024-11-11 |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 83.93 | 22M | ✘ | | ScaleKD (T:Swin-L S:ViT-S/16) | 2024-11-11 |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 83.8 | 27M | ✘ | | ScaleKD (T:Swin-L S:Swin-T) | 2024-11-11 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 83.60 | 87M | ✘ | | KD++ (T: RegNetY-16GF S: ViT-B) | 2023-05-26 |
$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections | ✓ Link | 82.9 | 22M | ✘ | | VkD (T:RegNety 160 S:DeiT-S) | 2024-03-10 |
SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | ✓ Link | 82.7 | 22M | ✘ | | SpectralKD (T:Swin-S S:Swin-T) | 2024-12-26 |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 82.55 | 22M | ✘ | | ScaleKD (T:Swin-L S:ResNet-50) | 2024-11-11 |
Knowledge Diffusion for Distillation | ✓ Link | 82.5 | | | | DiffKD (T:Swin-L S: Swin-T) | 2023-05-25 |
Knowledge Distillation from A Stronger Teacher | ✓ Link | 82.3 | 29M | ✘ | | DIST (T: Swin-L S: Swin-T) | 2022-05-21 |
SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | ✓ Link | 82.2 | 22M | ✘ | | SpectralKD (T:Cait-S24 S:DeiT-S) | 2024-12-26 |
Understanding the Role of the Projector in Knowledge Distillation | ✓ Link | 82.1 | 22M | ✘ | | SRD (T:RegNety 160 S:DeiT-S) | 2023-03-20 |
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation | ✓ Link | 81.33 | | | | OFA (T: ViT-B S: ResNet-50) | 2023-10-30 |
Knowledge Diffusion for Distillation | ✓ Link | 80.5 | | | | DiffKD (T:Swin-L S: ResNet-50) | 2023-05-25 |
$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections | ✓ Link | 79.2 | 6M | ✘ | | VkD (T:RegNety 160 S:DeiT-Ti) | 2024-03-10 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 79.15 | 44.5M | ✘ | | KD++ (T: ResNet-152 S: ResNet-101) | 2023-05-26 |
Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | ✓ Link | 78.79 | 56.9M | ✘ | | ADLIK-MO-P25 (T: SeNet154, ResNet152b S: ResNet-50-prune25%) | 2019-09-17 |
Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | ✓ Link | 78.07 | 40.5M | ✘ | | ADLIK-MO-P375 (T: SeNet154, ResNet152b S: ResNet-50-prune37.5%) | 2019-09-17 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 77.48 | | ✘ | | KD++ (T: ResNet-152 S: ResNet-50) | 2023-05-26 |
SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | ✓ Link | 77.4 | 6M | ✘ | | SpectralKD (T:Cait-S24 S:DeiT-T) | 2024-12-26 |
Understanding the Role of the Projector in Knowledge Distillation | ✓ Link | 77.2 | 6M | ✘ | | SRD (T:RegNety 160 S:DeiT-Ti) | 2023-03-20 |
Distilling the Knowledge in a Neural Network | ✓ Link | 77.14 | 99M | ✘ | | ADLIK-MO (T: ResNet101 S: ResNet50) | 2015-03-09 |
Knowledge Distillation Based on Transformed Teacher Matching | ✓ Link | 77.03 | | ✘ | | WTTM (T: DeiT III-Small S:DeiT-Tiny) | 2024-02-17 |
Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | ✓ Link | 76.376 | 27M | ✘ | | ADLIK-MO-P50 (T: SeNet154, ResNet152b S: ResNet-50-half) | 2019-09-17 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 75.53 | | ✘ | | KD++ (T: ResNet-152 S: ResNet-34) | 2023-05-26 |
Knowledge Distillation Based on Transformed Teacher Matching | ✓ Link | 73.09 | | | | WTTM (T: ResNet-50 S: MobileNet-V1) | 2024-02-17 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.96 | | | | ReviewKD++ (T: ResNet-50 S: MobileNet-V1) | 2023-05-26 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.54 | | ✘ | | KD++ (T: ResNet-152 S: ResNet-18) | 2023-05-26 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.54 | | ✘ | | KD++ (T: ResNet-101 S: ResNet-18) | 2023-05-26 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.53 | | ✘ | | KD++ (T: ResNet-50 S: ResNet-18) | 2023-05-26 |
Hierarchical Self-supervised Augmented Knowledge Distillation | ✓ Link | 72.39 | | ✘ | | HSAKD (T: ResNet-34 S:ResNet-18) | 2021-07-29 |
Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation | ✓ Link | 72.19 | | ✘ | | ICKD (T: ResNet-34 S:ResNet-18) | 2021-01-01 |
Knowledge Distillation Based on Transformed Teacher Matching | ✓ Link | 72.19 | | ✓ | | WTTM (T: ResNet-34 S:ResNet-18) | 2024-02-17 |
Knowledge Distillation from A Stronger Teacher | ✓ Link | 72.07 | | ✘ | | DIST (T: ResNet-34 S:ResNet-18) | 2022-05-21 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.07 | | ✘ | | KD++ (T: ResNet-34 S: ResNet-18) | 2023-05-26 |
Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective | ✓ Link | 72.04 | | | | WSL (T: ResNet-34 S:ResNet-18) | 2021-02-01 |
Complementary Relation Contrastive Distillation | ✓ Link | 71.96 | | ✓ | | CRCD (T: ResNet-34 S:ResNet-18) | 2021-03-29 |
Understanding the Role of the Projector in Knowledge Distillation | ✓ Link | 71.87 | | ✓ | | SRD (T: ResNet-34 S:ResNet-18) | 2023-03-20 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 71.84 | | ✘ | | KD++ (T: ViT-B S: ResNet-18) | 2023-05-26 |
Distilling Knowledge by Mimicking Features | ✓ Link | 71.72 | | | | LSHFM (T: ResNet-34 S:ResNet-18) | 2020-11-03 |
Information Theoretic Representation Distillation | ✓ Link | 71.68 | 11.69M | ✓ | | ITRD (T: ResNet-34 S:ResNet-18) | 2021-12-01 |
Distilling Global and Local Logits With Densely Connected Relations | ✓ Link | 71.63 | | | | GLD (T: ResNet-34 S:ResNet-18) | 2021-01-01 |
Knowledge Distillation Meets Self-Supervision | ✓ Link | 71.62 | | ✘ | | SSKD (T: ResNet-34 S:ResNet-18) | 2020-06-12 |
Distilling Knowledge via Knowledge Review | ✓ Link | 71.61 | | ✓ | | Knowledge Review (T: ResNet-34 S:ResNet-18) | 2021-04-19 |
Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation | ✓ Link | 71.61 | | | | Adaptive (T:ResNet-50 S:ResNet-18) | 2021-10-19 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 71.46 | | ✘ | | KD++ (T: ViT-S S: ResNet-18) | 2023-05-26 |
Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching | ✓ Link | 71.38 | | | | AFD (T: ResNet-34 S:ResNet-18) | 2021-02-05 |
Contrastive Representation Distillation | ✓ Link | 71.38 | | | | CRD (T: ResNet-34 S:ResNet-18) | 2019-10-23 |
A Comprehensive Overhaul of Feature Distillation | ✓ Link | 70.81 | | | | Overhaul (T: ResNet-34 S:ResNet-18) | 2019-04-03 |
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer | ✓ Link | 70.70 | | ✓ | | AT (T: ResNet-34 S:ResNet-18) | 2016-12-12 |
Distilling the Knowledge in a Neural Network | ✓ Link | 70.66 | | ✓ | | KD (T: ResNet-34 S:ResNet-18) | 2015-03-09 |
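Two of the oldest baselines in the table, the logit-matching "KD" entry (Distilling the Knowledge in a Neural Network) and the attention-transfer "AT" entry, are simple enough to summarize in a few lines. The sketch below is illustrative PyTorch, not the code behind the listed numbers; the function names, temperature `T`, weighting `alpha`, and attention exponent `p` are assumed defaults rather than the papers' exact training settings.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Vanilla KD (Hinton et al., 2015): temperature-softened soft targets + hard labels.

    T and alpha are illustrative defaults; entries in the table tune them per setup.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps soft-target gradients on the same scale as cross-entropy
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def at_loss(student_feat, teacher_feat, p=2):
    """Attention transfer (Zagoruyko & Komodakis, 2016): match L2-normalized
    spatial attention maps (channel-wise mean of |activation|^p)."""
    def attention(feat):                          # feat: (N, C, H, W)
        a = feat.abs().pow(p).mean(dim=1)         # collapse channels -> (N, H, W)
        return F.normalize(a.flatten(1), dim=1)   # flatten and L2-normalize per sample
    return (attention(student_feat) - attention(teacher_feat)).pow(2).mean()


if __name__ == "__main__":
    # Toy shapes only, to show the expected inputs.
    s_logits, t_logits = torch.randn(8, 1000), torch.randn(8, 1000)
    labels = torch.randint(0, 1000, (8,))
    s_feat, t_feat = torch.randn(8, 512, 7, 7), torch.randn(8, 512, 7, 7)
    print(kd_loss(s_logits, t_logits, labels).item(), at_loss(s_feat, t_feat).item())
```

In a typical training loop the teacher forward pass runs under `torch.no_grad()` and one of these terms is added to the student's task loss. The feature-distillation methods higher in the table (e.g. ScaleKD, DiffKD, VkD, SRD) instead operate on intermediate features, in place of or in addition to the logits.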