Paper | Code | Top-1 Accuracy (%) | Params | Extra Training Data | | Model (T: teacher, S: student) | Paper Date |
--- | --- | --- | --- | --- | --- | --- | --- |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 86.43 | 87M | ✘ | | ScaleKD (T:BEiT-L S:ViT-B/14) | 2024-11-11 |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 85.53 | 87M | ✘ | | ScaleKD (T:Swin-L S:ViT-B/16) | 2024-11-11 |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 83.93 | 22M | ✘ | | ScaleKD (T:Swin-L S:ViT-S/16) | 2024-11-11 |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 83.8 | 27M | ✘ | | ScaleKD (T:Swin-L S:Swin-T) | 2024-11-11 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 83.60 | 87M | ✘ | | KD++ (T: RegNetY-16GF S: ViT-B) | 2023-05-26 |
$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections | ✓ Link | 82.9 | 22M | ✘ | | VkD (T:RegNety 160 S:DeiT-S) | 2024-03-10 |
SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | ✓ Link | 82.7 | 22M | ✘ | | SpectralKD (T:Swin-S S:Swin-T) | 2024-12-26 |
ScaleKD: Strong Vision Transformers Could Be Excellent Teachers | ✓ Link | 82.55 | 22M | ✘ | | ScaleKD (T:Swin-L S:ResNet-50) | 2024-11-11 |
Knowledge Diffusion for Distillation | ✓ Link | 82.5 | | | | DiffKD (T:Swin-L S: Swin-T) | 2023-05-25 |
Knowledge Distillation from A Stronger Teacher | ✓ Link | 82.3 | 29M | ✘ | | DIST (T: Swin-L S: Swin-T) | 2022-05-21 |
SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | ✓ Link | 82.2 | 22M | ✘ | | SpectralKD (T:Cait-S24 S:DeiT-S) | 2024-12-26 |
Understanding the Role of the Projector in Knowledge Distillation | ✓ Link | 82.1 | 22M | ✘ | | SRD (T:RegNety 160 S:DeiT-S) | 2023-03-20 |
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation | ✓ Link | 81.33 | | | | OFA (T: ViT-B S: ResNet-50) | 2023-10-30 |
Knowledge Diffusion for Distillation | ✓ Link | 80.5 | | | | DiffKD (T:Swin-L S: ResNet-50) | 2023-05-25 |
$V_kD:$ Improving Knowledge Distillation using Orthogonal Projections | ✓ Link | 79.2 | 6M | ✘ | | VkD (T:RegNety 160 S:DeiT-Ti) | 2024-03-10 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 79.15 | 44.5M | ✘ | | KD++ (T: ResNet-152 S: ResNet-101) | 2023-05-26 |
Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | ✓ Link | 78.79 | 56.9M | ✘ | | ADLIK-MO-P25 (T: SeNet154, ResNet152b S: ResNet-50-prune25%) | 2019-09-17 |
Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | ✓ Link | 78.07 | 40.5M | ✘ | | ADLIK-MO-P375 (T: SeNet154, ResNet152b S: ResNet-50-prune37.5%) | 2019-09-17 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 77.48 | | ✘ | | KD++ (T: ResNet-152 S: ResNet-50) | 2023-05-26 |
SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis | ✓ Link | 77.4 | 6M | ✘ | | SpectralKD (T:Cait-S24 S:DeiT-T) | 2024-12-26 |
Understanding the Role of the Projector in Knowledge Distillation | ✓ Link | 77.2 | 6M | ✘ | | SRD (T:RegNety 160 S:DeiT-Ti) | 2023-03-20 |
Distilling the Knowledge in a Neural Network | ✓ Link | 77.14 | 99M | ✘ | | ADLIK-MO (T: ResNet101 S: ResNet50) | 2015-03-09 |
Knowledge Distillation Based on Transformed Teacher Matching | ✓ Link | 77.03 | | ✘ | | WTTM (T: DeiT III-Small S:DeiT-Tiny) | 2024-02-17 |
Ensemble Knowledge Distillation for Learning Improved and Efficient Networks | ✓ Link | 76.376 | 27M | ✘ | | ADLIK-MO-P50 (T: SeNet154, ResNet152b S: ResNet-50-half) | 2019-09-17 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 75.53 | | ✘ | | KD++ (T: ResNet-152 S: ResNet-34) | 2023-05-26 |
Knowledge Distillation Based on Transformed Teacher Matching | ✓ Link | 73.09 | | | | WTTM (T: ResNet-50 S: MobileNet-V1) | 2024-02-17 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.96 | | | | ReviewKD++ (T: ResNet-50 S: MobileNet-V1) | 2023-05-26 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.54 | | ✘ | | KD++ (T: ResNet-152 S: ResNet-18) | 2023-05-26 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.54 | | ✘ | | KD++ (T: ResNet-101 S: ResNet-18) | 2023-05-26 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.53 | | ✘ | | KD++ (T: ResNet-50 S: ResNet-18) | 2023-05-26 |
Hierarchical Self-supervised Augmented Knowledge Distillation | ✓ Link | 72.39 | | ✘ | | HSAKD (T: ResNet-34 S:ResNet-18) | 2021-07-29 |
Exploring Inter-Channel Correlation for Diversity-Preserved Knowledge Distillation | ✓ Link | 72.19 | | ✘ | | ICKD (T: ResNet-34 S:ResNet-18) | 2021-01-01 |
Knowledge Distillation Based on Transformed Teacher Matching | ✓ Link | 72.19 | | ✓ | | WTTM (T: ResNet-34 S:ResNet-18) | 2024-02-17 |
Knowledge Distillation from A Stronger Teacher | ✓ Link | 72.07 | | ✘ | | DIST (T: ResNet-34 S:ResNet-18) | 2022-05-21 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 72.07 | | ✘ | | KD++ (T: ResNet-34 S: ResNet-18) | 2023-05-26 |
Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective | ✓ Link | 72.04 | | | | WSL (T: ResNet-34 S:ResNet-18) | 2021-02-01 |
Complementary Relation Contrastive Distillation | ✓ Link | 71.96 | | ✓ | | CRCD (T: ResNet-34 S:ResNet-18) | 2021-03-29 |
Understanding the Role of the Projector in Knowledge Distillation | ✓ Link | 71.87 | | ✓ | | SRD (T: ResNet-34 S:ResNet-18) | 2023-03-20 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 71.84 | | ✘ | | KD++ (T: ViT-B S: ResNet-18) | 2023-05-26 |
Distilling Knowledge by Mimicking Features | ✓ Link | 71.72 | | | | LSHFM (T: ResNet-34 S:ResNet-18) | 2020-11-03 |
Information Theoretic Representation Distillation | ✓ Link | 71.68 | 11.69M | ✓ | | ITRD (T: ResNet-34 S:ResNet-18) | 2021-12-01 |
Distilling Global and Local Logits With Densely Connected Relations | ✓ Link | 71.63 | | | | GLD (T: ResNet-34 S:ResNet-18) | 2021-01-01 |
Knowledge Distillation Meets Self-Supervision | ✓ Link | 71.62 | | ✘ | | SSKD (T: ResNet-34 S:ResNet-18) | 2020-06-12 |
Distilling Knowledge via Knowledge Review | ✓ Link | 71.61 | | ✓ | | Knowledge Review (T: ResNet-34 S:ResNet-18) | 2021-04-19 |
Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation | ✓ Link | 71.61 | | | | Adaptive (T:ResNet-50 S:ResNet-18) | 2021-10-19 |
Improving Knowledge Distillation via Regularizing Feature Norm and Direction | ✓ Link | 71.46 | | ✘ | | KD++ (T: ViT-S S: ResNet-18) | 2023-05-26 |
Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching | ✓ Link | 71.38 | | | | AFD (T: ResNet-34 S:ResNet-18) | 2021-02-05 |
Contrastive Representation Distillation | ✓ Link | 71.38 | | | | CRD (T: ResNet-34 S:ResNet-18) | 2019-10-23 |
A Comprehensive Overhaul of Feature Distillation | ✓ Link | 70.81 | | | | Overhaul (T: ResNet-34 S:ResNet-18) | 2019-04-03 |
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer | ✓ Link | 70.70 | | ✓ | | AT (T: ResNet-34 S:ResNet-18) | 2016-12-12 |
Distilling the Knowledge in a Neural Network | ✓ Link | 70.66 | | ✓ | | KD (T: ResNet-34 S:ResNet-18) | 2015-03-09 |
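Two of the oldest baselines in the table, the logit-matching "KD" entry (Distilling the Knowledge in a Neural Network) and the attention-transfer "AT" entry, are simple enough to summarize in a few lines. The sketch below is illustrative PyTorch, not the code behind the listed numbers; the function names, temperature `T`, weighting `alpha`, and attention exponent `p` are assumed defaults rather than the papers' exact training settings.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Vanilla KD (Hinton et al., 2015): temperature-softened soft targets + hard labels.

    T and alpha are illustrative defaults; entries in the table tune them per setup.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps soft-target gradients on the same scale as cross-entropy
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def at_loss(student_feat, teacher_feat, p=2):
    """Attention transfer (Zagoruyko & Komodakis, 2016): match L2-normalized
    spatial attention maps (channel-wise mean of |activation|^p)."""
    def attention(feat):                          # feat: (N, C, H, W)
        a = feat.abs().pow(p).mean(dim=1)         # collapse channels -> (N, H, W)
        return F.normalize(a.flatten(1), dim=1)   # flatten and L2-normalize per sample
    return (attention(student_feat) - attention(teacher_feat)).pow(2).mean()


if __name__ == "__main__":
    # Toy shapes only, to show the expected inputs.
    s_logits, t_logits = torch.randn(8, 1000), torch.randn(8, 1000)
    labels = torch.randint(0, 1000, (8,))
    s_feat, t_feat = torch.randn(8, 512, 7, 7), torch.randn(8, 512, 7, 7)
    print(kd_loss(s_logits, t_logits, labels).item(), at_loss(s_feat, t_feat).item())
```

In a typical training loop the teacher forward pass runs under `torch.no_grad()` and one of these terms is added to the student's task loss. The feature-distillation methods higher in the table (e.g. ScaleKD, DiffKD, VkD, SRD) instead operate on intermediate features, in place of or in addition to the logits.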