Paper | Code | Accuracy | Params | Model | Date |
--- | --- | --- | --- | --- | --- |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 94.17 | | Model soups (BASIC-L) | 2022-03-10 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 92.67 | | Model soups (ViT-G/14) | 2022-03-10 |
A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems | ✓ Link | 84.53 | | µ2Net+ (ViT-L/16) | 2022-09-15 |
Context-Aware Robust Fine-Tuning | | 81.5 | | CAR-FT (CLIP, ViT-L/14@336px) | 2022-11-29 |
MetaFormer Baselines for Vision | ✓ Link | 79.5 | 99M | CAFormer-B36 (IN-21K, 384) | 2022-10-24 |
Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 76.7 | | MAE (ViT-H, 448) | 2021-11-11 |
Understanding The Robustness in Vision Transformers | ✓ Link | 74.5 | | FAN-Hybrid-L(IN-21K, 384) | 2022-04-26 |
MetaFormer Baselines for Vision | ✓ Link | 73.5 | 100M | ConvFormer-B36 (IN-21K, 384) | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 69.4 | 99M | CAFormer-B36 (IN-21K) | 2022-10-24 |
A ConvNet for the 2020s | ✓ Link | 69.3 | | ConvNeXt-XL (Im21k, 384) | 2022-01-10 |
Enhance the Visual Representation via Discrete Adversarial Training | ✓ Link | 68.92 | | MAE+DAT (ViT-H) | 2022-09-16 |
MetaFormer Baselines for Vision | ✓ Link | 63.3 | 100M | ConvFormer-B36 (IN-21K) | 2022-10-24 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 62.44 | | Pyramid Adversarial Training Improves ViT (Im21k) | 2021-11-30 |
MetaFormer Baselines for Vision | ✓ Link | 61.9 | 99M | CAFormer-B36 (384) | 2022-10-24 |
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | ✓ Link | 61.6 | 89.7M | TransNeXt-Base (IN-1K supervised, 384) | 2023-11-28 |
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | ✓ Link | 58.3 | 49.7M | TransNeXt-Small (IN-1K supervised, 384) | 2023-11-28 |
MetaFormer Baselines for Vision | ✓ Link | 55.3 | 100M | ConvFormer-B36 (384) | 2022-10-24 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 52.7 | | SEER (RegNet10B) | 2022-02-16 |
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | ✓ Link | 50.6 | 89.7M | TransNeXt-Base (IN-1K supervised, 224) | 2023-11-28 |
MetaFormer Baselines for Vision | ✓ Link | 48.5 | 99M | CAFormer-B36 | 2022-10-24 |
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | ✓ Link | 47.1 | 49.7M | TransNeXt-Small (IN-1K supervised, 224) | 2023-11-28 |
Fully Attentional Networks with Self-emerging Token Labeling | ✓ Link | 46.1 | | FAN-L-Hybrid+STL | 2024-01-08 |
MetaFormer Baselines for Vision | ✓ Link | 40.1 | 100M | ConvFormer-B36 | 2022-10-24 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 36.41 | | Pyramid Adversarial Training Improves ViT (384x384) | 2021-11-30 |
Sequencer: Deep LSTM for Image Classification | ✓ Link | 35.5 | | Sequencer2D-L | 2022-05-04 |
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models | ✓ Link | 31.8 | | Discrete Adversarial Distillation (ViT-B/224) | 2023-11-02 |
Your Diffusion Model is Secretly a Zero-Shot Classifier | ✓ Link | 30.2 | | Diffusion Classifier | 2023-03-28 |
Towards Robust Vision Transformer | ✓ Link | 28.5 | | RVT-B* | 2021-05-17 |
Towards Robust Vision Transformer | ✓ Link | 25.7 | | RVT-S* | 2021-05-17 |
Towards Robust Vision Transformer | ✓ Link | 14.4 | | RVT-Ti* | 2021-05-17 |
Global Filter Networks for Image Classification | ✓ Link | 14.3 | | GFNet-S | 2021-07-01 |
On Feature Normalization and Data Augmentation | ✓ Link | 8.4 | | CutMix+MoEx (ResNet-50) | 2020-02-25 |
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models | ✓ Link | 7.7 | | Discrete Adversarial Distillation (ResNet-50) | 2023-11-02 |
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | ✓ Link | 7.3 | | CutMix (ResNet-50) | 2019-05-13 |
mixup: Beyond Empirical Risk Minimization | ✓ Link | 6.6 | | Mixup (ResNet-50) | 2017-10-25 |
Improved Regularization of Convolutional Neural Networks with Cutout | ✓ Link | 4.4 | | Cutout (ResNet-50) | 2017-08-15 |
Deep Residual Learning for Image Recognition | ✓ Link | 4.2 | | ResNet-50 (300 Epochs) | 2015-12-10 |
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness | ✓ Link | 2.3 | | Stylized ImageNet (ResNet-50) | 2018-11-29 |
Natural Adversarial Examples | ✓ Link | 0 | | ResNet-50 | 2019-07-16 |
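The top entries use model soups: instead of picking the single best fine-tuned checkpoint, the weights of several fine-tuned models are averaged element-wise, so inference cost stays that of one model. A minimal sketch of the uniform-soup idea, using plain Python dicts of lists in place of framework tensors (names and values here are illustrative, not from any of the papers above):

```python
# Minimal sketch of a "uniform soup": average the parameters of several
# fine-tuned checkpoints element-wise. Checkpoints are represented as
# dicts mapping parameter names to lists of floats; with a real deep
# learning framework the same recipe applies to tensors in a state dict.

def uniform_soup(checkpoints):
    """Return a weight dict whose entries are the element-wise mean of
    the corresponding entries across all checkpoints."""
    n = len(checkpoints)
    return {
        name: [
            sum(ckpt[name][i] for ckpt in checkpoints) / n
            for i in range(len(checkpoints[0][name]))
        ]
        for name in checkpoints[0]
    }

# Three hypothetical fine-tuned checkpoints of a tiny two-parameter model.
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [0.5]},
    {"w": [2.0, 0.0], "b": [1.0]},
]
soup = uniform_soup(ckpts)
print(soup)  # {'w': [2.0, 2.0], 'b': [0.5]}
```

The "greedy soup" variant from the same paper adds checkpoints one at a time, keeping each only if held-out accuracy improves; the averaging step itself is identical.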