Paper | Code | Top-1 Error (%) | Model | Date |
--- | --- | --- | --- | --- |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 3.90 | Model soups (BASIC-L) | 2022-03-10 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 4.54 | Model soups (ViT-G/14) | 2022-03-10 |
Context-Aware Robust Fine-Tuning | | 10.3 | CAR-FT (CLIP, ViT-L/14@336px) | 2022-11-29 |
Understanding The Robustness in Vision Transformers | ✓ Link | 28.9 | FAN-Hybrid-L (IN-21K, 384) | 2022-04-26 |
MetaFormer Baselines for Vision | ✓ Link | 29.6 | CAFormer-B36 (IN21K, 384) | 2022-10-24 |
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others | ✓ Link | 31.3 | LLE (ViT-B/16, SWAG, Edge Aug) | 2022-12-09 |
MetaFormer Baselines for Vision | ✓ Link | 31.7 | CAFormer-B36 (IN21K) | 2022-10-24 |
A ConvNet for the 2020s | ✓ Link | 31.8 | ConvNeXt-XL (Im21k, 384) | 2022-01-10 |
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others | ✓ Link | 33.1 | LLE (ViT-H/14, MAE, Edge Aug) | 2022-12-09 |
Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 33.5 | MAE (ViT-H, 448) | 2021-11-11 |
MetaFormer Baselines for Vision | ✓ Link | 33.5 | ConvFormer-B36 (IN21K, 384) | 2022-10-24 |
Enhance the Visual Representation via Discrete Adversarial Training | ✓ Link | 34.39 | MAE+DAT (ViT-H) | 2022-09-16 |
MetaFormer Baselines for Vision | ✓ Link | 34.7 | ConvFormer-B36 (IN21K) | 2022-10-24 |
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models | ✓ Link | 34.9 | Discrete Adversarial Distillation (ViT-B,224) | 2023-11-02 |
Generalized Parametric Contrastive Learning | ✓ Link | 39.7 | GPaCo (ViT-L) | 2022-09-26 |
Improving Vision Transformers by Revisiting High-frequency Components | ✓ Link | 40.3 | VOLO-D5+HAT | 2022-04-03 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 42.16 | Pyramid Adversarial Training Improves ViT (Im21k) | 2021-11-30 |
Fully Attentional Networks with Self-emerging Token Labeling | ✓ Link | 43.4 | FAN-L-Hybrid+STL | 2024-01-08 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 43.9 | SEER (RegNet10B) | 2022-02-16 |
Discrete Representations Strengthen Vision Transformer Robustness | ✓ Link | 44.74 | DiscreteViT | 2021-11-20 |
MetaFormer Baselines for Vision | ✓ Link | 45.0 | CAFormer-B36 (384) | 2022-10-24 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 46.08 | Pyramid Adversarial Training Improves ViT | 2021-11-30 |
MetaFormer Baselines for Vision | ✓ Link | 46.1 | CAFormer-B36 | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 47.8 | ConvFormer-B36 (384) | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 48.9 | ConvFormer-B36 | 2022-10-24 |
Towards Robust Vision Transformer | ✓ Link | 51.3 | RVT-B* | 2021-05-17 |
Sequencer: Deep LSTM for Image Classification | ✓ Link | 51.9 | Sequencer2D-L | 2022-05-04 |
Towards Robust Vision Transformer | ✓ Link | 52.3 | RVT-S* | 2021-05-17 |
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization | ✓ Link | 53.2 | DeepAugment+AugMix (ResNet-50) | 2020-06-29 |
PRIME: A few primitives can boost robustness to common corruptions | ✓ Link | 53.7 | PRIME with JSD (ResNet-50) | 2021-12-27 |
Towards Robust Vision Transformer | ✓ Link | 56.1 | RVT-Ti* | 2021-05-17 |
PRIME: A few primitives can boost robustness to common corruptions | ✓ Link | 57.1 | PRIME (ResNet-50) | 2021-12-27 |
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization | ✓ Link | 57.8 | DeepAugment (ResNet-50) | 2020-06-29 |
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness | ✓ Link | 58.5 | Stylized ImageNet (ResNet-50) | 2018-11-29 |
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty | ✓ Link | 58.9 | AugMix (ResNet-50) | 2019-12-05 |
Deep Residual Learning for Image Recognition | ✓ Link | 63.9 | ResNet-50 | 2015-12-10 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 71.9 | ResNet-152x2-SAM | 2021-06-03 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 73.6 | ViT-B/16-SAM | 2021-06-03 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 76.5 | Mixer-B/8-SAM | 2021-06-03 |
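The top entries above come from the "model soups" technique: the parameters of several models fine-tuned from the same initialization are averaged element-wise, so accuracy improves while inference cost stays that of a single model. A minimal sketch of the uniform-soup variant, using plain dicts of float lists in place of real checkpoints (the function name and toy data are illustrative, not from the paper's code):

```python
def uniform_soup(state_dicts):
    """Element-wise average of same-shaped parameter sets ("uniform soup").

    Each entry in `state_dicts` maps a parameter name to a flat list of
    floats; real checkpoints would hold tensors, but the averaging is the
    same operation applied per element.
    """
    n = len(state_dicts)
    return {
        name: [
            sum(sd[name][i] for sd in state_dicts) / n
            for i in range(len(state_dicts[0][name]))
        ]
        for name in state_dicts[0]
    }

# Three hypothetical fine-tuned checkpoints of one architecture.
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
soup = uniform_soup(ckpts)
print(soup)  # {'w': [3.0, 4.0], 'b': [1.0]}
```

The paper's stronger "greedy soup" variant adds models to the average one at a time, keeping each only if held-out accuracy does not drop; the averaging step itself is unchanged.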