Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 77.18 | Model soups (BASIC-L) | 2022-03-10 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 74.24 | Model soups (ViT-G/14) | 2022-03-10 |
Context-Aware Robust Fine-Tuning | | 65.5 | CAR-FT (CLIP, ViT-L/14@336px) | 2022-11-29 |
A ConvNet for the 2020s | ✓ Link | 55.0 | ConvNeXt-XL (Im21k, 384) | 2022-01-10 |
MetaFormer Baselines for Vision | ✓ Link | 54.5 | CAFormer-B36 (IN21K, 384) | 2022-10-24 |
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others | ✓ Link | 53.39 | LLE (ViT-H/14, MAE, Edge Aug) | 2022-12-09 |
MetaFormer Baselines for Vision | ✓ Link | 52.9 | ConvFormer-B36 (IN21K, 384) | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 52.8 | CAFormer-B36 (IN21K) | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 52.7 | ConvFormer-B36 (IN21K) | 2022-10-24 |
Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 50.9 | MAE (ViT-H, 448) | 2021-11-11 |
Enhance the Visual Representation via Discrete Adversarial Training | ✓ Link | 50.03 | MAE+DAT (ViT-H) | 2022-09-16 |
Generalized Parametric Contrastive Learning | ✓ Link | 48.3 | GPaCo (ViT-L) | 2022-09-26 |
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models | ✓ Link | 46.1 | Discrete Adversarial Distillation (ViT-B, 224) | 2023-11-02 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 46.03 | Pyramid Adversarial Training Improves ViT (Im21k) | 2021-11-30 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 45.6 | SEER (RegNet10B) | 2022-02-16 |
Discrete Representations Strengthen Vision Transformer Robustness | ✓ Link | 44.72 | DrViT | 2021-11-20 |
MetaFormer Baselines for Vision | ✓ Link | 42.5 | CAFormer-B36 | 2022-10-24 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 41.04 | Pyramid Adversarial Training Improves ViT | 2021-11-30 |
MetaFormer Baselines for Vision | ✓ Link | 39.5 | ConvFormer-B36 | 2022-10-24 |
Sequencer: Deep LSTM for Image Classification | ✓ Link | 35.8 | Sequencer2D-L | 2022-05-04 |