Paper | Code | Mean corruption error (mCE, %) | Top-1 accuracy (%) | Parameters | Model | Date |
DINOv2: Learning Robust Visual Features without Supervision | ✓ Link | 28.2 | | 1100M | DINOv2 (ViT-g/14, frozen model, linear eval) | 2023-04-14 |
MetaFormer Baselines for Vision | ✓ Link | 30.8 | | 99M | CAFormer-B36 (IN21K, 384) | 2022-10-24 |
Enhance the Visual Representation via Discrete Adversarial Training | ✓ Link | 31.4 | | 632M | MAE+DAT (ViT-H) | 2022-09-16 |
DINOv2: Learning Robust Visual Features without Supervision | ✓ Link | 31.5 | | 307M | DINOv2 (ViT-L/14, frozen model, linear eval) | 2023-04-14 |
MetaFormer Baselines for Vision | ✓ Link | 31.8 | | | CAFormer-B36 (IN21K) | 2022-10-24 |
Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 33.8 | | 632M | MAE (ViT-H) | 2021-11-11 |
MetaFormer Baselines for Vision | ✓ Link | 35.0 | | | ConvFormer-B36 (IN21K) | 2022-10-24 |
Understanding The Robustness in Vision Transformers | ✓ Link | 35.8 | 73.6 | 77M | FAN-L-Hybrid (IN-22k) | 2022-04-26 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 36.80 | | 87M | Pyramid Adversarial Training Improves ViT (Im21k) | 2021-11-30 |
Improving Vision Transformers by Revisiting High-frequency Components | ✓ Link | 38.4 | | 296M | VOLO-D5+HAT | 2022-04-03 |
Discrete Representations Strengthen Vision Transformer Robustness | ✓ Link | 38.74 | | 87M | DiscreteViT (Im21k) | 2021-11-20 |
A ConvNet for the 2020s | ✓ Link | 38.8 | | 350M | ConvNeXt-XL (Im21k) (augmentation overlap with ImageNet-C) | 2022-01-10 |
Generalized Parametric Contrastive Learning | ✓ Link | 39.0 | | | GPaCo (ViT-L) | 2022-09-26 |
Understanding The Robustness in Vision Transformers | ✓ Link | 41.0 | 70.5 | 50M | FAN-B-Hybrid (IN-22k) | 2022-04-26 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 41.42 | | | Pyramid Adversarial Training Improves ViT | 2021-11-30 |
Fully Attentional Networks with Self-emerging Token Labeling | ✓ Link | 42.1 | 69.2 | 77M | FAN-L-Hybrid+STL | 2024-01-08 |
Quality-Agnostic Image Recognition via Invertible Decoder | ✓ Link | 42.5 | | | QualNet (ResNeXt101) | 2021-06-19 |
MetaFormer Baselines for Vision | ✓ Link | 42.6 | | | CAFormer-B36 | 2022-10-24 |
DINOv2: Learning Robust Visual Features without Supervision | ✓ Link | 42.7 | | 85M | DINOv2 (ViT-B/14, frozen model, linear eval) | 2023-04-14 |
Understanding The Robustness in Vision Transformers | ✓ Link | 43.0 | 67.7 | 77M | FAN-L-Hybrid | 2022-04-26 |
Discrete Representations Strengthen Vision Transformer Robustness | ✓ Link | 46.22 | | | DrViT | 2021-11-20 |
Discrete Representations Strengthen Vision Transformer Robustness | ✓ Link | 46.22 | | 87M | DiscreteViT | 2021-11-20 |
MetaFormer Baselines for Vision | ✓ Link | 46.3 | | | ConvFormer-B36 | 2022-10-24 |
Towards Robust Vision Transformer | ✓ Link | 46.8 | | | RVT-B* | 2021-05-17 |
Sequencer: Deep LSTM for Image Classification | ✓ Link | 48.9 | | | Sequencer2D-L | 2022-05-04 |
Towards Robust Vision Transformer | ✓ Link | 49.4 | | | RVT-S* | 2021-05-17 |
PushPull-Net: Inhibition-driven ResNet robust to image corruptions | ✓ Link | 49.95 | 69.4 | 25.6M | ResNet-50 (PushPull-Conv) + PRIME | 2024-08-07 |
Quality-Agnostic Image Recognition via Invertible Decoder | ✓ Link | 50.6 | | | QualNet (ResNet-50) | 2021-06-19 |
PRIME: A few primitives can boost robustness to common corruptions | ✓ Link | 51.3 | 59.9 | | PRIME + DeepAugment (ResNet-50) | 2021-12-27 |
Global Filter Networks for Image Classification | ✓ Link | 53.8 | | | GFNet-S | 2021-07-01 |
DINOv2: Learning Robust Visual Features without Supervision | ✓ Link | 54.4 | | 21M | DINOv2 (ViT-S/14, frozen model, linear eval) | 2023-04-14 |
PRIME: A few primitives can boost robustness to common corruptions | ✓ Link | 55.5 | 56.4 | | PRIME with JSD (ResNet-50) | 2021-12-27 |
Towards Robust Vision Transformer | ✓ Link | 57.0 | | | RVT-Ti* | 2021-05-17 |
PRIME: A few primitives can boost robustness to common corruptions | ✓ Link | 57.5 | 55.0 | | PRIME (ResNet-50) | 2021-12-27 |
Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain | ✓ Link | 57.5 | | | APR-SP + DeepAugment (ResNet-50) | 2021-08-19 |
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization | ✓ Link | 60.4 | | | DeepAugment (ResNet-50) | 2020-06-29 |
Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain | ✓ Link | 65.0 | | | APR-SP (ResNet-50) | 2021-08-19 |
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty | ✓ Link | 65.3 | | | AugMix (ResNet-50) | 2019-12-05 |
ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness | ✓ Link | 69.3 | | | Stylized ImageNet (ResNet-50) | 2018-11-29 |
Group-wise Inhibition based Feature Regularization for Robust Classification | ✓ Link | 69.6 | | | Group-wise Inhibition (ResNet-50) | 2021-03-03 |
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations | ✓ Link | 76.7 | | | ResNet-50 | 2019-03-28 |
Diffusion-Based Adaptation for Classification of Unknown Degraded Images | ✓ Link | | 64.3 | | DiffAUD (ConvNeXt-Tiny) | 2024-06-17 |
Diffusion-Based Adaptation for Classification of Unknown Degraded Images | ✓ Link | | 61 | | DiffAUD (Swin-Tiny) | 2024-06-17 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | | 56.5 | | ViT-B/16-SAM | 2021-06-03 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | | 55 | | ResNet-152x2-SAM | 2021-06-03 |
Diffusion-Based Adaptation for Classification of Unknown Degraded Images | ✓ Link | | 52.1 | | DiffAUD (ResNet-50) | 2024-06-17 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | | 48.9 | | Mixer-B/8-SAM | 2021-06-03 |