Paper | Code | Accuracy (%) | FLOPs | Params | Top-1 Accuracy (%) | Model | Date |
--- | --- | --- | --- | --- | --- | --- | --- |
Escaping the Big Data Paradigm with Compact Transformers | ✓ Link | 99.76 | | | | CCT-14/7x2 | 2021-04-12 |
Reduction of Class Activation Uncertainty with Background Information | ✓ Link | 99.75 | | | | VIT-L/16 (Background) | 2023-05-05 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 99.72 | | | | CvT-W24 | 2021-03-29 |
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | ✓ Link | 99.7 | | | | Bamboo (ViT-B/16) | 2022-03-15 |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ✓ Link | 99.68 | | | | | 2020-10-22 |
Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 99.65 | | | | EffNet-L2 (SAM) | 2020-10-03 |
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | ✓ Link | 99.65 | | | | ALIGN | 2021-02-11 |
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 99.63 | | | | BiT-L (ResNet) | 2019-12-24 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 99.5 | | | | ConvMLP-S | 2021-09-09 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 99.5 | | | | ConvMLP-L | 2021-09-09 |
Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images | ✓ Link | 99.49 | | | | ResNet-152x4-AGC (ImageNet-21K) | 2021-05-31 |
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 99.30 | | | | BiT-M (ResNet) | 2019-12-24 |
SpinalNet: Deep Neural Network with Gradual Input | ✓ Link | 99.30 | | | | Wide-ResNet-101 (Spinal FC) | 2020-07-07 |
TResNet: High Performance GPU-Dedicated Architecture | ✓ Link | 99.1 | | | | TResNet-L | 2020-03-30 |
Grafit: Learning fine-grained image representations with coarse labels | | 99.1 | | | | Grafit (RegNet-8GF) | 2020-11-25 |
Going deeper with Image Transformers | ✓ Link | 99.1 | | | | CaiT-M-36 U 224 | 2021-03-31 |
Domain Adaptive Transfer Learning on Visual Attention Aware Data Augmentation for Fine-grained Visual Categorization | | 98.9 | | | | DAT | 2020-10-06 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 98.8 | | | | EfficientNet-B7 | 2019-05-28 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 98.8 | | | | EfficientNetV2-L | 2021-04-01 |
Global Filter Networks for Image Classification | ✓ Link | 98.8 | | 54M | | GFNet-H-B | 2021-07-01 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 98.8 | | 86M | | DeiT-B | 2020-12-23 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 98.6 | | | | CeiT-S (384 finetune resolution) | 2021-03-22 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 98.5 | | | | EfficientNetV2-M | 2021-04-01 |
Three things everyone should know about Vision Transformers | ✓ Link | 98.5 | | | | ViT-B (attn finetune) | 2022-03-18 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 98.3 | | | | LeViT-384 | 2021-04-02 |
Neural Architecture Transfer | ✓ Link | 98.3 | 400M | 4.2M | | NAT-M4 | 2020-05-12 |
Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images | ✓ Link | 98.21 | | | | ResNet-50x1-AGC (ImageNet-21K) | 2021-05-31 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 98.2 | | | | CeiT-S | 2021-03-22 |
Neural Architecture Transfer | ✓ Link | 98.1 | 250M | 3.7M | | NAT-M3 | 2020-05-12 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 97.9 | 4.1G | 25M | | ResNet50 (A1) | 2021-10-01 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 97.9 | | | | EfficientNetV2-S | 2021-04-01 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 97.9 | | | | ResMLP24 | 2021-05-07 |
Neural Architecture Transfer | ✓ Link | 97.9 | 195M | 3.4M | | NAT-M2 | 2020-05-12 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 97.85 | | | | TransBoost-ResNet50 | 2022-05-26 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 97.8 | | | | CeiT-T (384 finetune resolution) | 2021-03-22 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 97.8 | | | | LeViT-192 | 2021-04-02 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 97.7 | | | | LeViT-256 | 2021-04-02 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 97.4 | | | | ResMLP12 | 2021-05-07 |
Classification-Specific Parts for Improving Fine-Grained Visual Categorization | ✓ Link | 96.9 | | | | CS-Parts | 2019-09-16 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 96.9 | | | | CeiT-T | 2021-03-22 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 96.8 | | | | LeViT-128S | 2021-04-02 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 96.3 | | | | SEER (RegNet10B) | 2022-02-16 |
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations | ✓ Link | 95.1 | | | | NNCLR | 2021-04-29 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 91.8 | | | | ViT-B/16-SAM | 2021-06-03 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 91.5 | | | | ViT-S/16-SAM | 2021-06-03 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 91.1 | | | | ResNet-152-SAM | 2021-06-03 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 90.0 | | | | ResNet-50-SAM | 2021-06-03 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 90.0 | | | | Mixer-B/16-SAM | 2021-06-03 |
Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics | ✓ Link | 89.0 | | | | MANO-tiny | 2025-07-03 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 87.9 | | | | Mixer-S/16-SAM | 2021-06-03 |
Your Diffusion Model is Secretly a Zero-Shot Classifier | ✓ Link | | | | 66.3 | Diffusion Classifier (zero-shot) | 2023-03-28 |
Neural Architecture Transfer | ✓ Link | | 152M | 3.3M | | NAT-M1 | 2020-05-12 |
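
The accuracy column is a top-1 score on a held-out test split; the model/score pairs are consistent with the Oxford Flowers-102 benchmark (102 classes). Below is a minimal sketch of how such a score is typically reproduced: fine-tune a pretrained backbone, then count argmax predictions that match the ground-truth label. It assumes the `timm` and `torchvision` packages; the `resnet50` backbone and all hyperparameters are illustrative placeholders, not the recipe of any paper in the table.

```python
# Minimal sketch: fine-tune a pretrained backbone on Flowers-102 and
# measure top-1 accuracy. Backbone and hyperparameters are illustrative.
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Flowers-102 ships with train/val/test splits in torchvision.
train_set = datasets.Flowers102(root="data", split="train",
                                download=True, transform=transform)
test_set = datasets.Flowers102(root="data", split="test",
                               download=True, transform=transform)

# num_classes=102 swaps the ImageNet head for a 102-way classifier.
model = timm.create_model("resnet50", pretrained=True, num_classes=102)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative fine-tuning epoch.
model.train()
for images, labels in DataLoader(train_set, batch_size=32, shuffle=True):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

# Top-1 accuracy: fraction of test images whose argmax prediction
# matches the ground-truth class (the metric reported in the table).
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in DataLoader(test_set, batch_size=32):
        correct += (model(images).argmax(dim=1) == labels).sum().item()
        total += labels.numel()
print(f"Top-1 accuracy: {100 * correct / total:.2f}%")
```

The leaderboard entries differ mainly in the backbone, pretraining corpus, and training recipe plugged into this loop; the evaluation step is the same top-1 count throughout.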