Efficient Adaptive Ensembling for Image Classification | | 96.868 | efficient adaptive ensembling | 2022-06-15 |
ImageNet-21K Pretraining for the Masses | ✓ Link | 96.32 | TResNet-L-V2 | 2021-04-22 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 95.1 | EfficientNetV2-L | 2021-04-01 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 94.6 | EfficientNetV2-M | 2021-04-01 |
Going deeper with Image Transformers | ✓ Link | 94.2 | CaiT-M-36 U 224 | 2021-03-31 |
Domain Adaptive Transfer Learning on Visual Attention Aware Data Augmentation for Fine-grained Visual Categorization | | 94.1 | ImageNet + iNat on WS-DAN | 2020-10-06 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 94.1 | CeiT-S (384 finetune resolution) | 2021-03-22 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 93.8 | EfficientNetV2-S | 2021-04-01 |
Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields | ✓ Link | 93.743 | ViT-B/16 (RPE w/ GAB) | 2023-05-08 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 93.2 | CeiT-S | 2021-03-22 |
Global Filter Networks for Image Classification | ✓ Link | 93.2 | GFNet-H-B | 2021-07-01 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 93 | CeiT-T (384 finetune resolution) | 2021-03-22 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 90.8 | TransBoost-ResNet50 | 2022-05-26 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 90.5 | CeiT-T | 2021-03-22 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 89.8 | LeViT-192 | 2021-04-02 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 89.5 | ResMLP-24 | 2021-05-07 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 89.3 | LeViT-384 | 2021-04-02 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 88.6 | LeViT-128 | 2021-04-02 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 88.4 | LeViT-128S | 2021-04-02 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 88.2 | LeViT-256 | 2021-04-02 |
Stochastic Subsampling With Average Pooling | | 85.812 | SE-ResNet-101 (SAP) | 2024-09-25 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 84.6 | ResMLP-12 | 2021-05-07 |
Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields | ✓ Link | 83.89 | ViT-M/16 (RPE w/ GAB) | 2023-05-08 |
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations | ✓ Link | 67.1 | NNCLR | 2021-04-29 |