OpenCodePapers

Image Classification on ImageNet-V2

Results over time: interactive chart of top-1 accuracy by model release date (not reproduced here).
Leaderboard
| Paper | Code | Top-1 Accuracy | Model | Release Date |
|---|---|---|---|---|
| Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ | 84.63 | Model soups (BASIC-L) | 2022-03-10 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ | 84.3 | ViT-e | 2022-09-14 |
| Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ | 84.22 | Model soups (ViT-G/14) | 2022-03-10 |
| Swin Transformer V2: Scaling Up Capacity and Resolution | ✓ | 84.00 | SwinV2-G | 2021-11-18 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ | 84.0 | MAWS (ViT-6.5B) | 2023-03-23 |
| Scaling Vision Transformers | ✓ | 83.33 | ViT-G/14 | 2021-06-08 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ | 83.0 | MAWS (ViT-2B) | 2023-03-23 |
| MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ | 81.5 | MOAT-4 (IN-22K pretraining) | 2022-10-04 |
| Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ | 81.1 | SWAG (ViT H/14) | 2022-01-20 |
| MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ | 80.6 | MOAT-3 (IN-22K pretraining) | 2022-10-04 |
| MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ | 79.3 | MOAT-2 (IN-22K pretraining) | 2022-10-04 |
| MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ | 78.4 | MOAT-1 (IN-22K pretraining) | 2022-10-04 |
| Swin Transformer V2: Scaling Up Capacity and Resolution | ✓ | 78.08 | SwinV2-B | 2021-11-18 |
| VOLO: Vision Outlooker for Visual Recognition | ✓ | 78.0 | VOLO-D5 | 2021-06-24 |
| VOLO: Vision Outlooker for Visual Recognition | ✓ | 77.8 | VOLO-D4 | 2021-06-24 |
| Going deeper with Image Transformers | ✓ | 76.7 | CAIT-M36-448 | 2021-03-31 |
| Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ | 76.2 | SEER (RegNet10B) | 2022-02-16 |
| ResMLP: Feedforward networks for image classification with data-efficient training | ✓ | 74.2 | ResMLP-B24/8 22k | 2021-05-07 |
| Three things everyone should know about Vision Transformers | ✓ | 73.9 | ViT-B-36x1 | 2022-03-18 |
| ResMLP: Feedforward networks for image classification with data-efficient training | ✓ | 73.4 | ResMLP-B24/8 | 2021-05-07 |
| Sequencer: Deep LSTM for Image Classification | ✓ | 73.4 | Sequencer2D-L | 2022-05-04 |
| Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models | ✓ | 71.7 | Discrete Adversarial Distillation (ViT-B, 224) | 2023-11-02 |
| LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 71.4 | LeViT-384 | 2021-04-02 |
| LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 69.9 | LeViT-256 | 2021-04-02 |
| ResMLP: Feedforward networks for image classification with data-efficient training | ✓ | 69.8 | ResMLP-S24/16 | 2021-05-07 |
| When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ | 69.6 | ResNet-152x2-SAM | 2021-06-03 |
| LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 68.7 | LeViT-192 | 2021-04-02 |
| ResNet strikes back: An improved training procedure in timm | ✓ | 68.7 | ResNet50 (A1) | 2021-10-01 |
| LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 67.5 | LeViT-128 | 2021-04-02 |
| When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ | 67.5 | ViT-B/16-SAM | 2021-06-03 |
| ResMLP: Feedforward networks for image classification with data-efficient training | ✓ | 66.0 | ResMLP-S12/16 | 2021-05-07 |
| When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ | 65.5 | Mixer-B/8-SAM | 2021-06-03 |
| LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 63.9 | LeViT-128S | 2021-04-02 |
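For reference, the Top-1 Accuracy column above is the fraction of test images whose single highest-scoring prediction matches the ground-truth label. A minimal, model-agnostic sketch of the metric (using only NumPy; loading ImageNet-V2 images and running a model to obtain the logits is assumed to happen elsewhere):

```python
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose argmax prediction equals the label.

    logits: (N, num_classes) array of model scores.
    labels: (N,) array of integer ground-truth class indices.
    """
    preds = logits.argmax(axis=1)
    return float((preds == labels).mean())

# Toy example: 4 samples, 3 classes (illustrative data, not ImageNet-V2).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.2, 3.0, 0.5],
                   [1.5, 0.3, 0.4],
                   [0.1, 0.2, 2.5]])
labels = np.array([0, 1, 2, 2])  # third sample is misclassified
print(top1_accuracy(logits, labels) * 100)  # 75.0
```

Leaderboard entries report this value in percent over the full ImageNet-V2 test set.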