Paper | Code | Accuracy | Model | Date |
--- | --- | --- | --- | --- |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 84.63 | Model soups (BASIC-L) | 2022-03-10 |
PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ Link | 84.3 | ViT-e | 2022-09-14 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 84.22 | Model soups (ViT-G/14) | 2022-03-10 |
Swin Transformer V2: Scaling Up Capacity and Resolution | ✓ Link | 84.00 | SwinV2-G | 2021-11-18 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 84.0 | MAWS (ViT-6.5B) | 2023-03-23 |
Scaling Vision Transformers | ✓ Link | 83.33 | ViT-G/14 | 2021-06-08 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 83.0 | MAWS (ViT-2B) | 2023-03-23 |
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ Link | 81.5 | MOAT-4 (IN-22K pretraining) | 2022-10-04 |
Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ Link | 81.1 | SWAG (ViT H/14) | 2022-01-20 |
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ Link | 80.6 | MOAT-3 (IN-22K pretraining) | 2022-10-04 |
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ Link | 79.3 | MOAT-2 (IN-22K pretraining) | 2022-10-04 |
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ Link | 78.4 | MOAT-1 (IN-22K pretraining) | 2022-10-04 |
Swin Transformer V2: Scaling Up Capacity and Resolution | ✓ Link | 78.08 | SwinV2-B | 2021-11-18 |
VOLO: Vision Outlooker for Visual Recognition | ✓ Link | 78.0 | VOLO-D5 | 2021-06-24 |
VOLO: Vision Outlooker for Visual Recognition | ✓ Link | 77.8 | VOLO-D4 | 2021-06-24 |
Going deeper with Image Transformers | ✓ Link | 76.7 | CAIT-M36-448 | 2021-03-31 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 76.2 | SEER (RegNet10B) | 2022-02-16 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 74.2 | ResMLP-B24/8 22k | 2021-05-07 |
Three things everyone should know about Vision Transformers | ✓ Link | 73.9 | ViT-B-36x1 | 2022-03-18 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 73.4 | ResMLP-B24/8 | 2021-05-07 |
Sequencer: Deep LSTM for Image Classification | ✓ Link | 73.4 | Sequencer2D-L | 2022-05-04 |
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models | ✓ Link | 71.7 | Discrete Adversarial Distillation (ViT-B, 224) | 2023-11-02 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 71.4 | LeViT-384 | 2021-04-02 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 69.9 | LeViT-256 | 2021-04-02 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 69.8 | ResMLP-S24/16 | 2021-05-07 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 69.6 | ResNet-152x2-SAM | 2021-06-03 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 68.7 | LeViT-192 | 2021-04-02 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 68.7 | ResNet50 (A1) | 2021-10-01 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 67.5 | LeViT-128 | 2021-04-02 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 67.5 | ViT-B/16-SAM | 2021-06-03 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 66.0 | ResMLP-S12/16 | 2021-05-07 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 65.5 | Mixer-B/8-SAM | 2021-06-03 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 63.9 | LeViT-128S | 2021-04-02 |
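The top-ranked entries come from model soups, whose title already summarizes the technique: average the weights of several fine-tuned models of the same architecture, instead of ensembling their predictions, so inference cost stays that of a single model. A minimal sketch of the uniform-soup idea, with toy dicts of floats standing in for real framework state dicts (the `uniform_soup` name and data layout are illustrative assumptions, not the paper's API):

```python
def uniform_soup(state_dicts):
    """Average a list of same-architecture state dicts parameter-by-parameter."""
    if not state_dicts:
        raise ValueError("need at least one model")
    n = len(state_dicts)
    soup = {}
    for key in state_dicts[0]:
        # Element-wise mean across models for this parameter tensor.
        soup[key] = [sum(vals) / n for vals in zip(*(sd[key] for sd in state_dicts))]
    return soup

# Two toy "fine-tuned models" sharing the same keys and shapes.
m1 = {"w": [1.0, 2.0], "b": [0.0]}
m2 = {"w": [3.0, 4.0], "b": [2.0]}

print(uniform_soup([m1, m2]))  # {'w': [2.0, 3.0], 'b': [1.0]}
```

With real models the same loop runs over `model.state_dict()` tensors; the averaged dict is then loaded back into a single network for standard inference.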