Paper | Code | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Model | Date
:--- | :---: | :---: | :---: | :--- | :---
CoCa: Contrastive Captioners are Image-Text Foundation Models | ✓ Link | 82.7 | | CoCa | 2022-05-04
LiT: Zero-Shot Transfer with Locked-image text Tuning | ✓ Link | 82.5 | | LiT | 2021-11-15 |
Combined Scaling for Zero-shot Transfer Learning | | 82.3 | | BASIC | 2021-11-19 |
EVA-CLIP: Improved Training Techniques for CLIP at Scale | ✓ Link | 79.6 | | EVA-02-CLIP-E/14+ | 2023-03-27 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 79.03 | | Baseline (ViT-G/14) | 2022-03-10 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 78.52 | | Model soups (ViT-G/14) | 2022-03-10 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 77.9 | | MAWS (ViT-6.5B) | 2023-03-23 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 75.8 | | MAWS (ViT-2B) | 2023-03-23 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 72.6 | | MAWS (ViT-H) | 2023-03-23 |
Learning Transferable Visual Models From Natural Language Supervision | ✓ Link | 72.3 | | CLIP | 2021-02-26 |
Combined Scaling for Zero-shot Transfer Learning | | 72.2 | | ALIGN | 2021-11-19 |
Robust fine-tuning of zero-shot models | ✓ Link | 72.1 | | WiSE-FT | 2021-09-04 |
PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ Link | 72.0 | | ViT-e | 2022-09-14 |
Scaling Vision Transformers | ✓ Link | 70.53 | | ViT-G/14 | 2021-06-08 |
Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ Link | 69.5 | | SWAG (ViT H/14) | 2022-01-20 |
Scaling Vision Transformers | ✓ Link | 68.5 | | Noisy Student (EfficientNet-L2) | 2021-06-08
Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ Link | 64.3 | | RegNetY 128GF (Platt) | 2022-01-20 |
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others | ✓ Link | 60.78 | | LLE (ViT-H/14, MAE, Edge Aug) | 2022-12-09 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 60.2 | | SEER (RegNet10B) | 2022-02-16 |
Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ Link | 60.0 | | ViT H/14 (Platt) | 2022-01-20
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 58.7 | 80.0 | BiT-L (ResNet-152x4) | 2019-12-24
Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ Link | 57.3 | | ViT L/16 (Platt) | 2022-01-20 |
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | ✓ Link | 53.9 | | ViT-B/16 (Bamboo) | 2022-03-15
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 52.0 | 73.5 | AR-L (Opt Relevance) | 2022-06-02 |
Matryoshka Representation Learning | ✓ Link | 51.6 | | ALIGN-MRL | 2022-05-26 |
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations | | 50.7 | | ViT-B/16 (ANN-1.3B) | 2021-08-12 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 49.39 | | ViT-B/16 (512x512) + Pyramid | 2021-11-30 |
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations | | 49.1 | | ResNet-101 (JFT-300M) | 2021-08-12 |
Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ Link | 48.9 | | ViT B/16 | 2022-01-20 |
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations | | 48.4 | | ViT-B/32 | 2021-08-12 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 47.53 | | ViT-B/16 (512x512) + Pixel | 2021-11-30 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 47.1 | 70.0 | AR-B (Opt Relevance) | 2022-06-02
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 47.0 | 69.0 | BiT-M (ResNet-152x4) | 2019-12-24
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 46.68 | | ViT-B/16 (512x512) | 2021-11-30 |
Discrete Representations Strengthen Vision Transformer Robustness | ✓ Link | 46.62 | | ViT-B (Discrete 512x512) | 2021-11-20 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 46.5 | 68.3 | AR-L | 2022-06-02 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 43.2 | 65.8 | ViT-L (Opt Relevance) | 2022-06-02 |
Optimal Representations for Covariate Shift | ✓ Link | 42.80 | | CLIP L | 2021-12-31 |
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations | | 42.5 | | ResNet-50 (JFT-300M) | 2021-08-12 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 42.2 | 65.1 | ViT-B (Opt Relevance) | 2022-06-02 |
Optimal Representations for Covariate Shift | ✓ Link | 42.10 | | CLIP L (LAION) | 2021-12-31 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 41.4 | 63.7 | AR-B | 2022-06-02 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 39.79 | | RegViT (384x384) + Adv Pyramid | 2021-11-30
Generative Interventions for Causal Learning | ✓ Link | 39.38 | 61.43 | ResNet-152 + GenInt with Transfer | 2020-12-22 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 39.3 | 61.7 | AR-S (Opt Relevance) | 2022-06-02 |
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | ✓ Link | 38.8 | | ResNet-50 (Bamboo) | 2022-03-15 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 37.41 | | RegViT (384x384) + Adv Pixel | 2021-11-30
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 37.4 | 59.5 | ViT-L | 2022-06-02 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 36.3 | 56.6 | DeiT-L (Opt Relevance) | 2022-06-02 |
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 36.0 | 57.0 | BiT-S (ResNet-152x4) | 2019-12-24
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 35.77 | 56.05 | NASNet-A | 2019-12-01 |
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 35.63 | 54.95 | PNASNet-5L | 2019-12-01 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 35.59 | | RegViT (384x384) | 2021-11-30
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 35.1 | 56.4 | ViT-B | 2022-06-02 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 34.83 | | RegViT (384x384) + Random Pyramid | 2021-11-30
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 34.3 | 55.8 | AR-S | 2022-06-02 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 34.12 | | RegViT (384x384) + Random Pixel | 2021-11-30
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 32.92 | | RegViT (RandAug) + Adv Pyramid | 2021-11-30 |
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 32.24 | 51.98 | Inception-v4 | 2019-12-01 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 31.6 | 53.0 | DeiT-S (Opt Relevance) | 2022-06-02
Context-Gated Convolution | ✓ Link | 31.53 | 50.16 | ResNet-50 + CGC | 2019-10-12 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 31.4 | 48.5 | DeiT-L | 2022-06-02 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 30.98 | | Discrete ViT + Pixel | 2021-11-30 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 30.28 | | Discrete ViT + Pyramid | 2021-11-30 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 30.11 | | RegViT (RandAug) + Adv Pixel | 2021-11-30 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 29.95 | | Discrete ViT | 2021-11-30 |
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 29.59 | 49.4 | ResNet-152 | 2019-12-01 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 29.41 | | RegViT (RandAug) + Random Pyramid | 2021-11-30 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 29.3 | | RegViT (RandAug) | 2021-11-30 |
Improving robustness against common corruptions by covariate shift adaptation | ✓ Link | 29.2 | 50.2 | ResNet-50 + GroupNorm | 2020-06-30 |
Improving robustness against common corruptions by covariate shift adaptation | ✓ Link | 29.2 | | ResNet-50 + RoHL | 2020-06-30 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 28.72 | | RegViT (RandAug) + Random Pixel | 2021-11-30 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 28.6 | | MLP-Mixer + Pyramid | 2021-11-30 |
Improving robustness against common corruptions by covariate shift adaptation | ✓ Link | 28.5 | 48.6 | ResNet-50 + FixUp | 2020-06-30 |
On Mixup Regularization | ✓ Link | 28.37 | | ResNet-50 + MixUp (rescaled) | 2020-06-10 |
Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ Link | 28.3 | 47.3 | DeiT-S | 2022-06-02 |
Generative Interventions for Causal Learning | ✓ Link | 27.03 | 48.02 | ResNet-18 + GenInt with Transfer | 2020-12-22 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 25.9 | | MLP-Mixer | 2021-11-30 |
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ Link | 25.9 | | RELICv2 | 2022-01-13 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 25.65 | | ViT + MixUp | 2021-11-30 |
Compressive Visual Representations | ✓ Link | 25.5 | | C-BYOL | 2021-09-27 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 24.75 | | MLP-Mixer + Pixel | 2021-11-30 |
Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations | | 23.9 | | BYOL (BG_RM) | 2021-03-23 |
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ Link | 23.8 | | RELIC | 2022-01-13 |
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ Link | 23.0 | | BYOL | 2022-01-13
Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations | | 21.9 | | SwAV (BG_RM) | 2021-03-23 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 21.61 | | ViT + CutMix | 2021-11-30 |
Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations | | 20.8 | | MoCo-v2 (BG_Swaps) | 2021-03-23 |
Compressive Visual Representations | ✓ Link | 20.8 | | C-SimCLR | 2021-09-27 |
Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 20.61 | 48.83 | SeLa(v2) (reverse linear probing) | 2021-09-29 |
Representation Learning by Detecting Incorrect Location Embeddings | ✓ Link | 20.51 | | DILEMMA | 2022-04-10 |
Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 19.73 | 46.81 | DeepCluster(v2) (reverse linear probing) | 2021-09-29 |
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 19.13 | 37.15 | VGG-19 | 2019-12-01
Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP) | ✓ Link | 18.70 | | ResNet-50 (ImageNet-Captions) | 2022-05-03 |
Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 17.71 | 43.64 | SwAV (reverse linear probing) | 2021-09-29 |
Pyramid Adversarial Training Improves ViT Performance | ✓ Link | 17.36 | | ViT | 2021-11-30 |
Compact and Optimal Deep Learning with Recurrent Parameter Generators | ✓ Link | 16.5 | | ResNet34-RPG | 2021-07-15 |
Robust Cross-Modal Representation Learning with Progressive Self-Distillation | | 15.24 | | CLIP (CC12M pretrain) | 2022-04-10 |
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ Link | 14.6 | | SimCLR | 2022-01-13 |
Class-agnostic Object Detection | | 13.2 | 29.7 | ResNet-152 (FRCNN-ag-ad, VOC) | 2020-11-28 |
Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 12.67 | 31.45 | MoCo(v2) (reverse linear probing) | 2021-09-29 |
Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 12.64 | 31.71 | MoCHi (reverse linear probing) | 2021-09-29 |
Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 12.23 | 31.72 | OBoW (reverse linear probing) | 2021-09-29 |
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 6.78 | 17.6 | AlexNet | 2019-12-01 |
Self-Supervised Learning for Large-Scale Unsupervised Image Clustering | ✓ Link | 4.92 | | BigBiGAN (RevNet-50 4×) | 2020-08-24 |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ✓ Link | | 82.1 | ViT-H/14 | 2020-10-22 |
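
The zero-shot entries near the top of the table (CLIP, LiT, BASIC, CoCa, EVA-CLIP) are all scored with the same protocol: encode one text prompt per class, encode each test image, and predict the class whose text embedding is nearest by cosine similarity. Below is a minimal sketch of that protocol, assuming OpenAI's `clip` package; the three-class list and the prompt template are illustrative placeholders (ObjectNet itself has 313 classes), not what any listed paper used.

```python
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Illustrative subset; ObjectNet has 313 classes. Prompt template is a placeholder.
class_names = ["alarm clock", "banana", "bench"]
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    # Encode the class prompts once; L2-normalize so dot products are cosines.
    text_feats = model.encode_text(prompts)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

def zero_shot_top1(images: torch.Tensor) -> torch.Tensor:
    """images: a batch already run through `preprocess`, shape (N, 3, 224, 224)."""
    with torch.no_grad():
        img_feats = model.encode_image(images.to(device))
        img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
        logits = 100.0 * img_feats @ text_feats.T  # scaled cosine similarities
    return logits.argmax(dim=-1)  # predicted class index per image
```

Reported Top-1 is then the fraction of test images whose argmax matches the ground-truth label; the fine-tuned and supervised entries in the table instead score an ordinary trained classification head the same way.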