
Image Classification on ObjectNet

Task: Image Classification · Dataset: ObjectNet
[Results over time: interactive chart of Top-1/Top-5 accuracy versus model release date; the same data is tabulated below.]
Leaderboard

All results are on the ObjectNet test set, and both accuracies are percentages. Rows are sorted by Top-1 accuracy in descending order; entries that report only Top-5 accuracy sort last. A ✓ in the Code column marks papers with released code. A sketch of the standard evaluation protocol follows the table.
| Paper | Code | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Model | Release Date |
|---|---|---|---|---|---|
| CoCa: Contrastive Captioners are Image-Text Foundation Models | ✓ | 82.7 | | CoCa | 2022-05-04 |
| LiT: Zero-Shot Transfer with Locked-image text Tuning | ✓ | 82.5 | | LiT | 2021-11-15 |
| Combined Scaling for Zero-shot Transfer Learning | | 82.3 | | BASIC | 2021-11-19 |
| EVA-CLIP: Improved Training Techniques for CLIP at Scale | ✓ | 79.6 | | EVA-02-CLIP-E/14+ | 2023-03-27 |
| Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ | 79.03 | | Baseline (ViT-G/14) | 2022-03-10 |
| Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ | 78.52 | | Model soups (ViT-G/14) | 2022-03-10 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ | 77.9 | | MAWS (ViT-6.5B) | 2023-03-23 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ | 75.8 | | MAWS (ViT-2B) | 2023-03-23 |
| The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ | 72.6 | | MAWS (ViT-H) | 2023-03-23 |
| Learning Transferable Visual Models From Natural Language Supervision | ✓ | 72.3 | | CLIP | 2021-02-26 |
| Combined Scaling for Zero-shot Transfer Learning | | 72.2 | | ALIGN | 2021-11-19 |
| Robust fine-tuning of zero-shot models | ✓ | 72.1 | | WiSE-FT | 2021-09-04 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | ✓ | 72.0 | | ViT-e | 2022-09-14 |
| Scaling Vision Transformers | ✓ | 70.53 | | ViT-G/14 | 2021-06-08 |
| Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ | 69.5 | | SWAG (ViT H/14) | 2022-01-20 |
| Scaling Vision Transformers | ✓ | 68.5 | | NS (Eff.-L2) | 2021-06-08 |
| Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ | 64.3 | | RegNetY 128GF (Platt) | 2022-01-20 |
| A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others | ✓ | 60.78 | | LLE (ViT-H/14, MAE, Edge Aug) | 2022-12-09 |
| Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ | 60.2 | | SEER (RegNet10B) | 2022-02-16 |
| Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ | 60 | | ViT H/14 (Platt) | 2022-01-20 |
| Big Transfer (BiT): General Visual Representation Learning | ✓ | 58.7 | 80 | BiT-L (ResNet-152x4) | 2019-12-24 |
| Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ | 57.3 | | ViT L/16 (Platt) | 2022-01-20 |
| Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | ✓ | 53.9 | | ViT B/16 (Bamboo) | 2022-03-15 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 52.0 | 73.5 | AR-L (Opt Relevance) | 2022-06-02 |
| Matryoshka Representation Learning | ✓ | 51.6 | | ALIGN-MRL | 2022-05-26 |
| Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations | | 50.7 | | ViT-B/16 (ANN-1.3B) | 2021-08-12 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 49.39 | | ViT-B/16 (512x512) + Pyramid | 2021-11-30 |
| Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations | | 49.1 | | ResNet-101 (JFT-300M) | 2021-08-12 |
| Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ | 48.9 | | ViT B/16 | 2022-01-20 |
| Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations | | 48.4 | | ViT-B/32 | 2021-08-12 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 47.53 | | ViT-B/16 (512x512) + Pixel | 2021-11-30 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 47.1 | 70 | AR-B (Opt Relevance) | 2022-06-02 |
| Big Transfer (BiT): General Visual Representation Learning | ✓ | 47.0 | 69 | BiT-M (ResNet-152x4) | 2019-12-24 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 46.68 | | ViT-B/16 (512x512) | 2021-11-30 |
| Discrete Representations Strengthen Vision Transformer Robustness | ✓ | 46.62 | | ViT-B (Discrete 512x512) | 2021-11-20 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 46.5 | 68.3 | AR-L | 2022-06-02 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 43.2 | 65.8 | ViT-L (Opt Relevance) | 2022-06-02 |
| Optimal Representations for Covariate Shift | ✓ | 42.80 | | CLIP L | 2021-12-31 |
| Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations | | 42.5 | | ResNet-50 (JFT-300M) | 2021-08-12 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 42.2 | 65.1 | ViT-B (Opt Relevance) | 2022-06-02 |
| Optimal Representations for Covariate Shift | ✓ | 42.10 | | CLIP L (LAION) | 2021-12-31 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 41.4 | 63.7 | AR-B | 2022-06-02 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 39.79 | | RegViT on 384x384 + Adv Pyramid | 2021-11-30 |
| Generative Interventions for Causal Learning | ✓ | 39.38 | 61.43 | ResNet-152 + GenInt with Transfer | 2020-12-22 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 39.3 | 61.7 | AR-S (Opt Relevance) | 2022-06-02 |
| Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | ✓ | 38.8 | | ResNet-50 (Bamboo) | 2022-03-15 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 37.41 | | RegViT on 384x384 + Adv Pixel | 2021-11-30 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 37.4 | 59.5 | ViT-L | 2022-06-02 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 36.3 | 56.6 | DeiT-L (Opt Relevance) | 2022-06-02 |
| Big Transfer (BiT): General Visual Representation Learning | ✓ | 36.0 | 57 | BiT-S (ResNet-152x4) | 2019-12-24 |
| ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 35.77 | 56.05 | NASNet-A | 2019-12-01 |
| ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 35.63 | 54.95 | PNASNet-5L | 2019-12-01 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 35.59 | | RegViT on 384x384 | 2021-11-30 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 35.1 | 56.4 | ViT-B | 2022-06-02 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 34.83 | | RegViT on 384x384 + Random Pyramid | 2021-11-30 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 34.3 | 55.8 | AR-S | 2022-06-02 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 34.12 | | RegViT on 384x384 + Random Pixel | 2021-11-30 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 32.92 | | RegViT (RandAug) + Adv Pyramid | 2021-11-30 |
| ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 32.24 | 51.98 | Inception-v4 | 2019-12-01 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 31.6 | 53 | DeiT-S (Opt Relevance) | 2022-06-02 |
| Context-Gated Convolution | ✓ | 31.53 | 50.16 | ResNet-50 + CGC | 2019-10-12 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 31.4 | 48.5 | DeiT-L | 2022-06-02 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 30.98 | | Discrete ViT + Pixel | 2021-11-30 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 30.28 | | Discrete ViT + Pyramid | 2021-11-30 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 30.11 | | RegViT (RandAug) + Adv Pixel | 2021-11-30 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 29.95 | | Discrete ViT | 2021-11-30 |
| ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 29.59 | 49.4 | ResNet-152 | 2019-12-01 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 29.41 | | RegViT (RandAug) + Random Pyramid | 2021-11-30 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 29.3 | | RegViT (RandAug) | 2021-11-30 |
| Improving robustness against common corruptions by covariate shift adaptation | ✓ | 29.2 | 50.2 | ResNet-50 + GroupNorm | 2020-06-30 |
| Improving robustness against common corruptions by covariate shift adaptation | ✓ | 29.2 | | ResNet-50 + RoHL | 2020-06-30 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 28.72 | | RegViT (RandAug) + Random Pixel | 2021-11-30 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 28.6 | | MLP-Mixer + Pyramid | 2021-11-30 |
| Improving robustness against common corruptions by covariate shift adaptation | ✓ | 28.5 | 48.6 | ResNet-50 + FixUp | 2020-06-30 |
| On Mixup Regularization | ✓ | 28.37 | | ResNet-50 + MixUp (rescaled) | 2020-06-10 |
| Optimizing Relevance Maps of Vision Transformers Improves Robustness | ✓ | 28.3 | 47.3 | DeiT-S | 2022-06-02 |
| Generative Interventions for Causal Learning | ✓ | 27.03 | 48.02 | ResNet-18 + GenInt with Transfer | 2020-12-22 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 25.9 | | MLP-Mixer | 2021-11-30 |
| Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ | 25.9 | | RELICv2 | 2022-01-13 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 25.65 | | ViT + MixUp | 2021-11-30 |
| Compressive Visual Representations | ✓ | 25.5 | | C-BYOL | 2021-09-27 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 24.75 | | MLP-Mixer + Pixel | 2021-11-30 |
| Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations | | 23.9 | | BYOL (BG_RM) | 2021-03-23 |
| Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ | 23.8 | | RELIC | 2022-01-13 |
| Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ | 23 | | BYOL | 2022-01-13 |
| Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations | | 21.9 | | SwAV (BG_RM) | 2021-03-23 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 21.61 | | ViT + CutMix | 2021-11-30 |
| Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations | | 20.8 | | MoCo-v2 (BG_Swaps) | 2021-03-23 |
| Compressive Visual Representations | ✓ | 20.8 | | C-SimCLR | 2021-09-27 |
| Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 20.61 | 48.83 | SeLa(v2) (reverse linear probing) | 2021-09-29 |
| Representation Learning by Detecting Incorrect Location Embeddings | ✓ | 20.51 | | DILEMMA | 2022-04-10 |
| Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 19.73 | 46.81 | DeepCluster(v2) (reverse linear probing) | 2021-09-29 |
| ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 19.13 | 37.15 | VGG-14 | 2019-12-01 |
| Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP) | ✓ | 18.70 | | ResNet-50 (ImageNet-Captions) | 2022-05-03 |
| Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 17.71 | 43.64 | SwAV (reverse linear probing) | 2021-09-29 |
| Pyramid Adversarial Training Improves ViT Performance | ✓ | 17.36 | | ViT | 2021-11-30 |
| Compact and Optimal Deep Learning with Recurrent Parameter Generators | ✓ | 16.5 | | ResNet34-RPG | 2021-07-15 |
| Robust Cross-Modal Representation Learning with Progressive Self-Distillation | | 15.24 | | CLIP (CC12M pretrain) | 2022-04-10 |
| Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ | 14.6 | | SimCLR | 2022-01-13 |
| Class-agnostic Object Detection | | 13.2 | 29.7 | ResNet-152 (FRCNN-ag-ad, VOC) | 2020-11-28 |
| Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 12.67 | 31.45 | MoCo(v2) (reverse linear probing) | 2021-09-29 |
| Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 12.64 | 31.71 | MoCHi (reverse linear probing) | 2021-09-29 |
| Measuring the Interpretability of Unsupervised Representations via Quantized Reversed Probing | | 12.23 | 31.72 | OBoW (reverse linear probing) | 2021-09-29 |
| ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models | | 6.78 | 17.6 | AlexNet | 2019-12-01 |
| Self-Supervised Learning for Large-Scale Unsupervised Image Clustering | ✓ | 4.92 | | BigBiGAN (RevNet-50 4×) | 2020-08-24 |
| An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ✓ | | 82.1 | ViT-H/14 | 2020-10-22 |
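
Most entries above follow the standard ObjectNet protocol: ObjectNet is a test-only dataset (no training split), so an ImageNet-trained classifier is evaluated directly on the 113 ObjectNet classes that overlap with ImageNet's 1,000 classes, and a prediction counts as correct if it falls in any ImageNet synset mapped to the ObjectNet class. Below is a minimal PyTorch sketch of that Top-1/Top-5 computation; the dataset path, folder layout, border-crop width, and the `objectnet_to_imagenet` mapping shown are hypothetical placeholders, not any paper's released evaluation code.

```python
# Minimal sketch: Top-1/Top-5 accuracy of an ImageNet-trained model on
# ObjectNet's ImageNet-overlapping classes. Paths and the class mapping
# below are illustrative placeholders.
import torch
from torchvision import models, transforms
from torchvision.datasets import ImageFolder

# ObjectNet images include a thin red border (assumed 2 px here),
# which is typically cropped before evaluation.
preprocess = transforms.Compose([
    transforms.Lambda(lambda im: im.crop((2, 2, im.width - 2, im.height - 2))),
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Hypothetical layout: one folder per overlapping ObjectNet class.
dataset = ImageFolder("objectnet/overlap_only", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=4)

# Hypothetical mapping: ObjectNet folder index -> set of ImageNet class ids.
# Several ObjectNet classes correspond to more than one ImageNet synset.
objectnet_to_imagenet = {0: {409, 530}, 1: {414}}  # ... fill in all 113 classes

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

top1 = top5 = total = 0
with torch.no_grad():
    for images, labels in loader:
        logits = model(images)
        top5_pred = logits.topk(5, dim=-1).indices  # (B, 5) ImageNet class ids
        for preds, label in zip(top5_pred.tolist(), labels.tolist()):
            valid = objectnet_to_imagenet[label]
            top1 += preds[0] in valid               # correct if the top prediction
            top5 += any(p in valid for p in preds)  # (or any of 5) hits a mapped synset
            total += 1

print(f"Top-1: {100 * top1 / total:.2f}%  Top-5: {100 * top5 / total:.2f}%")
```

Because several ObjectNet classes map to more than one ImageNet synset, scoring a prediction as correct if it hits any mapped synset, as sketched above, is the usual convention; zero-shot image-text models such as CLIP, LiT, and BASIC instead classify by matching image embeddings against text embeddings of the class names, but report the same Top-1/Top-5 metrics.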