OpenCodePapers

Image Classification on iNaturalist 2018

Image Classification
Leaderboard
Paper | Code | Top-1 Accuracy (%) | Number of params | Model name | Release date
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning |  | 94.6 |  | OmniVec2 | 2024-01-01
OmniVec: Learning robust representations with cross modal sharing |  | 93.8 |  | OmniVec | 2023-11-07
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ | 92.6 |  | InternImage-H | 2022-11-10
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ | 91.3 |  | MAWS (ViT-2B) | 2023-03-23
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition | ✓ | 88.7 |  | MetaFormer (MetaFormer-2,384,extra_info) | 2022-03-05
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | ✓ | 87.3 |  | Hiera-H (448px) | 2023-06-01
Masked Autoencoders Are Scalable Vision Learners | ✓ | 86.8 |  | MAE (ViT-H, 448) | 2021-11-11
Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ | 86.0 |  | SWAG (ViT H/14) | 2022-01-20
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ | 84.7 |  | SEER (RegNet10B - finetuned - 384px) | 2022-02-16
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition | ✓ | 84.3 |  | MetaFormer (MetaFormer-2,384) | 2022-03-05
Omnivore: A Single Model for Many Visual Modalities | ✓ | 84.1 |  | OMNIVORE (Swin-L) | 2022-01-20
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ | 81.8 | 186M | RDNet-L (224 res, IN-1K pretrained) | 2024-03-28
Grafit: Learning fine-grained image representations with coarse labels |  | 81.2 |  | RegNet-8GF | 2020-11-25
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition | ✓ | 81.0 |  | VL-LTR (ViT-B-16) | 2021-11-26
A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems | ✓ | 80.97 |  | µ2Net+ (ViT-L/16) | 2022-09-15
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ | 80.5 | 87M | RDNet-B (224 res, IN-1K pretrained) | 2024-03-28
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ | 80.3 |  | MixMIM-L | 2022-05-26
Training data-efficient image transformers & distillation through attention | ✓ | 79.5 |  | DeiT-B | 2020-12-23
Incorporating Convolution Designs into Visual Transformers | ✓ | 79.4 |  | CeiT-S (384 finetune resolution) | 2021-03-22
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ | 79.1 | 50M | RDNet-S (224 res, IN-1K pretrained) | 2024-03-28
Generalized Parametric Contrastive Learning | ✓ | 78.1 |  | GPaCo (ResNet-152) | 2022-09-26
Going deeper with Image Transformers | ✓ | 78 |  | CaiT-M-36 U 224 | 2021-03-31
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ | 77.5 |  | MixMIM-B | 2022-05-26
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ | 77.0 | 24M | RDNet-T (224 res, IN-1K pretrained) | 2024-03-28
Generalized Parametric Contrastive Learning | ✓ | 75.4 |  | GPaCo (ResNet-50) | 2022-09-26
Class-Balanced Distillation for Long-Tailed Visual Recognition | ✓ | 75.3 |  | CBD-ENS (ResNet-101) | 2021-04-12
Three things everyone should know about Vision Transformers | ✓ | 75.3 |  | ViT-L (attn finetune) | 2022-03-18
Parametric Contrastive Learning | ✓ | 75.2 |  | PaCo (ResNet-152) | 2021-07-26
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition | ✓ | 74.6 |  | VL-LTR (ResNet-50) | 2021-11-26
The Majority Can Help The Minority: Context-rich Minority Oversampling for Long-tailed Classification | ✓ | 74.0 |  | BS-CMO (ResNet-50) | 2021-12-01
Class-Balanced Distillation for Long-Tailed Visual Recognition | ✓ | 73.6 |  | CBD-ENS (ResNet-50) | 2021-04-12
Incorporating Convolution Designs into Visual Transformers | ✓ | 73.3 |  | CeiT-S | 2021-03-22
Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition | ✓ | 72.9 |  | TADE (ResNet-50) | 2021-07-20
Incorporating Convolution Designs into Visual Transformers | ✓ | 72.2 |  | CeiT-T (384 finetune resolution) | 2021-03-22
Long-tailed Recognition by Routing Diverse Distribution-Aware Experts | ✓ | 72.2 |  | RIDE (ResNet-50) | 2020-10-05
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ | 70.54 |  | ResNeXt-101 (SAMix) | 2021-11-30
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ | 70.49 |  | ResNeXt-101 (AutoMix) | 2021-03-24
Disentangling Label Distribution for Long-tailed Visual Recognition | ✓ | 70.0 |  | LADE | 2020-12-01
Grafit: Learning fine-grained image representations with coarse labels |  | 69.8 |  | ResNet-50 | 2020-11-25
Feature Space Augmentation for Long-Tailed Data |  | 69.08 |  | ResNet-152 | 2020-08-09
Class-Balanced Loss Based on Effective Number of Samples | ✓ | 69.05 |  | ResNet-152 | 2019-01-16
MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition | ✓ | 68.75 |  | MetaSAug | 2021-03-23
Feature Space Augmentation for Long-Tailed Data |  | 68.39 |  | ResNet-101 | 2020-08-09
Class-Balanced Loss Based on Effective Number of Samples | ✓ | 67.98 |  | ResNet-101 | 2019-01-16
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 66.9 |  | LeViT-384 | 2021-04-02
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 66.2 |  | LeViT-256 | 2021-04-02
Feature Space Augmentation for Long-Tailed Data |  | 65.91 |  | ResNet-50 | 2020-08-09
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ | 64.84 |  | ResNet-50 (SAMix) | 2021-11-30
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ | 64.73 |  | ResNet-50 (AutoMix) | 2021-03-24
Incorporating Convolution Designs into Visual Transformers | ✓ | 64.3 |  | CeiT-T | 2021-03-22
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ | 64.3 |  | ResMLP-24 | 2021-05-07
Class-Balanced Loss Based on Effective Number of Samples | ✓ | 64.16 |  | ResNet-50 | 2019-01-16
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 60.4 |  | LeViT-192 | 2021-04-02
The iNaturalist Species Classification and Detection Dataset | ✓ | 60.20 |  | Inception-V3 | 2017-07-20
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ | 60.2 |  | ResMLP-12 | 2021-05-07
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 55.2 |  | LeViT-128S | 2021-04-02
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ | 54 |  | LeViT-128 | 2021-04-02
ClusterFit: Improving Generalization of Visual Representations | ✓ | 49.7 |  | ResNet-50 | 2019-12-06
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments | ✓ | 48.6 |  | ResNet-50 | 2020-06-17
Barlow Twins: Self-Supervised Learning via Redundancy Reduction | ✓ | 46.5 |  | Barlow Twins (ResNet-50) | 2021-03-04
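As a reading aid for the Top-1 Accuracy column, here is a minimal Python sketch of the metric: the percentage of images whose highest-scoring predicted class equals the ground-truth label. The function name `top1_accuracy` and its input format (one list of class scores per image, plus integer labels) are illustrative assumptions, not code from any of the listed papers.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return top-1 accuracy in percent (hypothetical helper for illustration).

    scores: one row of per-class scores (logits or probabilities) per image.
    labels: the ground-truth class index for each image.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")

    correct = 0
    for row, label in zip(scores, labels):
        # Argmax over the class scores: the model's single best guess.
        # max() returns the first maximal index, so ties go to the lower class index.
        pred = max(range(len(row)), key=row.__getitem__)
        correct += int(pred == label)
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Predictions are classes 1, 0, 2; two of three match the labels.
    acc = top1_accuracy([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6]], [1, 2, 2])
    print(f"{acc:.2f}")  # prints 66.67
```

Because only the single highest score counts, a model that ranks the correct species second gets no credit; leaderboards that also report top-5 accuracy relax exactly this condition.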