Paper | Code | Top-1 Acc. | Params | Model / Method | Date
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | | 94.6% | | OmniVec2 | 2024-01-01
OmniVec: Learning robust representations with cross modal sharing | | 93.8% | | OmniVec | 2023-11-07
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | 92.6% | | InternImage-H | 2022-11-10 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 91.3% | | MAWS (ViT-2B) | 2023-03-23 |
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition | ✓ Link | 88.7% | | MetaFormer (MetaFormer-2, 384, extra_info) | 2022-03-05
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | ✓ Link | 87.3% | | Hiera-H (448px) | 2023-06-01 |
Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 86.8% | | MAE (ViT-H, 448) | 2021-11-11 |
Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ Link | 86.0% | | SWAG (ViT H/14) | 2022-01-20 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 84.7% | | SEER (RegNet10B - finetuned - 384px) | 2022-02-16 |
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition | ✓ Link | 84.3% | | MetaFormer (MetaFormer-2, 384) | 2022-03-05
Omnivore: A Single Model for Many Visual Modalities | ✓ Link | 84.1% | | OMNIVORE (Swin-L) | 2022-01-20 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 81.8% | 186M | RDNet-L (224 res, IN-1K pretrained) | 2024-03-28 |
Grafit: Learning fine-grained image representations with coarse labels | | 81.2% | | RegNet-8GF | 2020-11-25 |
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition | ✓ Link | 81.0% | | VL-LTR (ViT-B-16) | 2021-11-26 |
A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems | ✓ Link | 80.97% | | µ2Net+ (ViT-L/16) | 2022-09-15
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 80.5% | 87M | RDNet-B (224 res, IN-1K pretrained) | 2024-03-28
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ Link | 80.3% | | MixMIM-L | 2022-05-26 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 79.5% | | DeiT-B | 2020-12-23 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 79.4% | | CeiT-S (384 finetune resolution) | 2021-03-22 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 79.1% | 50M | RDNet-S (224 res, IN-1K pretrained) | 2024-03-28
Generalized Parametric Contrastive Learning | ✓ Link | 78.1% | | GPaCo (ResNet-152) | 2022-09-26 |
Going deeper with Image Transformers | ✓ Link | 78.0% | | CaiT-M-36 U 224 | 2021-03-31
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ Link | 77.5% | | MixMIM-B | 2022-05-26 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 77.0% | 24M | RDNet-T (224 res, IN-1K pretrained) | 2024-03-28
Generalized Parametric Contrastive Learning | ✓ Link | 75.4% | | GPaCo (ResNet-50) | 2022-09-26 |
Class-Balanced Distillation for Long-Tailed Visual Recognition | ✓ Link | 75.3% | | CBD-ENS (ResNet-101) | 2021-04-12 |
Three things everyone should know about Vision Transformers | ✓ Link | 75.3% | | ViT-L (attn finetune) | 2022-03-18 |
Parametric Contrastive Learning | ✓ Link | 75.2% | | PaCo(ResNet-152) | 2021-07-26 |
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition | ✓ Link | 74.6% | | VL-LTR (ResNet-50) | 2021-11-26 |
The Majority Can Help The Minority: Context-rich Minority Oversampling for Long-tailed Classification | ✓ Link | 74.0% | | BS-CMO (ResNet-50) | 2021-12-01 |
Class-Balanced Distillation for Long-Tailed Visual Recognition | ✓ Link | 73.6% | | CBD-ENS (ResNet-50) | 2021-04-12 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 73.3% | | CeiT-S | 2021-03-22 |
Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition | ✓ Link | 72.9% | | TADE (ResNet-50) | 2021-07-20 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 72.2% | | CeiT-T (384 finetune resolution) | 2021-03-22 |
Long-tailed Recognition by Routing Diverse Distribution-Aware Experts | ✓ Link | 72.2% | | RIDE (ResNet-50) | 2020-10-05 |
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 70.54% | | ResNeXt-101 (SAMix) | 2021-11-30 |
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 70.49% | | ResNeXt-101 (AutoMix) | 2021-03-24 |
Disentangling Label Distribution for Long-tailed Visual Recognition | ✓ Link | 70.0% | | LADE | 2020-12-01 |
Grafit: Learning fine-grained image representations with coarse labels | | 69.8% | | ResNet-50 | 2020-11-25 |
Feature Space Augmentation for Long-Tailed Data | | 69.08% | | ResNet-152 | 2020-08-09 |
Class-Balanced Loss Based on Effective Number of Samples | ✓ Link | 69.05% | | ResNet-152 | 2019-01-16 |
MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition | ✓ Link | 68.75% | | MetaSAug | 2021-03-23 |
Feature Space Augmentation for Long-Tailed Data | | 68.39% | | ResNet-101 | 2020-08-09 |
Class-Balanced Loss Based on Effective Number of Samples | ✓ Link | 67.98% | | ResNet-101 | 2019-01-16 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 66.9% | | LeViT-384 | 2021-04-02 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 66.2% | | LeViT-256 | 2021-04-02 |
Feature Space Augmentation for Long-Tailed Data | | 65.91% | | ResNet-50 | 2020-08-09 |
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 64.84% | | ResNet-50 (SAMix) | 2021-11-30 |
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 64.73% | | ResNet-50 (AutoMix) | 2021-03-24 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 64.3% | | CeiT-T | 2021-03-22 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 64.3% | | ResMLP-24 | 2021-05-07
Class-Balanced Loss Based on Effective Number of Samples | ✓ Link | 64.16% | | ResNet-50 | 2019-01-16 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 60.4% | | LeViT-192 | 2021-04-02 |
The iNaturalist Species Classification and Detection Dataset | ✓ Link | 60.20% | | Inception-V3 | 2017-07-20 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 60.2% | | ResMLP-12 | 2021-05-07
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 55.2% | | LeViT-128S | 2021-04-02 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 54.0% | | LeViT-128 | 2021-04-02
ClusterFit: Improving Generalization of Visual Representations | ✓ Link | 49.7% | | ResNet-50 | 2019-12-06 |
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments | ✓ Link | 48.6% | | ResNet-50 | 2020-06-17
Barlow Twins: Self-Supervised Learning via Redundancy Reduction | ✓ Link | 46.5% | | Barlow Twins (ResNet-50) | 2021-03-04
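The rows above follow a fixed pipe-delimited layout: title | code link | top-1 accuracy | params | model | date. A minimal sketch of turning such rows into sortable records follows; the field order and the `✓ Link` code marker are assumptions taken from this table, and the two sample rows are copied from it.

```python
def parse_row(line: str) -> dict:
    """Split one pipe-delimited leaderboard row into a record.

    Assumed field order (from the table above):
    title | code | top-1 accuracy | params | model | date
    """
    title, code, acc, params, model, date = [f.strip() for f in line.split("|")]
    return {
        "title": title,
        "has_code": code.startswith("✓"),   # "✓ Link" marks a code release
        "top1": float(acc.rstrip("%")),      # "86.8%" -> 86.8
        "params": params or None,            # empty cell -> None
        "model": model,
        "date": date,
    }

sample = [
    "Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 86.8% | | MAE (ViT-H, 448) | 2021-11-11",
    "OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | | 94.6% | | OmniVec2 | 2024-01-01",
]

# Sort descending by top-1 accuracy, as the leaderboard itself is ordered.
records = sorted((parse_row(r) for r in sample), key=lambda r: r["top1"], reverse=True)
print(records[0]["model"], records[0]["top1"])  # → OmniVec2 94.6
```

Splitting on `|` is safe here only because the titles and model names in this table contain no pipe characters; a real export should use a proper CSV/TSV parser.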