OpenCodePapers

Image Classification on iNaturalist

Image Classification

Leaderboard

Paper	Code	Top 1 Accuracy (%)	Top 5 Accuracy (%)	Top 3 Error	Overall	Model Name	Release Date
Multimodal Autoregressive Pre-training of Large Vision Encoders	✓ Link	85.9				AIMv2-3B (448 res)	2024-11-21
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles	✓ Link	83.8				Hiera-H (448px)	2023-06-01
Masked Autoencoders Are Scalable Vision Learners	✓ Link	83.4				MAE (ViT-H, 448)	2021-11-11
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition	✓ Link	83.4				MetaFormer (MetaFormer-2,384,extra_info)	2022-03-05
Multimodal Autoregressive Pre-training of Large Vision Encoders	✓ Link	81.5				AIMv2-3B	2024-11-21
ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder	✓ Link	81.2				ViT-NeT (SwinV2-B)	2022-07-17
MetaFormer: A Unified Meta Framework for Fine-Grained Recognition	✓ Link	80.4				MetaFormer (MetaFormer-2,384)	2022-03-05
Multimodal Autoregressive Pre-training of Large Vision Encoders	✓ Link	79.7				AIMv2-1B	2024-11-21
Multimodal Autoregressive Pre-training of Large Vision Encoders	✓ Link	77.9				AIMv2-H	2024-11-21
Multimodal Autoregressive Pre-training of Large Vision Encoders	✓ Link	76.0				AIMv2-L	2024-11-21
Fixing the train-test resolution discrepancy	✓ Link	75.4				FixSENet-154	2019-06-14
On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition	✓ Link	72.3				SEB+EfficientNet-B5	2022-05-26
TransFG: A Transformer Architecture for Fine-grained Recognition	✓ Link	71.7				TransFG	2021-03-14
The iNaturalist Species Classification and Detection Dataset	✓ Link	67.3	87.5			IncResNetV2 SE	2017-07-20
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization	✓ Link	63.6	84.8			SpineNet-143	2019-12-10
MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition	✓ Link	63.28				MetaSAug	2021-03-23
Graph-RISE: Graph-Regularized Image Semantic Embedding	✓ Link	31.12	52.76			Graph-RISE (40M)	2019-02-14
Deep CNNs Meet Global Covariance Pooling: Better Representation and Generalization	✓ Link			14.625		iSQRT-COV-Net	2019-04-15
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets	✓ Link				75.1	b_22DeiT-LT(ours)	2024-04-03