Paper | Code | Top-1 Accuracy | Params | GFLOPs | | Top-5 Accuracy | | Model | Date
CoCa: Contrastive Captioners are Image-Text Foundation Models | ✓ Link | 91.0% | 2100M | | | | | CoCa (finetuned) | 2022-05-04 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 90.98% | 2440M | | | | | Model soups (BASIC-L) | 2022-03-10 |
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time | ✓ Link | 90.94% | 1843M | | | | | Model soups (ViT-G/14) | 2022-03-10 |
DaViT: Dual Attention Vision Transformers | ✓ Link | 90.4% | 1437M | 1038 | | | | DaViT-G | 2022-04-07 |
DaViT: Dual Attention Vision Transformers | ✓ Link | 90.2% | 362M | 334 | | | | DaViT-H | 2022-04-07 |
Meta Pseudo Labels | ✓ Link | 90.2% | 480M | | 95040G | 98.8% | | Meta Pseudo Labels (EfficientNet-L2) | 2020-03-23 |
Swin Transformer V2: Scaling Up Capacity and Resolution | ✓ Link | 90.17% | 3000M | | | | | SwinV2-G | 2021-11-18 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 90.1% | 6500M | | | | | MAWS (ViT-6.5B) | 2023-03-23 |
Florence: A New Foundation Model for Computer Vision | ✓ Link | 90.05% | 893M | | | 99.02% | | Florence-CoSwin-H | 2021-11-22 |
Meta Pseudo Labels | ✓ Link | 90% | 390M | | | | | Meta Pseudo Labels (EfficientNet-B6-Wide) | 2020-03-23 |
Reversible Column Networks | ✓ Link | 90.0% | 2158M | | | | | RevCol-H | 2022-12-22 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 89.8% | 2000M | | | | | MAWS (ViT-2B) | 2023-03-23 |
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | ✓ Link | 89.7% | 1000M | | | | | EVA | 2022-11-14 |
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information | ✓ Link | 89.6% | | | | | | M3I Pre-training (InternImage-H) | 2022-11-17 |
Scaling Vision Transformers to 22 Billion Parameters | ✓ Link | 89.6% | 307M | | | | | ViT-L/16 (384res, distilled from ViT-22B) | 2023-02-10 |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | 89.6% | 1080M | 1478 | | | | InternImage-H | 2022-11-10 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 89.53% | | | | | | MaxViT-XL (512res, JFT) | 2022-04-04 |
Multimodal Autoregressive Pre-training of Large Vision Encoders | ✓ Link | 89.5% | | | | | | AIMv2-3B (448 res) | 2024-11-21 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 89.5% | 650M | | | | | MAWS (ViT-H) | 2023-03-23 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 89.41% | | | | | | MaxViT-L (512res, JFT) | 2022-04-04 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 89.36% | | | | | | MaxViT-XL (384res, JFT) | 2022-04-04 |
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | | 89.3% | | | | | | OmniVec2 | 2024-01-01 |
High-Performance Large-Scale Image Recognition Without Normalization | ✓ Link | 89.2% | 527M | 367 | | | | NFNet-F4+ | 2021-02-11 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 89.12% | | | | | | MaxViT-L (384res, JFT) | 2022-04-04 |
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ Link | 89.1% | 483.2M | 648.5 | | | | MOAT-4 22K+1K | 2022-10-04 |
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation | ✓ Link | 89.0% | 307M | | | | | FD (CLIP ViT-L-336) | 2022-05-27 |
Differentially Private Image Classification from Features | ✓ Link | 88.9% | | | | | | Last Layer Tuning with Newton Step (ViT-G/14) | 2022-11-24 |
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | ✓ Link | 88.87% | 460M | | | | | TokenLearner L/8 (24+11) | 2021-06-21 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 88.82% | | | | | | MaxViT-B (512res, JFT) | 2022-04-04 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 88.8% | | | | | | MAWS (ViT-L) | 2023-03-23 |
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | ✓ Link | 88.8% | 667M | 763.5 | | | | MViTv2-H (512 res, ImageNet-21k pretrain) | 2021-12-02 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 88.7% | | | | | | MaxViT-XL (512res, 21K) | 2022-04-04 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 88.69% | | | | | | MaxViT-B (384res, JFT) | 2022-04-04 |
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | ✓ Link | 88.64% | 480M | | | | | ALIGN (EfficientNet-L2) | 2021-02-11 |
Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 88.61% | 480M | | | | | EfficientNet-L2-475 (SAM) | 2020-10-03 |
Scaling Vision Transformers to 22 Billion Parameters | ✓ Link | 88.6% | 86M | | | | | ViT-B/16 | 2023-02-10 |
BEiT: BERT Pre-Training of Image Transformers | ✓ Link | 88.60% | 331M | | | | | BEiT-L (ViT; ImageNet-22K pretrain) | 2021-06-15 |
Revisiting Weakly Supervised Pre-Training of Visual Perception Models | ✓ Link | 88.6% | 633.5M | 1018.8 | | | | SWAG (ViT-H/14) | 2022-01-20 |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ✓ Link | 88.55% | | | | | | ViT-H/14 | 2020-10-22 |
CoAtNet: Marrying Convolution and Attention for All Data Sizes | ✓ Link | 88.52% | | 114 | | | | CoAtNet-3 @384 | 2021-06-09 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 88.51% | | | | | | MaxViT-XL (384res, 21K) | 2022-04-04 |
Reproducible scaling laws for contrastive language-image learning | ✓ Link | 88.5% | | | | | | OpenCLIP ViT-H/14 | 2022-12-14 |
Multimodal Autoregressive Pre-training of Large Vision Encoders | ✓ Link | 88.5% | | | | | | AIMv2-3B | 2024-11-21 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 88.5% | 480M | 585 | | | | FixEfficientNet-L2 | 2020-03-18 |
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond | ✓ Link | 88.5% | 644M | | | | | ViTAE-H + MAE (448) | 2022-02-21 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 88.46% | | | | | | MaxViT-L (512res, 21K) | 2022-04-04 |
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | ✓ Link | 88.4% | 218M | 140.7 | | | | MViTv2-L (384 res, ImageNet-21k pretrain) | 2021-12-02 |
Self-training with Noisy Student improves ImageNet classification | ✓ Link | 88.4% | 480M | | 51800G | | | NoisyStudent (EfficientNet-L2) | 2019-11-11 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 88.38% | | | | | | MaxViT-B (512res, 21K) | 2022-04-04 |
Differentiable Top-k Classification Learning | ✓ Link | 88.37% | | | | | | Top-k DiffSortNets (EfficientNet-L2) | 2022-06-15 |
A ConvNet for the 2020s | ✓ Link | 88.36% | 1827M | | | | | Adlik-ViT-SG+Swin_large+Convnext_xlarge(384) | 2022-01-10 |
Scaling Vision with Sparse Mixture of Experts | ✓ Link | 88.36% | 7200M | | | | | V-MoE-H/14 (Every-2) | 2021-06-10 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 88.32% | | | | | | MaxViT-L (384res, 21K) | 2022-04-04 |
Unicom: Universal and Compact Representation Learning for Image Retrieval | ✓ Link | 88.3% | | | | | | Unicom (ViT-L/14@336px) (Finetuned) | 2023-04-12 |
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | ✓ Link | 88.3% | | | | | | PeCo (ViT-H, 448) | 2021-11-24 |
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | ✓ Link | 88.21% | | | | | | DFN-5B H/14-378 + PrefixedIter Decoder | 2024-07-15 |
Exploring Target Representations for Masked Autoencoders | ✓ Link | 88.2% | | | | | | dBOT ViT-H (CLIP as Teacher) | 2022-09-08 |
MambaVision: A Hybrid Mamba-Transformer Vision Backbone | ✓ Link | 88.1% | | 489.1 | | | | MambaVision-L3 | 2024-07-10 |
MetaFormer Baselines for Vision | ✓ Link | 88.1% | 99M | 72.2 | | | | CAFormer-B36 (384 res, 21K) | 2022-10-24 |
Multimodal Autoregressive Pre-training of Large Vision Encoders | ✓ Link | 88.1% | 1200M | | | | | AIMv2-1B | 2024-11-21 |
Scaling Vision with Sparse Mixture of Experts | ✓ Link | 88.08% | 656M | | | | | ViT-H/14 | 2021-06-10 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 88.0% | | | | | | ViT-H@224 (cosub) | 2022-12-09 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 88% | | | | | | UniRepLKNet-XL++ | 2023-11-27 |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | 88% | 335M | 163 | | | | InternImage-XL | 2022-11-10 |
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | ✓ Link | 88% | 667M | 120.6 | | | | MViTv2-H (ImageNet-21k pretrain) | 2021-12-02 |
MLP-Mixer: An all-MLP Architecture for Vision | ✓ Link | 87.94% | | | | | | Mixer-H/14 (JFT-300M pre-train) | 2021-05-04 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 87.9% | | | | | | UniRepLKNet-L++ | 2023-11-27 |
Exploring Target Representations for Masked Autoencoders | ✓ Link | 87.8% | | | | | | dBOT ViT-L (CLIP as Teacher) | 2022-09-08 |
MogaNet: Multi-order Gated Aggregation Network | ✓ Link | 87.8% | 181M | 102 | | | | MogaNet-XL (384res) | 2022-11-07 |
Visual Attention Network | ✓ Link | 87.8% | 200M | 114.3 | | | | VAN-B6 (22K, 384res) | 2022-02-20 |
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs | ✓ Link | 87.8% | 335M | 128.7 | | | | RepLKNet-XL | 2022-03-13 |
A ConvNet for the 2020s | ✓ Link | 87.8% | 350M | 179 | | | | ConvNeXt-XL (ImageNet-22k) | 2022-01-10 |
Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 87.8% | 656M | | | | | MAE (ViT-H, 448) | 2021-11-11 |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ✓ Link | 87.76% | | | | | | ViT-L/16 | 2020-10-22 |
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions | ✓ Link | 87.7% | | 101.8 | | | | HorNet-L (GF) | 2022-07-28 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 87.7% | | | | | | CvT-W24 (384 res, ImageNet-22k pretrain) | 2021-03-29 |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | 87.7% | 223M | 108 | | | | InternImage-L | 2022-11-10 |
CoAtNet: Marrying Convolution and Attention for All Data Sizes | ✓ Link | 87.6% | | | | | | CoAtNet-3 (21k) | 2021-06-09 |
MetaFormer Baselines for Vision | ✓ Link | 87.6% | 100M | 66.5 | | | | ConvFormer-B36 (384 res, 21K) | 2022-10-24 |
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 87.54% | | | | 98.46% | | BiT-L (ResNet) | 2019-12-24 |
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | ✓ Link | 87.5% | | | | | | PeCo (ViT-H, 224) | 2021-11-24 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 87.5% | | | | | | ViT-L@224 (cosub) | 2022-12-09 |
MetaFormer Baselines for Vision | ✓ Link | 87.5% | 56M | 42 | | | | CAFormer-M36 (384 res, 21K) | 2022-10-24 |
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows | ✓ Link | 87.5% | 173M | 96.8 | | | | CSWin-L (384 res, ImageNet-22k pretrain) | 2021-07-01 |
DaViT: Dual Attention Vision Transformers | ✓ Link | 87.5% | 196.8M | 103 | | | | DaViT-L (ImageNet-22k) | 2022-04-07 |
Dilated Neighborhood Attention Transformer | ✓ Link | 87.5% | 200M | 92.4 | | | | DiNAT-Large (11x11ks; 384res; Pretrained on IN22K@224) | 2022-09-29 |
Multimodal Autoregressive Pre-training of Large Vision Encoders | ✓ Link | 87.5% | 600M | | | | | AIMv2-H | 2024-11-21 |
Scaling Vision with Sparse Mixture of Experts | ✓ Link | 87.41% | 3400M | | | | | V-MoE-L/16 (Every-2) | 2021-06-10 |
Dilated Neighborhood Attention Transformer | ✓ Link | 87.4% | | 89.7 | | | | DiNAT-Large (384x384; Pretrained on ImageNet-22K @ 224x224) | 2022-09-29 |
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language | ✓ Link | 87.4% | | | | | | data2vec 2.0 | 2022-12-14 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 87.4% | | | | | | UniRepLKNet-B++ | 2023-11-27 |
HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space | ✓ Link | 87.4% | | | | | | HVT Huge | 2024-09-25 |
MetaFormer Baselines for Vision | ✓ Link | 87.4% | 99M | 23.2 | | | | CAFormer-B36 (224 res, 21K) | 2022-10-24 |
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP | ✓ Link | 87.4% | 117M | 51 | | | | UniNet-B6 | 2022-07-12 |
Dilated Neighborhood Attention Transformer | ✓ Link | 87.4% | 197M | 101.5 | | | | DiNAT_s-Large (384res; Pretrained on IN22K@224) | 2022-09-29 |
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ✓ Link | 87.3% | 197M | 103.9 | | | | Swin-L | 2021-03-25 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 87.3% | 208M | 94 | | | | EfficientNetV2-XL (21k) | 2021-04-01 |
Improving Vision Transformers by Revisiting High-frequency Components | ✓ Link | 87.3% | 295.5M | 412 | | | | VOLO-D5+HAT | 2022-04-03 |
PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions | ✓ Link | 87.2% | | | | | | EfficientNetV2 (PolyLoss) | 2022-04-26 |
ELSA: Enhanced Local Self-Attention for Vision Transformer | ✓ Link | 87.2% | 298M | 437 | | | | ELSA-VOLO-D5 (512*512) | 2021-12-23 |
A Study on Transformer Configuration and Training Objective | | 87.1% | | | | | | Bamboo (Bamboo-H) | 2022-05-21 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 87.1% | | | | | | Swin-L@224 (cosub) | 2022-12-09 |
CoAtNet: Marrying Convolution and Attention for All Data Sizes | ✓ Link | 87.1% | | | | | | CoAtNet-2 (21k) | 2021-06-09 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 87.1% | 66M | 82 | | | | FixEfficientNet-B7 | 2020-03-18 |
Understanding The Robustness in Vision Transformers | ✓ Link | 87.1% | 76.8M | | | | | FAN-L-Hybrid++ | 2022-04-26 |
Swin Transformer V2: Scaling Up Capacity and Resolution | ✓ Link | 87.1% | 88M | | | | | SwinV2-B | 2021-11-18 |
VOLO: Vision Outlooker for Visual Recognition | ✓ Link | 87.1% | 296M | 412 | | | | VOLO-D5 | 2021-06-24 |
Augmenting Convolutional networks with attention-based aggregation | ✓ Link | 87.1% | 334.3M | | | | | PatchConvNet-L120-21k-384 | 2021-12-27 |
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | ✓ Link | 87.07% | | | | | | 16-TokenLearner B/16 (21) | 2021-06-21 |
Enhance the Visual Representation via Discrete Adversarial Training | ✓ Link | 87.02% | | | | | | MAE+DAT (ViT-H) | 2022-09-16 |
Visual Attention Network | ✓ Link | 87% | | 50.6 | | | | VAN-B5 (22K, 384res) | 2022-02-20 |
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP | ✓ Link | 87% | 72.9M | 20.4 | | | | UniNet-B5 | 2022-07-12 |
MetaFormer Baselines for Vision | ✓ Link | 87.0% | 100M | 22.6 | | | | ConvFormer-B36 (224 res, 21K) | 2022-10-24 |
Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 86.9% | | | | | | MAE (ViT-H) | 2021-11-11 |
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | ✓ Link | 86.9% | | | | | | Hiera-H | 2023-06-01 |
MetaFormer Baselines for Vision | ✓ Link | 86.9% | 39M | 26.0 | | | | CAFormer-S36 (384 res, 21K) | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 86.9% | 57M | 37.7 | | | | ConvFormer-M36 (384 res, 21K) | 2022-10-24 |
Self-training with Noisy Student improves ImageNet classification | ✓ Link | 86.9% | 66M | 37 | | | | NoisyStudent (EfficientNet-B7) | 2019-11-11 |
DaViT: Dual Attention Vision Transformers | ✓ Link | 86.9% | 87.9M | 46.4 | | | | DaViT-B (ImageNet-22k) | 2022-04-07 |
Visual Attention Network | ✓ Link | 86.9% | 200M | 38.9 | | | | VAN-B6 (22K) | 2022-02-20 |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | ✓ Link | 86.8% | | | | | | MAWS (ViT-B) | 2023-03-23 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 86.8% | 120M | 53 | | | | EfficientNetV2-L (21k) | 2021-04-01 |
VOLO: Vision Outlooker for Visual Recognition | ✓ Link | 86.8% | 193M | 197 | | | | VOLO-D4 | 2021-06-24 |
Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error | | 86.78% | 377.2M | | | | | NFNet-F5 w/ SAM w/ augmult=16 | 2021-05-27 |
An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems | ✓ Link | 86.74% | | | | | | µ2Net (ViT-L/16) | 2022-05-25 |
DeiT III: Revenge of the ViT | ✓ Link | 86.7% | | | | | | ViT-B @384 (DeiT III, 21k) | 2022-04-14 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 86.7% | | | | | | MaxViT-B (512res) | 2022-04-04 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 86.7% | 43M | | | | | FixEfficientNet-B6 | 2020-03-18 |
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ Link | 86.7% | 190M | 271 | | | | MOAT-3 1K only | 2022-10-04 |
An Algorithm for Routing Vectors in Sequences | ✓ Link | 86.7% | 312.8M | | | | | Heinsen Routing + BEiT-large 16 224 | 2022-11-20 |
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network | ✓ Link | 86.61% | | 51.93 | | | | CLCNet (S:ViT+D:EffNet-B7) (retrain) | 2022-05-19 |
MetaFormer Baselines for Vision | ✓ Link | 86.6% | 56M | 13.2 | | | | CAFormer-M36 (224 res, 21K) | 2022-10-24 |
Visual Attention Network | ✓ Link | 86.6% | 60M | 35.9 | | | | VAN-B4 (22K, 384res) | 2022-02-20 |
Multimodal Autoregressive Pre-training of Large Vision Encoders | ✓ Link | 86.6% | 300M | | | | | AIMv2-L | 2024-11-21 |
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | ✓ Link | 86.6% | 656M | | | | | data2vec (ViT-H) | 2022-02-07 |
Dilated Neighborhood Attention Transformer | ✓ Link | 86.5% | | 34.5 | | | | DiNAT_s-Large (224x224; Pretrained on ImageNet-22K @ 224x224) | 2022-09-29 |
Meta Knowledge Distillation | | 86.5% | | | | | | MKD ViT-L | 2022-02-16 |
TinyViT: Fast Pretraining Distillation for Small Vision Transformers | ✓ Link | 86.5% | 21M | 27.0 | | | | TinyViT-21M-512-distill (512 res, 21k) | 2022-07-21 |
Augmenting Convolutional networks with attention-based aggregation | ✓ Link | 86.5% | 99.4M | | | | | PatchConvNet-B60-21k-384 | 2021-12-27 |
Going deeper with Image Transformers | ✓ Link | 86.5% | 438M | 377.3 | | | | CaiT-M-48-448 | 2021-03-31 |
High-Performance Large-Scale Image Recognition Without Normalization | ✓ Link | 86.5% | 438.4M | 377.28 | | | | NFNet-F6 w/ SAM | 2021-02-11 |
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network | ✓ Link | 86.46% | | 57.46 | | | | CLCNet (S:ViT+D:VOLO-D3) (retrain) | 2022-05-19 |
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network | ✓ Link | 86.42% | | 45.43 | | | | CLCNet (S:ConvNeXt-L+D:EffNet-B7) (retrain) | 2022-05-19 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 86.4% | | | | | | MaxViT-L (384res) | 2022-04-04 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 86.4% | | | | | | UniRepLKNet-S++ | 2023-11-27 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 86.4% | 30M | | | | | FixEfficientNet-B5 | 2020-03-18 |
MetaFormer Baselines for Vision | ✓ Link | 86.4% | 40M | 22.4 | | | | ConvFormer-S36 (384 res, 21K) | 2022-10-24 |
Self-training with Noisy Student improves ImageNet classification | ✓ Link | 86.4% | 43M | | | | | NoisyStudent (EfficientNet-B6) | 2019-11-11 |
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ✓ Link | 86.4% | 88M | 47 | | | | Swin-B | 2021-03-25 |
MetaFormer Baselines for Vision | ✓ Link | 86.4% | 99M | 72.2 | | | | CAFormer-B36 (384 res) | 2022-10-24 |
All Tokens Matter: Token Labeling for Training Better Vision Transformers | ✓ Link | 86.4% | 151M | 214.8 | | | | LV-ViT-L | 2021-04-22 |
Fixing the train-test resolution discrepancy | ✓ Link | 86.4% | 829M | | 62G | 98.0% | | FixResNeXt-101 32x48d | 2019-06-14 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 86.34% | | | | | | MaxViT-B (384res) | 2022-04-04 |
A Study on Transformer Configuration and Training Objective | | 86.3% | | | | | | Bamboo (Bamboo-L) | 2022-05-21 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 86.3% | | | | | | ViT-B@224 (cosub) | 2022-12-09 |
SP-ViT: Learning 2D Spatial Priors for Vision Transformers | ✓ Link | 86.3% | | | | | | SP-ViT-L|384 | 2022-06-15 |
VOLO: Vision Outlooker for Visual Recognition | ✓ Link | 86.3% | 86M | 67.9 | | | | VOLO-D3 | 2021-06-24 |
BEiT: BERT Pre-Training of Image Transformers | ✓ Link | 86.3% | 86M | | | | | BEiT-L (ViT; ImageNet 1k pretrain) | 2021-06-15 |
Visual Attention Network | ✓ Link | 86.3% | 90M | 17.2 | | | | VAN-B5 (22K) | 2022-02-20 |
UniFormer: Unifying Convolution and Self-attention for Visual Recognition | ✓ Link | 86.3% | 100M | 39.2 | | | | UniFormer-L (384 res) | 2022-01-24 |
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | ✓ Link | 86.3% | 218M | 140.2 | | | | MViTv2-L (384 res) | 2021-12-02 |
Going deeper with Image Transformers | ✓ Link | 86.3% | 271M | 247.8 | | | | CaiT-M36-448 | 2021-03-31 |
High-Performance Large-Scale Image Recognition Without Normalization | ✓ Link | 86.3% | 377.2M | 289.76 | | | | NFNet-F5 w/ SAM | 2021-02-11 |
Tiny Models are the Computational Saver for Large Models | ✓ Link | 86.24% | | 31.17 | | | | TinySaver(ConvNeXtV2_h, 0.01 Acc drop) | 2024-03-26 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 86.2% | | | | | | Swin-B@224 (cosub) | 2022-12-09 |
TinyViT: Fast Pretraining Distillation for Small Vision Transformers | ✓ Link | 86.2% | 21M | 13.8 | | | | TinyViT-21M-384-distill (384 res, 21k) | 2022-07-21 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 86.2% | 54M | 24 | | | | EfficientNetV2-M (21k) | 2021-04-01 |
MetaFormer Baselines for Vision | ✓ Link | 86.2% | 56M | 42.0 | | | | CAFormer-M36 (384 res) | 2022-10-24 |
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | ✓ Link | 86.2% | 89.7M | 56.3 | | | | TransNeXt-Base (IN-1K supervised, 384) | 2023-11-28 |
Masked Image Residual Learning for Scaling Deeper Vision Transformers | ✓ Link | 86.2% | 341M | 67.0 | | | | MIRL (ViT-B-48) | 2023-09-25 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 86.19% | | | | | | MaxViT-S (512res) | 2022-04-04 |
Self-training with Noisy Student improves ImageNet classification | ✓ Link | 86.1% | 30M | | | | | NoisyStudent (EfficientNet-B5) | 2019-11-11 |
MetaFormer Baselines for Vision | ✓ Link | 86.1% | 57M | 12.8 | | | | ConvFormer-M36 (224 res, 21K) | 2022-10-24 |
Going deeper with Image Transformers | ✓ Link | 86.1% | 270.9M | 173.3 | | | | CaiT-M-36 | 2021-03-31 |
Refiner: Refining Self-attention for Vision Transformers | ✓ Link | 86.03% | 81M | | | | | Refiner-ViT-L | 2021-06-07 |
Generalized Parametric Contrastive Learning | ✓ Link | 86.01% | | | | | | GPaCo (ViT-L) | 2022-09-26 |
Omnivore: A Single Model for Many Visual Modalities | ✓ Link | 86.0% | | | | | | Omnivore (Swin-L) | 2022-01-20 |
SP-ViT: Learning 2D Spatial Priors for Vision Transformers | ✓ Link | 86% | | | | | | SP-ViT-M|384 | 2022-06-15 |
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | ✓ Link | 86.0% | 49.7M | 32.1 | | | | TransNeXt-Small (IN-1K supervised, 384) | 2023-11-28 |
VOLO: Vision Outlooker for Visual Recognition | ✓ Link | 86% | 59M | | | | | VOLO-D2 | 2021-06-24 |
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | ✓ Link | 86% | 64M | 20 | | | | EfficientViT-L2 (r384) | 2022-05-29 |
XCiT: Cross-Covariance Image Transformers | ✓ Link | 86% | 189M | 417.9 | | | | XCiT-L24 | 2021-06-17 |
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling | ✓ Link | 86.0% | 198M | | | | | SparK (ConvNeXt-Large, 384) | 2023-01-09 |
High-Performance Large-Scale Image Recognition Without Normalization | ✓ Link | 86.0% | 377.2M | 289.76 | | | | NFNet-F5 | 2021-02-11 |
Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 85.9% | | | | | | MAE (ViT-L) | 2021-11-11 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 85.9% | 19M | | | | | FixEfficientNet-B4 | 2020-03-18 |
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | ✓ Link | 85.9% | 94M | 49.7 | | | | DAT-B++ (384x384) | 2023-09-04 |
High-Performance Large-Scale Image Recognition Without Normalization | ✓ Link | 85.9% | 316.1M | 215.24 | | | | NFNet-F4 | 2021-02-11 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 85.8% | | | | | | ConvNeXt-B@224 (cosub) | 2022-12-09 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 85.8% | | | | | | PiT-B@224 (cosub) | 2022-12-09 |
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation | ✓ Link | 85.8% | | | | | | GTP-ViT-B-Patch8/P20 | 2023-11-06 |
MetaFormer Baselines for Vision | ✓ Link | 85.8% | 39M | 8.0 | | | | CAFormer-S36 (224 res, 21K) | 2022-10-24 |
XCiT: Cross-Covariance Image Transformers | ✓ Link | 85.8% | 84M | 188 | | | | XCiT-M24 | 2021-06-17 |
MaxUp: A Simple Way to Improve Generalization of Neural Network Training | ✓ Link | 85.8% | 87.42M | | | | | Fix-EfficientNet-B8 (MaxUp + CutMix) | 2020-02-20 |
Circumventing Outliers of AutoAugment with Knowledge Distillation | ✓ Link | 85.8% | 88M | | | | | KDforAA (EfficientNet-B8) | 2020-03-25 |
Going deeper with Image Transformers | ✓ Link | 85.8% | 185.9M | 116.1 | | | | CaiT-M-24 | 2021-03-31 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 85.8% | 186M | 34.7 | | | | RDNet-L (384 res) | 2024-03-28 |
DeiT III: Revenge of the ViT | ✓ Link | 85.8% | 304.8M | 191.2 | | | | ViT-L | 2022-04-14 |
FasterViT: Fast Vision Transformers with Hierarchical Attention | ✓ Link | 85.8% | 1360M | 142 | | | | FasterViT-6 | 2023-06-09 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 85.8% | 10000M | | | | | SEER (RG-10B) | 2022-02-16 |
Tiny Models are the Computational Saver for Large Models | ✓ Link | 85.75% | | 19.41 | | | | TinySaver(ConvNeXtV2_h, 0.5 Acc drop) | 2024-03-26 |
Tiny Models are the Computational Saver for Large Models | ✓ Link | 85.74% | | | | | | TinySaver(Swin_large, 0.5 Acc drop) | 2024-03-26 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 85.72% | | | | | | MaxViT-T (384res) | 2022-04-04 |
Visual Attention Network | ✓ Link | 85.7% | | 12.2 | | | | VAN-B4 (22K) | 2022-02-20 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 85.7% | | 53 | | | | EfficientNetV2-L | 2021-04-01 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 85.7% | | | | | | FixEfficientNet-B8 | 2020-03-18 |
DeiT III: Revenge of the ViT | ✓ Link | 85.7% | | | | | | ViT-B @224 (DeiT III, 21k) | 2022-04-14 |
Exploring Target Representations for Masked Autoencoders | ✓ Link | 85.7% | | | | | | dBOT ViT-B (CLIP as Teacher) | 2022-09-08 |
MetaFormer Baselines for Vision | ✓ Link | 85.7% | 39M | 26.0 | | | | CAFormer-S36 (384 res) | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 85.7% | 100M | 66.5 | | | | ConvFormer-B36 (384 res) | 2022-10-24 |
High-Performance Large-Scale Image Recognition Without Normalization | ✓ Link | 85.7% | 254.9M | 114.76 | | | | NFNet-F3 | 2021-02-11 |
Masking meets Supervision: A Strong Learning Alliance | ✓ Link | 85.7% | 632M | | | | | ViT-H @224 (DeiT-III + AugSub) | 2023-06-20 |
XCiT: Cross-Covariance Image Transformers | ✓ Link | 85.6% | 48M | 106 | | | | XCiT-S24 | 2021-06-17 |
MetaFormer Baselines for Vision | ✓ Link | 85.6% | 57M | 37.7 | | | | ConvFormer-M36 (384 res) | 2022-10-24 |
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | ✓ Link | 85.6% | 64M | 11 | | | | EfficientViT-L2 (r288) | 2022-05-29 |
UniFormer: Unifying Convolution and Self-attention for Visual Recognition | ✓ Link | 85.6% | 100M | 12.6 | | | | UniFormer-L | 2022-01-24 |
FasterViT: Fast Vision Transformers with Hierarchical Attention | ✓ Link | 85.6% | 957.5M | 113 | | | | FasterViT-5 | 2023-06-09 |
Three things everyone should know about Vision Transformers | ✓ Link | 85.5% | | | | | | ViT-L@384 (attn finetune) | 2022-03-18 |
SP-ViT: Learning 2D Spatial Priors for Vision Transformers | ✓ Link | 85.5% | | | | | | SP-ViT-L | 2022-06-15 |
MiniViT: Compressing Vision Transformers with Weight Multiplexing | ✓ Link | 85.5% | 47M | 98.8 | | | | Mini-Swin-B@384 | 2022-04-14 |
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning | ✓ Link | 85.5% | 57.5M | 14.8 | | | | Wave-ViT-L | 2022-07-11 |
Circumventing Outliers of AutoAugment with Knowledge Distillation | ✓ Link | 85.5% | 66M | | | | | KDforAA (EfficientNet-B7) | 2020-03-25 |
Scaling Local Self-Attention for Parameter Efficient Visual Backbones | ✓ Link | 85.5% | 87M | | | | | HaloNet4 (base 128, Conv-12) | 2021-03-23 |
Adversarial Examples Improve Image Recognition | ✓ Link | 85.5% | 88M | | | | | AdvProp (EfficientNet-B8) | 2019-11-21 |
MetaFormer Baselines for Vision | ✓ Link | 85.5% | 99M | 23.2 | | | | CAFormer-B36 (224 res) | 2022-10-24 |
A ConvNet for the 2020s | ✓ Link | 85.5% | 198M | 101 | | | | ConvNeXt-L (384 res) | 2022-01-10 |
RandAugment: Practical automated data augmentation with a reduced search space | ✓ Link | 85.4% | | | | | | EfficientNet-B8 (RandAugment) | 2019-09-30 |
BiFormer: Vision Transformer with Bi-Level Routing Attention | ✓ Link | 85.4% | | | | | | BiFormer-B* (IN1k pretrain) | 2023-03-15 |
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation | ✓ Link | 85.4% | | | | | | GTP-EVA-L/P8 | 2023-11-06 |
Augmenting Convolutional networks with attention-based aggregation | ✓ Link | 85.4% | 25.2M | | | | | PatchConvNet-S60-21k-512 | 2021-12-27 |
MetaFormer Baselines for Vision | ✓ Link | 85.4% | 26M | 13.4 | | | | CAFormer-S18 (384 res, 21K) | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 85.4% | 40M | 7.6 | | | | ConvFormer-S36 (224 res, 21K) | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 85.4% | 40M | 22.4 | | | | ConvFormer-S36 (384 res) | 2022-10-24 |
Going deeper with Image Transformers | ✓ Link | 85.4% | 68.2M | 48 | | | | CaiT-S-36 | 2021-03-31 |
FasterViT: Fast Vision Transformers with Hierarchical Attention | ✓ Link | 85.4% | 424.6M | 36.6 | | | | FasterViT-4 | 2023-06-09 |
Exploring the Limits of Weakly Supervised Pretraining | ✓ Link | 85.4% | 829M | 306 | | | | ResNeXt-101 32x48d | 2018-05-02 |
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 85.39% | 928M | | | | | BiT-M (ResNet) | 2019-12-24 |
MLP-Mixer: An all-MLP Architecture for Vision | ✓ Link | 85.3% | | | | | | ViT-L/16 Dosovitskiy et al. (2021) | 2021-05-04 |
Omnivore: A Single Model for Many Visual Modalities | ✓ Link | 85.3% | | | | | | Omnivore (Swin-B) | 2022-01-20 |
Self-training with Noisy Student improves ImageNet classification | ✓ Link | 85.3% | 19M | | | | | NoisyStudent (EfficientNet-B4) | 2019-11-11 |
Going deeper with Image Transformers | ✓ Link | 85.3% | 89.5M | 63.8 | | | | CaiT-S-48 | 2021-03-31 |
Masking meets Supervision: A Strong Learning Alliance | ✓ Link | 85.3% | 304M | | | | | ViT-L @224 (DeiT-III + AugSub) | 2023-06-20 |
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network | ✓ Link | 85.28% | | 47.43 | | | | CLCNet (S:D1+D:D5) | 2022-05-19 |
Tiny Models are the Computational Saver for Large Models | ✓ Link | 85.24% | | | | | | TinySaver(Swin_large, 1.0 Acc drop) | 2024-03-26 |
DeiT III: Revenge of the ViT | ✓ Link | 85.2% | | | | | | ViT-H @224 (DeiT III) | 2022-04-14 |
HyenaPixel: Global Image Context with Convolutions | ✓ Link | 85.2% | | | | | | HyenaPixel-Bidirectional-Former-B36 | 2024-02-29 |
VOLO: Vision Outlooker for Visual Recognition | ✓ Link | 85.2% | 27M | | | | | VOLO-D1 | 2021-06-24 |
MetaFormer Baselines for Vision | ✓ Link | 85.2% | 56M | 13.2 | | | | CAFormer-M36 (224 res) | 2022-10-24 |
Adversarial Examples Improve Image Recognition | ✓ Link | 85.2% | 66M | | | | | AdvProp (EfficientNet-B7) | 2019-11-21 |
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP | | 85.2% | 73.5M | 23.2 | | | | UniNet-B5 | 2021-10-08 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 85.2% | 87M | | | | | DeiT-B 384 | 2020-12-23 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 85.17% | 212M | 43.9 | | | | MaxViT-L (224res) | 2022-04-04 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 85.1% | | | | | | EfficientNetV2-M | 2021-04-01 |
Meta Knowledge Distillation | | 85.1% | | | | | | MKD ViT-B | 2022-02-16 |
SP-ViT: Learning 2D Spatial Priors for Vision Transformers | ✓ Link | 85.1% | | | | | | SP-ViT-S|384 | 2022-06-15 |
XCiT: Cross-Covariance Image Transformers | ✓ Link | 85.1% | 26M | 55.6 | | | | XCiT-S12 | 2021-06-17 |
Going deeper with Image Transformers | ✓ Link | 85.1% | 46.9M | 32.2 | | | | CaiT-S-24 | 2021-03-31 |
Semi-Supervised Recognition under a Noisy and Fine-grained Dataset | ✓ Link | 85.1% | 76M | | | | | ResNet200_vd_26w_4s_ssld | 2020-06-18 |
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ Link | 85.1% | 88M | 16.3 | | | | MixMIM-B | 2022-05-26 |
High-Performance Large-Scale Image Recognition Without Normalization | ✓ Link | 85.1% | 193.8M | 62.59 | | | | NFNet-F2 | 2021-02-11 |
Exploring the Limits of Weakly Supervised Pretraining | ✓ Link | 85.1% | 466M | 174 | | | | ResNeXt-101 32x32d | 2018-05-02 |
Discrete Representations Strengthen Vision Transformer Robustness | ✓ Link | 85.07% | | | | | | DiscreteViT | 2021-11-20 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 85.0% | | | | | | ViT-M@224 (cosub) | 2022-12-09 |
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders | ✓ Link | 85% | | | | | | ViC-MAE (ViT-L) | 2023-03-21 |
HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space | ✓ Link | 85% | | | | | | HVT Large | 2024-09-25 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 85% | 12M | | | | | FixEfficientNet-B3 | 2020-03-18 |
MetaFormer Baselines for Vision | ✓ Link | 85.0% | 26M | 13.4 | | | | CAFormer-S18 (384 res) | 2022-10-24 |
MetaFormer Baselines for Vision | ✓ Link | 85.0% | 27M | 11.6 | | | | ConvFormer-S18 (384 res, 21K) | 2022-10-24 |
RandAugment: Practical automated data augmentation with a reduced search space | ✓ Link | 85% | 66M | | | | | EfficientNet-B7 (RandAugment) | 2019-09-30 |
DeiT III: Revenge of the ViT | ✓ Link | 85.0% | 87M | | | | | ViT-B @384 (DeiT III) | 2022-04-14 |
MambaVision: A Hybrid Mamba-Transformer Vision Backbone | ✓ Link | 85% | 227.9M | 34.9 | | | | MambaVision-L | 2024-07-10 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 84.94% | 120M | 23.4 | | | | MaxViT-B (224res) | 2022-04-04 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 84.91% | | | | | | CaiT-S24 | 2023-08-18 |
DeiT III: Revenge of the ViT | ✓ Link | 84.9% | | | | | | ViT-L @224 (DeiT III) | 2022-04-14 |
SP-ViT: Learning 2D Spatial Priors for Vision Transformers | ✓ Link | 84.9% | | | | | | SP-ViT-M | 2022-06-15 |
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | ✓ Link | 84.9% | | | | | | FastViT-MA36 | 2023-03-24 |
HyenaPixel: Global Image Context with Convolutions | ✓ Link | 84.9% | | | | | | HyenaPixel-Former-B36 | 2024-02-29 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 84.9% | 22M | 8.8 | | | | EfficientNetV2-S (21k) | 2021-04-01 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 84.9% | 32M | 25 | | | | CvT-21 (384 res, ImageNet-22k pretrain) | 2021-03-29 |
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | ✓ Link | 84.9% | 93M | 16.6 | | | | DAT-B++ (224x224) | 2023-09-04 |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | 84.9% | 97M | 16 | | | | InternImage-B | 2022-11-10 |
FasterViT: Fast Vision Transformers with Hierarchical Attention | ✓ Link | 84.9% | 159.5M | 18.2 | | | | FasterViT-3 | 2023-06-09 |
TinyViT: Fast Pretraining Distillation for Small Vision Transformers | ✓ Link | 84.8% | 21M | 4.3 | | | | TinyViT-21M-distill (21k) | 2022-07-21 |
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning | ✓ Link | 84.8% | 33.5M | 7.2 | | | | Wave-ViT-B | 2022-07-11 |
Going deeper with Image Transformers | ✓ Link | 84.8% | 38.6M | 28.8 | | | | CaiT-XS-36 | 2021-03-31 |
Sliced Recursive Transformer | ✓ Link | 84.8% | 71.2M | | | | | SReT-B (384 res, ImageNet-1K only) | 2021-11-09 |
Multiscale Vision Transformers | ✓ Link | 84.8% | 72.9M | 32.7 | | | | MViT-B-24 | 2021-04-22 |
Active Token Mixer | ✓ Link | 84.8% | 76.4M | 36.4 | | | | ActiveMLP-L | 2022-03-11 |
Vision Transformer with Deformable Attention | ✓ Link | 84.8% | 88M | 49.8 | | | | DAT-B (384 res, IN-1K only) | 2022-01-03 |
Masked Image Residual Learning for Scaling Deeper Vision Transformers | ✓ Link | 84.8% | 96M | 18.8 | | | | MIRL(ViT-S-54) | 2023-09-25 |
MetaFormer Baselines for Vision | ✓ Link | 84.8% | 100M | 22.6 | | | | ConvFormer-B36 (224 res) | 2022-10-24 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 84.8% | 186M | 34.7 | | | | RDNet-L | 2024-03-28 |
Billion-scale semi-supervised learning for image classification | ✓ Link | 84.8% | 193M | | | | | ResNeXt-101 32x16d (semi-weakly sup.) | 2019-05-02 |
ELSA: Enhanced Local Self-Attention for Vision Transformer | ✓ Link | 84.7% | 27M | 8 | | | | ELSA-VOLO-D1 | 2021-12-23 |
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | ✓ Link | 84.7% | 49.7M | 10.3 | | | | TransNeXt-Small (IN-1K supervised, 224) | 2023-11-28 |
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios | ✓ Link | 84.7% | 57.8M | 32 | | | | Next-ViT-L @384 | 2022-07-12 |
Vicinity Vision Transformer | ✓ Link | 84.7% | 61.8M | 31.8 | | | | VVT-L (384 res) | 2022-06-21 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 84.7% | 75.1M | | | | | BoTNet T7 | 2021-01-27 |
MogaNet: Multi-order Gated Aggregation Network | ✓ Link | 84.7% | 83M | 15.9 | | | | MogaNet-L | 2022-11-07 |
Fast Vision Transformers with HiLo Attention | ✓ Link | 84.7% | 87M | 39.7 | | | | LITv2-B|384 | 2022-05-26 |
High-Performance Large-Scale Image Recognition Without Normalization | ✓ Link | 84.7% | 132.6M | 35.54 | | | | NFNet-F1 | 2021-02-11 |
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | ✓ Link | 84.6% | 53M | 9.4 | | | | DAT-S++ | 2023-09-04 |
Sequencer: Deep LSTM for Image Classification | ✓ Link | 84.6% | 54M | 50.7 | | | | Sequencer2D-L↑392 | 2022-05-04 |
Contextual Transformer Networks for Visual Recognition | ✓ Link | 84.6% | 55.8M | 26.5 | | | | SE-CoTNetD-152 | 2021-07-26 |
Asymmetric Masked Distillation for Pre-Training Small Foundation Models | | 84.6% | 87M | | | | | AMD (ViT-B/16) | 2023-11-06 |
DaViT: Dual Attention Vision Transformers | ✓ Link | 84.6% | 87.9M | 15.5 | | | | DaViT-B | 2022-04-07 |
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | ✓ Link | 84.5% | | | | | | FastViT-SA36 | 2023-03-24 |
Rethinking Channel Dimensions for Efficient Model Design | ✓ Link | 84.5% | 34.8M | | | | | ReXNet-R_3.0 | 2020-07-02 |
MetaFormer Baselines for Vision | ✓ Link | 84.5% | 39M | 8.0 | | | | CAFormer-S36 (224 res) | 2022-10-24 |
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | ✓ Link | 84.5% | 53M | 5.3 | | | | EfficientViT-L1 (r224) | 2022-05-29 |
MetaFormer Baselines for Vision | ✓ Link | 84.5% | 57M | 12.8 | | | | ConvFormer-M36 (224 res) | 2022-10-24 |
Global Context Vision Transformers | ✓ Link | 84.5% | 90M | 14.8 | | | | GC ViT-B | 2022-06-20 |
ResNeSt: Split-Attention Networks | ✓ Link | 84.5% | 111M | | | | | ResNeSt-269 | 2020-04-19 |
CoAtNet: Marrying Convolution and Attention for All Data Sizes | ✓ Link | 84.5% | 168M | 34.7 | | | | CoAtNet-3 | 2021-06-09 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 84.45% | 69M | 11.7 | | | | MaxViT-S (224res) | 2022-04-04 |
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | ✓ Link | 84.4% | | | | | | GPIPE | 2018-11-16 |
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | ✓ Link | 84.4% | | | | | | DeBiFormer-B | 2024-10-11 |
MetaFormer Baselines for Vision | ✓ Link | 84.4% | 27M | 11.6 | | | | ConvFormer-S18 (384 res) | 2022-10-24 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 84.4% | 66M | 37 | | | | EfficientNet-B7 | 2019-05-28 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 84.4% | 87M | 15.4 | | | | RDNet-B | 2024-03-28 |
Dilated Neighborhood Attention Transformer | ✓ Link | 84.4% | 90M | 13.7 | | | | DiNAT-Base | 2022-09-29 |
Revisiting ResNets: Improved Training and Scaling Strategies | ✓ Link | 84.4% | 192M | 4.6 | | | | ResNet-RS-50 (160 image res) | 2021-03-13 |
ColorNet: Investigating the importance of color spaces for image classification | ✓ Link | 84.32% | | | | | | ColorNet (RHYLH with Conv Layer) | 2019-02-01 |
Three things everyone should know about Vision Transformers | ✓ Link | 84.3% | | | | | | ViT-B@384 (attn finetune) | 2022-03-18 |
BiFormer: Vision Transformer with Bi-Level Routing Attention | ✓ Link | 84.3% | | | | | | BiFormer-S* (IN1k pretrain) | 2023-03-15 |
Sliced Recursive Transformer | ✓ Link | 84.3% | 21.3M | 42.8 | | | | SReT-S (512 res, ImageNet-1K only) | 2021-11-09 |
LambdaNetworks: Modeling Long-Range Interactions Without Attention | ✓ Link | 84.3% | 42M | | | | | LambdaResNet200 | 2021-02-17 |
MogaNet: Multi-order Gated Aggregation Network | ✓ Link | 84.3% | 44M | 9.9 | | | | MogaNet-B | 2022-11-07 |
TResNet: High Performance GPU-Dedicated Architecture | ✓ Link | 84.3% | 77M | | | | | TResNet-XL | 2020-03-30 |
Billion-scale semi-supervised learning for image classification | ✓ Link | 84.3% | 88M | | | | | ResNeXt-101 32x8d (semi-weakly sup.) | 2019-05-02 |
Neighborhood Attention Transformer | ✓ Link | 84.3% | 90M | 13.7 | | | | NAT-Base | 2022-04-14 |
Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network | ✓ Link | 84.2% | | 15.8 | | | | Assemble-ResNet152 | 2020-01-17 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 84.2% | | | | | | BoTNet T7-320 | 2021-01-27 |
Visual Parser: Representing Part-whole Hierarchies with Transformers | ✓ Link | 84.2% | | | | | | ViP-B|384 | 2021-07-13 |
A Study on Transformer Configuration and Training Objective | | 84.2% | | | | | | Bamboo (Bamboo-B) | 2022-05-21 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 84.2% | | | | | | RegnetY16GF@224 (cosub) | 2022-12-09 |
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | ✓ Link | 84.2% | 49M | 6.5 | | | | EfficientViT-B3 (r288) | 2022-05-29 |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | 84.2% | 50M | 8 | | | | InternImage-S | 2022-11-10 |
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP | | 84.2% | 73.5M | 9.9 | | | | UniNet-B4 | 2021-10-08 |
FasterViT: Fast Vision Transformers with Hierarchical Attention | ✓ Link | 84.2% | 75.9M | 8.7 | | | | FasterViT-2 | 2023-06-09 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 84.2% | 86M | | | | | DeiT-B | 2020-12-23 |
Masking meets Supervision: A Strong Learning Alliance | ✓ Link | 84.2% | 86.6M | | | | | ViT-B @224 (DeiT-III + AugSub) | 2023-06-20 |
MambaVision: A Hybrid Mamba-Transformer Vision Backbone | ✓ Link | 84.2% | 97.7M | 15 | | | | MambaVision-B | 2024-07-10 |
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network | ✓ Link | 84.2% | 142.3M | 38.1 | | | | RevBiFPN-S6 | 2022-06-28 |
Exploring the Limits of Weakly Supervised Pretraining | ✓ Link | 84.2% | 194M | 72 | | | | ResNeXt-101 32x16d | 2018-05-02 |
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run | | 84.1% | | 2.1 | | | | FBNetV5-F-CLS | 2021-11-19 |
Three things everyone should know about Vision Transformers | ✓ Link | 84.1% | | | | | | ViT-B-36x1 | 2022-03-18 |
Three things everyone should know about Vision Transformers | ✓ Link | 84.1% | | | | | | ViT-B-18x2 | 2022-03-18 |
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | ✓ Link | 84.1% | | | | | | XCiT-M (+MixPro) | 2023-04-24 |
Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces | ✓ Link | 84.1% | | | | | | DGMMC-S | 2024-10-17 |
Self-training with Noisy Student improves ImageNet classification | ✓ Link | 84.1% | 12M | | | | | NoisyStudent (EfficientNet-B3) | 2019-11-11 |
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | ✓ Link | 84.1% | 21.76M | 3.597 | | | | CAS-ViT-T | 2024-08-07 |
MetaFormer Baselines for Vision | ✓ Link | 84.1% | 26M | 4.1 | | | | CAFormer-S18 (224 res, 21K) | 2022-10-24 |
Going deeper with Image Transformers | ✓ Link | 84.1% | 26.6M | 19.3 | | | | CaiT-XS-24 | 2021-03-31 |
MetaFormer Baselines for Vision | ✓ Link | 84.1% | 40M | 7.6 | | | | ConvFormer-S36 (224 res) | 2022-10-24 |
All Tokens Matter: Token Labeling for Training Better Vision Transformers | ✓ Link | 84.1% | 56M | 16 | | | | LV-ViT-M | 2021-04-22 |
Vicinity Vision Transformer | ✓ Link | 84.1% | 61.8M | 10.8 | | | | VVT-L (224 res) | 2022-06-21 |
CoAtNet: Marrying Convolution and Attention for All Data Sizes | ✓ Link | 84.1% | 75M | 15.7 | | | | CoAtNet-2 | 2021-06-09 |
Conformer: Local Features Coupling Global Representations for Visual Recognition | ✓ Link | 84.1% | 83.3M | 46.6 | | | | Conformer-B | 2021-05-09 |
Augmenting Convolutional networks with attention-based aggregation | ✓ Link | 84.1% | 188.6M | | | | | PatchConvNet-B120 | 2021-12-27 |
Generalized Parametric Contrastive Learning | ✓ Link | 84.0% | | | | | | GPaCo (Vit-B) | 2022-09-26 |
Scalable Pre-training of Large Autoregressive Image Models | ✓ Link | 84.0% | | | | | | AIM-7B | 2024-01-16 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 84.0% | 19M | | | | | FixEfficientNet-B4 | 2020-03-18 |
Semi-Supervised Recognition under a Noisy and Fine-grained Dataset | ✓ Link | 84.0% | 25.58M | | | | | Fix_ResNet50_vd_ssld | 2020-06-18 |
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | ✓ Link | 84.0% | 28.2M | 5.7 | | | | TransNeXt-Tiny (IN-1K supervised, 224) | 2023-11-28 |
LambdaNetworks: Modeling Long-Range Interactions Without Attention | ✓ Link | 84.0% | 35M | | | | | LambdaResNet152 | 2021-02-17 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 84% | 43M | 19 | | | | EfficientNet-B6 | 2019-05-28 |
Global Context Vision Transformers | ✓ Link | 84.0% | 51M | 8.5 | | | | GC ViT-S | 2022-06-20 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 84% | 53.9M | | | | | BoTNet T6 | 2021-01-27 |
Rethinking Spatial Dimensions of Vision Transformers | ✓ Link | 84% | 73.8M | 12.5 | | | | PiT-B | 2021-03-30 |
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network | ✓ Link | 84% | 89M | 15.4 | | | | DeepMAD-89M | 2023-03-05 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 83.9% | | | | | | EfficientNetV2-S | 2021-04-01 |
SP-ViT: Learning 2D Spatial Priors for Vision Transformers | ✓ Link | 83.9% | | | | | | SP-ViT-S | 2022-06-15 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 83.9% | | | | | | UniRepLKNet-S | 2023-11-27 |
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | ✓ Link | 83.9% | | | | | | DeBiFormer-S | 2024-10-11 |
Wave-ViT: Unifying Wavelet and Transformers for Visual Representation Learning | ✓ Link | 83.9% | 22.7M | 4.7 | | | | Wave-ViT-S | 2022-07-11 |
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention | ✓ Link | 83.9% | 24M | 4.3 | | | | DAT-T++ | 2023-09-04 |
Adaptive Split-Fusion Transformer | ✓ Link | 83.9% | 56.7M | | | | | ASF-former-B | 2022-04-26 |
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification | ✓ Link | 83.9% | 57.1M | | | | | DynamicViT-LV-M/0.8 | 2021-06-03 |
Transformer in Transformer | ✓ Link | 83.9% | 65.6M | | | | | TNT-B | 2021-02-27 |
ResNeSt: Split-Attention Networks | ✓ Link | 83.9% | 70M | | | | | ResNeSt-200 | 2020-04-19 |
Regularized Evolution for Image Classifier Architecture Search | ✓ Link | 83.9% | 469M | 208 | | | | AmoebaNet-A | 2018-02-05 |
CLCNet: Rethinking of Ensemble Modeling with Classification Confidence Network | ✓ Link | 83.88% | | 18.58 | | | | CLCNet (S:B4+D:B7) | 2022-05-19 |
Revisiting ResNets: Improved Training and Scaling Strategies | ✓ Link | 83.8% | | 54 | | | | ResNet-RS-270 (256 image res) | 2021-03-13 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 83.8% | | | | | | SENet-350 | 2021-01-27 |
DeiT III: Revenge of the ViT | ✓ Link | 83.8% | | | | | | ViT-B @224 (DeiT III) | 2022-04-14 |
ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders | ✓ Link | 83.8% | | | | | | ColorMAE-Green-ViTB-1600 | 2024-07-17 |
Sliced Recursive Transformer | ✓ Link | 83.8% | 21M | 18.5 | | | | SReT-S (384 res, ImageNet-1K only) | 2021-11-09 |
Dilated Neighborhood Attention Transformer | ✓ Link | 83.8% | 51M | 7.8 | | | | DiNAT-Small | 2022-09-29 |
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding | ✓ Link | 83.8% | 68M | 17.9 | | | | Transformer local-attention (NesT-B) | 2021-05-26 |
PVT v2: Improved Baselines with Pyramid Vision Transformer | ✓ Link | 83.8% | 82M | 11.8 | | | | PVTv2-B4 | 2021-06-25 |
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | ✓ Link | 83.7% | | | | | | CA-Swin-S (+MixPro) | 2023-04-24 |
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation | ✓ Link | 83.7% | | | | | | GTP-ViT-L/P8 | 2023-11-06 |
MetaFormer Baselines for Vision | ✓ Link | 83.7% | 27M | 3.9 | | | | ConvFormer-S18 (224 res, 21K) | 2022-10-24 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 83.7% | 50M | 8.7 | | | | RDNet-S | 2024-03-28 |
Vision Transformer with Deformable Attention | ✓ Link | 83.7% | 50M | 9.0 | | | | DAT-S | 2022-01-03 |
Neighborhood Attention Transformer | ✓ Link | 83.7% | 51M | 7.8 | | | | NAT-Small | 2022-04-14 |
Learned Queries for Efficient Local Attention | ✓ Link | 83.7% | 56M | 9.7 | | | | QnA-ViT-Base | 2021-12-21 |
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network | ✓ Link | 83.7% | 82M | 21.8 | | | | RevBiFPN-S5 | 2022-06-28 |
Vision GNN: An Image is Worth Graph of Nodes | ✓ Link | 83.7% | 92.6M | 16.8 | | | | Pyramid ViG-B | 2022-06-01 |
Twins: Revisiting the Design of Spatial Attention in Vision Transformers | ✓ Link | 83.7% | 99.2M | 15.1 | | | | Twins-SVT-L | 2021-04-28 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 83.67% | 22.05M | | | | | TransBoost-ViT-S | 2022-05-26 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 83.65% | | | | | | XCiT-S | 2023-08-18 |
MaxViT: Multi-Axis Vision Transformer | ✓ Link | 83.62% | 31M | 5.6 | | | | MaxViT-T (224res) | 2022-04-04 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 83.61% | | | | | | Wave-ViT-S | 2023-08-18 |
Fast Vision Transformers with HiLo Attention | ✓ Link | 83.6% | | 13.2 | | | | LITv2-B | 2022-05-26 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 83.6% | | | | | | MultiGrain PNASNet (500px) | 2019-02-14 |
Masked Autoencoders Are Scalable Vision Learners | ✓ Link | 83.6% | | | | | | MAE (ViT-L) | 2021-11-11 |
Pattern Attention Transformer with Doughnut Kernel | | 83.6% | | | | | | PAT-B | 2022-11-30 |
HyenaPixel: Global Image Context with Convolutions | ✓ Link | 83.6% | | | | | | HyenaPixel-Attention-Former-S18 | 2024-02-29 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 83.6% | 9.2M | | | | | FixEfficientNet-B2 | 2020-03-18 |
MetaFormer Baselines for Vision | ✓ Link | 83.6% | 26M | 4.1 | | | | CAFormer-S18 (224 res) | 2022-10-24 |
IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation | ✓ Link | 83.6% | 39.3M | 7.8 | | | | IPT-B | 2022-12-06 |
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias | ✓ Link | 83.6% | 48.5M | 27.6 | | | | ViTAE-B-Stage | 2021-06-07 |
ResT: An Efficient Transformer for Visual Recognition | ✓ Link | 83.6% | 51.63M | 7.9 | | | | ResT-Large | 2021-05-28 |
High-Performance Large-Scale Image Recognition Without Normalization | ✓ Link | 83.6% | 71.5M | 12.38 | | | | NFNet-F0 | 2021-02-11 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 83.6% | 98M | 38.2 | | | | SE-ResNeXt-101, 64x4d, S=2 (320px) | 2020-11-30 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 83.6% | 116M | | | | | ResMLP-B24/8 | 2021-05-07 |
Tiny Models are the Computational Saver for Large Models | ✓ Link | 83.52% | | | | | | TinySaver(EfficientFormerV2_l, 0.01 Acc drop) | 2024-03-26 |
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | ✓ Link | 83.5% | | 4 | | | | EfficientViT-B3 (r224) | 2022-05-29 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 83.5% | | 19.3 | | | | BoTNet T5 | 2021-01-27 |
HyenaPixel: Global Image Context with Convolutions | ✓ Link | 83.5% | | | | | | HyenaPixel-Bidirectional-Former-S18 | 2024-02-29 |
Augmenting Convolutional networks with attention-based aggregation | ✓ Link | 83.5% | 99.4M | | | | | PatchConvNet-B60 | 2021-12-27 |
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | ✓ Link | 83.46% | | | | | | SigLIP B/16 + PrefixedIter Decoder | 2024-07-15 |
Three things everyone should know about Vision Transformers | ✓ Link | 83.4% | | | | | | ViT-B (hMLP + BeiT) | 2022-03-18 |
MobileNetV4 -- Universal Models for the Mobile Ecosystem | ✓ Link | 83.4% | | | | | | MNv4-Hybrid-L | 2024-04-16 |
UniFormer: Unifying Convolution and Self-attention for Visual Recognition | ✓ Link | 83.4% | 22M | 3.6 | | | | UniFormer-S | 2022-01-24 |
DeiT III: Revenge of the ViT | ✓ Link | 83.4% | 22M | 15.5 | | | | ViT-S @384 (DeiT III) | 2022-04-14 |
MogaNet: Multi-order Gated Aggregation Network | ✓ Link | 83.4% | 25M | 5 | | | | MogaNet-S | 2022-11-07 |
Global Context Vision Transformers | ✓ Link | 83.4% | 28M | 4.7 | | | | GC ViT-T | 2022-06-20 |
Billion-scale semi-supervised learning for image classification | ✓ Link | 83.4% | 42M | | | | | ResNeXt-101 32x4d (semi-weakly sup.) | 2019-05-02 |
Sequencer: Deep LSTM for Image Classification | ✓ Link | 83.4% | 54M | 16.6 | | | | Sequencer2D-L | 2022-05-04 |
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? | ✓ Link | 83.4% | 65.9M | | | | | sMLPNet-B (ImageNet-1k) | 2021-09-12 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 83.34% | 98M | 61.1 | | | | SE-ResNeXt-101, 64x4d, S=2 (416px) | 2020-11-30 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 83.3% | | 24.9 | | | | CvT-21 (384 res) | 2021-03-29 |
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | ✓ Link | 83.3% | | 34.2 | | | | T2T-ViT-14|384 | 2021-01-28 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 83.3% | 24.2M | 12.9 | | | | CeiT-S (384 finetune res) | 2021-03-22 |
All Tokens Matter: Token Labeling for Training Better Vision Transformers | ✓ Link | 83.3% | 26M | 6.6 | | | | LV-ViT-S | 2021-04-22 |
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models | ✓ Link | 83.3% | 27.8M | 5.7 | | | | MOAT-0 1K only | 2022-10-04 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 83.3% | 30M | 9.9 | | | | EfficientNet-B5 | 2019-05-28 |
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding | ✓ Link | 83.3% | 38M | 10.4 | | | | Transformer local-attention (NesT-S) | 2021-05-26 |
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding | ✓ Link | 83.3% | 39.7M | 8.7 | | | | ViL-Medium-D | 2021-03-29 |
CoAtNet: Marrying Convolution and Attention for All Data Sizes | ✓ Link | 83.3% | 42M | 8.4 | | | | CoAtNet-1 | 2021-06-09 |
Fast Vision Transformers with HiLo Attention | ✓ Link | 83.3% | 49M | 7.5 | | | | LITv2-M | 2022-05-26 |
MambaVision: A Hybrid Mamba-Transformer Vision Backbone | ✓ Link | 83.3% | 50.1M | 7.5 | | | | MambaVision-S | 2024-07-10 |
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism | ✓ Link | 83.3% | 88M | 15.2 | | | | Shift-B | 2022-01-26 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 83.2% | | | | | | MultiGrain PNASNet (450px) | 2019-02-14 |
Meta Pseudo Labels | ✓ Link | 83.2% | | | | | | Meta Pseudo Labels (ResNet-50) | 2020-03-23 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 83.2% | | | | | | UniRepLKNet-T | 2023-11-27 |
HyenaPixel: Global Image Context with Convolutions | ✓ Link | 83.2% | | | | | | HyenaPixel-Former-S18 | 2024-02-29 |
TinyViT: Fast Pretraining Distillation for Small Vision Transformers | ✓ Link | 83.2% | 11M | 2.0 | | | | TinyViT-11M-distill (21k) | 2022-07-21 |
Rethinking Channel Dimensions for Efficient Model Design | ✓ Link | 83.2% | 16.5M | | | | | ReXNet-R_2.0 | 2020-07-02 |
Learned Queries for Efficient Local Attention | ✓ Link | 83.2% | 25M | 4.4 | | | | QnA-ViT-Small | 2021-12-21 |
Neighborhood Attention Transformer | ✓ Link | 83.2% | 28M | 4.3 | | | | NAT-Tiny | 2022-04-14 |
Contextual Transformer Networks for Visual Recognition | ✓ Link | 83.2% | 40.9M | 8.5 | | | | SE-CoTNetD-101 | 2021-07-26 |
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios | ✓ Link | 83.2% | 44.8M | 8.3 | | | | Next-ViT-B | 2022-07-12 |
PVT v2: Improved Baselines with Pyramid Vision Transformer | ✓ Link | 83.2% | 45.2M | 6.9 | | | | PVTv2-B3 | 2021-06-25 |
Augmenting Convolutional networks with attention-based aggregation | ✓ Link | 83.2% | 47.7M | | | | | PatchConvNet-S120 | 2021-12-27 |
FasterViT: Fast Vision Transformers with Hierarchical Attention | ✓ Link | 83.2% | 53.4M | 5.3 | | | | FasterViT-1 | 2023-06-09 |
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding | ✓ Link | 83.2% | 55.7M | 13.4 | | | | ViL-Base-D | 2021-03-29 |
CycleMLP: A MLP-like Architecture for Dense Prediction | ✓ Link | 83.2% | 76M | 12.3 | | | | CycleMLP-B5 | 2021-07-21 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 83.1% | | | | | | MultiGrain SENet154 (450px) | 2019-02-14 |
DeepViT: Towards Deeper Vision Transformer | ✓ Link | 83.1% | | | | | | DeepVit-L* (DeiT training recipe) | 2021-03-22 |
DeiT III: Revenge of the ViT | ✓ Link | 83.1% | | | | | | ViT-S @224 (DeiT III, 21k) | 2022-04-14 |
Meta Knowledge Distillation | | 83.1% | | | | | | MKD ViT-S | 2022-02-16 |
Co-training $2^L$ Submodels for Visual Recognition | ✓ Link | 83.1% | | | | | | ViT-S@224 (cosub) | 2022-12-09 |
Pattern Attention Transformer with Doughnut Kernel | | 83.1% | | | | | | PAT-S | 2022-11-30 |
TinyViT: Fast Pretraining Distillation for Small Vision Transformers | ✓ Link | 83.1% | 21M | 4.3 | | | | TinyViT-21M | 2022-07-21 |
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? | ✓ Link | 83.1% | 48.6M | | | | | sMLPNet-S (ImageNet-1k) | 2021-09-12 |
Vision GNN: An Image is Worth Graph of Nodes | ✓ Link | 83.1% | 51.7M | 8.9 | | | | Pyramid ViG-M | 2022-06-01 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 83.09% | | | | | | SwinV2-Ti | 2023-08-18 |
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window | | 83.01% | 39.8M | 7.0 | | | | gSwin-S | 2022-08-24 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 83.0% | | | | | | MultiGrain SENet154 (400px) | 2019-02-14 |
Graph Convolutions Enrich the Self-Attention in Transformers! | ✓ Link | 83% | | | | | | Swin-S + GFSA | 2023-12-07 |
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | ✓ Link | 83.0% | 12.42M | 1.887 | | | | CAS-ViT-M | 2024-08-07 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 83% | 20M | 16.3 | | | | CvT-13 (384 res) | 2021-03-29 |
Semi-Supervised Recognition under a Noisy and Fine-grained Dataset | ✓ Link | 83.0% | 25.58M | | | | | ResNet50_vd_ssld | 2020-06-18 |
MetaFormer Baselines for Vision | ✓ Link | 83.0% | 27M | 3.9 | | | | ConvFormer-S18 (224 res) | 2022-10-24 |
Multiscale Vision Transformers | ✓ Link | 83.0% | 37M | 7.8 | | | | MViT-B-16 | 2021-04-22 |
ResNeSt: Split-Attention Networks | ✓ Link | 83.0% | 48M | | | | | ResNeSt-101 | 2020-04-19 |
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network | ✓ Link | 83% | 48.7M | 10.6 | | | | RevBiFPN-S4 | 2022-06-28 |
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition | ✓ Link | 83.0% | 183M | 13.9 | | | | ZenNAS (0.8ms) | 2021-02-01 |
NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training | ✓ Link | 82.9% | | 1.881 | | | | NASViT (supernet) | 2021-09-29 |
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | ✓ Link | 82.9% | | | | | | DeiT-B (+MixPro) | 2023-04-24 |
MobileNetV4 -- Universal Models for the Mobile Ecosystem | ✓ Link | 82.9% | | | | | | MNv4-Conv-L | 2024-04-16 |
IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation | ✓ Link | 82.9% | 24.3M | 4.7 | | | | IPT-S | 2022-12-06 |
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding | ✓ Link | 82.9% | 39.8M | | | | | ViL-Medium-W | 2021-03-29 |
Global Filter Networks for Image Classification | ✓ Link | 82.9% | 54M | 8.6 | | | | GFNet-H-B | 2021-07-01 |
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution | ✓ Link | 82.9% | 66.8M | 22.2 | 20771G | | 2.22G | Oct-ResNet-152 (SE) | 2019-04-10 |
Progressive Neural Architecture Search | ✓ Link | 82.9% | 86.1M | 50 | | 96.2 | 2.5G | PNASNet-5 | 2017-12-02 |
Harmonic Convolutional Networks based on Discrete Cosine Transform | ✓ Link | 82.85% | 88.2M | 31.4 | | | | Harm-SE-RNX-101 64x4d (320x320, Mean-Max Pooling) | 2020-01-18 |
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation | ✓ Link | 82.8% | | 8 | | | | GTP-LV-ViT-M/P8 | 2023-11-06 |
Knowledge distillation: A good teacher is patient and consistent | ✓ Link | 82.8% | | | | | | FunMatch - T384+224 (ResNet-50) | 2021-06-09 |
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | ✓ Link | 82.8% | | | | | | CA-Swin-T (+MixPro) | 2023-04-24 |
Graph Convolutions Enrich the Self-Attention in Transformers! | ✓ Link | 82.8% | | | | | | CaiT-S + GFSA | 2023-12-07 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 82.8% | 24M | 5.0 | | | | RDNet-T | 2024-03-28 |
Visual Attention Network | ✓ Link | 82.8% | 26.6M | 5 | | | | VAN-B2 | 2022-02-20 |
DaViT: Dual Attention Vision Transformers | ✓ Link | 82.8% | 28.3M | | | | | DaViT-T | 2022-04-07 |
Rethinking Channel Dimensions for Efficient Model Design | ✓ Link | 82.8% | 34.7M | 3.4 | | | | ReXNet_3.0 | 2020-07-02 |
Sequencer: Deep LSTM for Image Classification | ✓ Link | 82.8% | 38M | 11.1 | | | | Sequencer2D-M | 2022-05-04 |
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification | ✓ Link | 82.8% | 44.3M | 9.5 | | | | CrossViT-18+ | 2021-03-27 |
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism | ✓ Link | 82.8% | 50M | 8.5 | | | | Shift-S | 2022-01-26 |
HRFormer: High-Resolution Transformer for Dense Prediction | ✓ Link | 82.8% | 50.3M | 13.7 | | | | HRFormer-B | 2021-10-18 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 82.8% | 54.7M | 10.9 | | | | BoTNet T4 | 2021-01-27 |
Kolmogorov-Arnold Transformer | ✓ Link | 82.8% | 86.6M | 17.06 | | | | KAT-B* | 2024-09-16 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 82.7% | | | | | | MultiGrain SENet154 (500px) | 2019-02-14 |
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | ✓ Link | 82.7% | | | | | | PVT-M (+MixPro) | 2023-04-24 |
Adaptive Split-Fusion Transformer | ✓ Link | 82.7% | 19.3M | | | | | ASF-former-S | 2022-04-26 |
Container: Context Aggregation Network | ✓ Link | 82.7% | 22.1M | 8.1 | | | | Container | 2021-06-02 |
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP | | 82.7% | 22.5M | 2.4 | | | | UniNet-B2 | 2021-10-08 |
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | ✓ Link | 82.7% | 24M | 2.1 | | | | EfficientViT-B2 (r256) | 2022-05-29 |
Dilated Neighborhood Attention Transformer | ✓ Link | 82.7% | 28M | 4.3 | | | | DiNAT-Tiny | 2022-09-29 |
ELSA: Enhanced Local Self-Attention for Vision Transformer | ✓ Link | 82.7% | 28M | 4.8 | | | | ELSA-Swin-T | 2021-12-23 |
MambaVision: A Hybrid Mamba-Transformer Vision Backbone | ✓ Link | 82.7% | 35.1M | 5.1 | | | | MambaVision-T2 | 2024-07-10 |
Learning Transferable Architectures for Scalable Image Recognition | ✓ Link | 82.7% | 88.9M | 23.8 | 1648G | | 2.38G | NASNET-A(6) | 2017-07-21 |
Towards Robust Vision Transformer | ✓ Link | 82.7% | 91.8M | 17.7 | | | | RVT-B* | 2021-05-17 |
Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations | ✓ Link | 82.64% | | | | | | CMA(ViT-B/16) | 2025-03-24 |
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run | | 82.6% | | 1 | | | | FBNetV5-C-CLS | 2021-11-19 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 82.6% | | | | | | MultiGrain PNASNet (400px) | 2019-02-14 |
Three things everyone should know about Vision Transformers | ✓ Link | 82.6% | | | | | | ViT-S-24x2 | 2022-03-18 |
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | ✓ Link | 82.6% | | | | | | FastViT-SA24 | 2023-03-24 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 82.6% | 7.8M | | | | | FixEfficientNet-B1 | 2020-03-18 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 82.6% | 19M | 4.2 | | | | EfficientNet-B4 | 2019-05-28 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 82.6% | 22M | | | | | DeiT-S (distilled, 1000 epochs) | 2020-12-23 |
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | ✓ Link | 82.6% | 64.4M | 30 | | | | T2T-ViTt-24 | 2021-01-28 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 82.54% | | | | | | ViT-S | 2023-08-18 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 82.5% | | 7.1 | | | | CvT-21 | 2021-03-29 |
TransNeXt: Robust Foveal Visual Perception for Vision Transformers | ✓ Link | 82.5% | 12.8M | 2.7 | | | | TransNeXt-Micro (IN-1K supervised, 224) | 2023-11-28 |
Fixing the train-test resolution discrepancy | ✓ Link | 82.5% | 25.6M | | | | | FixResNet-50 Billion-scale@224 | 2019-06-14 |
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios | ✓ Link | 82.5% | 31.7M | 5.8 | | | | Next-ViT-S | 2022-07-12 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 82.5% | 39.4M | 2.334 | | | | LeViT-384 | 2021-04-02 |
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification | ✓ Link | 82.5% | 43.3M | 9 | | | | CrossViT-18 | 2021-03-27 |
MetaFormer Is Actually What You Need for Vision | ✓ Link | 82.5% | 73M | 23.2 | | | | MetaFormer PoolFormer-M48 | 2021-11-22 |
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases | ✓ Link | 82.5% | 152M | 30 | | | | ConViT-B+ | 2021-03-19 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 82.46% | 28.59M | | | | | TransBoost-ConvNext-T | 2022-05-26 |
ReViT: Enhancing Vision Transformers Feature Diversity with Attention Residual Connections | ✓ Link | 82.4% | | | | | | ReViT-B | 2024-02-17 |
Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks | ✓ Link | 82.4% | | | | | | M2D-T | 2024-12-20 |
Self-training with Noisy Student improves ImageNet classification | ✓ Link | 82.4% | 9.2M | | | | | NoisyStudent (EfficientNet-B2) | 2019-11-11 |
AutoFormer: Searching Transformers for Visual Recognition | ✓ Link | 82.4% | 54M | 11 | | | | AutoFormer-base | 2021-07-01 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 82.4% | 60.2M | | | | | ResNet-152 (A2 + reg) | 2021-10-01 |
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases | ✓ Link | 82.4% | 86M | 17 | | | | ConViT-B | 2021-03-19 |
Rethinking and Improving Relative Position Encoding for Vision Transformer | ✓ Link | 82.4% | 87M | 35.368 | | | | DeiT-B with iRPE-K | 2021-07-29 |
Mega: Moving Average Equipped Gated Attention | ✓ Link | 82.4% | 90M | | | | | Mega | 2022-09-21 |
Spatial-Channel Token Distillation for Vision MLPs | ✓ Link | 82.4% | 122.6M | 24.1 | | | | ResMLP-B24 + STD | 2022-07-23 |
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers | ✓ Link | 82.37% | | | | | | ViT-B/16-224+HTM | 2022-10-14 |
ColorNet: Investigating the importance of color spaces for image classification | ✓ Link | 82.35% | | | | | | ColorNet | 2019-02-01 |
Polynomial, trigonometric, and tropical activations | ✓ Link | 82.34% | 28M | | | 96.03 | | ConvNeXt-T-Hermite | 2025-02-03 |
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | ✓ Link | 82.3% | | 27.6 | | | | T2T-ViT-24 | 2021-01-28 |
Three things everyone should know about Vision Transformers | ✓ Link | 82.3% | | | | | | ViT-S-48x1 | 2022-03-18 |
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | ✓ Link | 82.3% | 24M | 4.7 | | | | MViTv2-T | 2021-12-02 |
SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search | ✓ Link | 82.3% | 27.8M | 8.4 | 12G | | 0.42G | SCARLET-A4 | 2019-08-16 |
Sequencer: Deep LSTM for Image Classification | ✓ Link | 82.3% | 28M | 8.4 | | | | Sequencer2D-S | 2022-05-04 |
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification | ✓ Link | 82.3% | 28.2M | 6.1 | | | | CrossViT-15+ | 2021-03-27 |
MambaVision: A Hybrid Mamba-Transformer Vision Backbone | ✓ Link | 82.3% | 31.8M | 4.4 | | | | MambaVision-T | 2024-07-10 |
GLiT: Neural Architecture Search for Global and Local Image Transformer | ✓ Link | 82.3% | 96.1M | 17 | | | | GLiT-Bases | 2021-07-07 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 82.29% | | | | | | EViT (delete) | 2023-08-18 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 82.22% | | | | | | STViT-Swin-Ti | 2023-08-18 |
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search | ✓ Link | 82.2% | | 15.8 | | | | BossNet-T1 | 2021-03-23 |
Going deeper with Image Transformers | ✓ Link | 82.2% | 17.3M | 14.3 | | | | CAIT-XXS-36 | 2021-03-31 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 82.2% | 18M | 4.1 | | | | CvT-13-NAS | 2021-03-29 |
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias | ✓ Link | 82.2% | 19.2M | 12.0 | | | | ViTAE-S-Stage | 2021-06-07 |
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | ✓ Link | 82.2% | 39.2M | 19.6 | | | | T2T-ViTt-19 | 2021-01-28 |
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer | ✓ Link | 82.2% | 39.6M | | | | | Evo-LeViT-384* | 2021-08-03 |
Visformer: The Vision-friendly Transformer | ✓ Link | 82.2% | 40.2M | 4.9 | | | | Visformer-S | 2021-04-26 |
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases | ✓ Link | 82.2% | 48M | 10 | | | | ConViT-S+ | 2021-03-19 |
Patches Are All You Need? | ✓ Link | 82.2% | 51.6M | | | | | ConvMixer-1536/20 | 2022-01-24 |
DeepViT: Towards Deeper Vision Transformer | ✓ Link | 82.2% | 55M | | | | | DeepVit-L | 2021-03-22 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 82.2% | 66.6M | | | | | SENet-152 | 2021-01-27 |
Exploring the Limits of Weakly Supervised Pretraining | ✓ Link | 82.2% | 88M | | | | | ResNeXt-101 32x8d | 2018-05-02 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 82.16% | 71.71M | | | | | TransBoost-Swin-T | 2022-05-26 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 82.13% | 88.6M | 18.8 | | | | ResNeXt-101, 64x4d, S=2(224px) | 2020-11-30 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 82.11% | | | | | | ToMe-ViT-S | 2023-08-18 |
Asymmetric Masked Distillation for Pre-Training Small Foundation Models | | 82.1% | 22M | | | | | AMD(ViT-S/16) | 2023-11-06 |
Augmenting Convolutional networks with attention-based aggregation | ✓ Link | 82.1% | 25.2M | | | | | PatchConvNet-S60 | 2021-12-27 |
Vision GNN: An Image is Worth Graph of Nodes | ✓ Link | 82.1% | 27.3M | 4.6 | | | | Pyramid ViG-S | 2022-06-01 |
A ConvNet for the 2020s | ✓ Link | 82.1% | 29M | 4.5 | | | | ConvNeXt-T | 2022-01-10 |
Spatial-Channel Token Distillation for Vision MLPs | ✓ Link | 82.1% | 30.1M | 4.0 | | | | CycleMLP-B2 + STD | 2022-07-23 |
FasterViT: Fast Vision Transformers with Hierarchical Attention | ✓ Link | 82.1% | 31.4M | 3.3 | | | | FasterViT-0 | 2023-06-09 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 82% | | 4.5 | | | | CeiT-S | 2021-03-22 |
Differentiable Model Compression via Pseudo Quantization Noise | ✓ Link | 82.0% | | | | | | DIFFQ (λ=1e-2) | 2021-04-20 |
From Xception to NEXcepTion: New Design Decisions and Neural Architecture Search | ✓ Link | 82% | | | | | | NEXcepTion-S | 2022-12-16 |
Global Context Vision Transformers | ✓ Link | 82.0% | 20M | 2.6 | | | | GC ViT-XT | 2022-06-20 |
Container: Context Aggregation Network | ✓ Link | 82% | 20M | 3.2 | | | | Container-Light | 2021-06-02 |
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding | ✓ Link | 82% | 24.6M | 4.86 | | | | ViL-Small | 2021-03-29 |
PVT v2: Improved Baselines with Pyramid Vision Transformer | ✓ Link | 82% | 25.4M | 4 | | | | PVTv2-B2 | 2021-06-25 |
Active Token Mixer | ✓ Link | 82% | 27.2M | 4 | | | | ActiveMLP-T | 2022-03-11 |
Fast Vision Transformers with HiLo Attention | ✓ Link | 82% | 28M | 3.7 | | | | LITv2-S | 2022-05-26 |
Vision Transformer with Deformable Attention | ✓ Link | 82.0% | 29M | 4.6 | | | | DAT-T | 2022-01-03 |
| | 81.97% | | | | | | Swin-T (SAMix+DM) | |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 81.96% | | | | | | EViT (fuse) | 2023-08-18 |
| | 81.92% | | | | | | Swin-T (AutoMix+DM) | |
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation | ✓ Link | 81.9% | | 4.8 | | | | GTP-LV-ViT-S/P8 | 2023-11-06 |
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | ✓ Link | 81.9% | | 17.0 | | | | T2T-ViT-19 | 2021-01-28 |
A Fast Knowledge Distillation Framework for Visual Recognition | ✓ Link | 81.9% | | | | | | ResNet-101 (224 res, Fast Knowledge Distillation) | 2021-12-02 |
Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models | ✓ Link | 81.9% | | | | | | Discrete Adversarial Distillation (ViT-B, 224) | 2023-11-02 |
DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention | ✓ Link | 81.9% | | | | | | DeBiFormer-T | 2024-10-11 |
Towards Robust Vision Transformer | ✓ Link | 81.9% | 23.3M | 4.7 | | | | RVT-S* | 2021-05-17 |
Rethinking Spatial Dimensions of Vision Transformers | ✓ Link | 81.9% | 23.5M | 2.9 | | | | PiT-S | 2021-03-30 |
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? | ✓ Link | 81.9% | 24.1M | | | | | sMLPNet-T (ImageNet-1k) | 2021-09-12 |
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding | ✓ Link | 81.9% | 79M | 6.74 | | | | ViL-Base-W | 2021-03-29 |
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles | ✓ Link | 81.89% | | | | | | Swin-T+SSA | 2023-06-02 |
Attentive Normalization | ✓ Link | 81.87% | | 7.51 | | | | AOGNet-40M-AN | 2019-08-04 |
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run | | 81.8% | | 0.726 | | | | FBNetV5 | 2021-11-19 |
NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training | ✓ Link | 81.8% | | 0.757 | | | | NASViT-A5 | 2021-09-29 |
Parametric Contrastive Learning | ✓ Link | 81.8% | | | | | | ResNet-200 | 2021-07-26 |
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality | ✓ Link | 81.8% | | | | | | RepMLPNet-L256 | 2021-12-21 |
From Xception to NEXcepTion: New Design Decisions and Neural Architecture Search | ✓ Link | 81.8% | | | | | | NEXcepTion-TP | 2022-12-16 |
Neighborhood Attention Transformer | ✓ Link | 81.8% | 20M | 2.7 | | | | NAT-Mini | 2022-04-14 |
Dilated Neighborhood Attention Transformer | ✓ Link | 81.8% | 20M | 2.7 | | | | DiNAT-Mini | 2022-09-29 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 81.8% | 60.2M | | | | | ResNet-152 (A2) | 2021-10-01 |
Kolmogorov-Arnold Transformer | ✓ Link | 81.8% | 86.6M | 16.87 | | | | DeiT-B | 2024-09-16 |
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks | ✓ Link | 81.72% | 25.6M | | | | | MEAL V2 (ResNet-50) (380 res) | 2020-09-17 |
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window | | 81.71% | 21.8M | 3.6 | | | | gSwin-T | 2022-08-24 |
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run | | 81.7% | | 0.685 | | | | FBNetV5-A-CLS | 2021-11-19 |
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks | ✓ Link | 81.7% | | | | | | T2T-ViT-14 | 2021-05-05 |
Learned Queries for Efficient Local Attention | ✓ Link | 81.7% | 16M | 2.5 | | | | QnA-ViT-Tiny | 2021-12-21 |
AutoFormer: Searching Transformers for Visual Recognition | ✓ Link | 81.7% | 22.9M | 5.1 | | | | AutoFormer-small | 2021-07-01 |
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism | ✓ Link | 81.7% | 28M | 4.4 | | | | Shift-T | 2022-01-26 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 81.7% | 33.5M | 7.3 | | | | BoTNet T3 | 2021-01-27 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 81.6% | | 4.5 | | | | CvT-13 | 2021-03-29 |
Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 81.6% | | | | | | ResNet-152 (SAM) | 2020-10-03 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 81.6% | | | | | | UniRepLKNet-N | 2023-11-27 |
Rethinking Local Perception in Lightweight Vision Transformer | ✓ Link | 81.6% | 12.3M | 2 | | | | CloFormer-S | 2023-03-31 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 81.6% | 17.8M | 1.066 | | | | LeViT-256 | 2021-04-02 |
Rethinking Channel Dimensions for Efficient Model Design | ✓ Link | 81.6% | 19M | 1.5 | | | | ReXNet_2.0 | 2020-07-02 |
Contextual Transformer Networks for Visual Recognition | ✓ Link | 81.6% | 23.1M | 4.1 | | | | SE-CoTNetD-50 | 2021-07-26 |
CoAtNet: Marrying Convolution and Attention for All Data Sizes | ✓ Link | 81.6% | 25M | 4.2 | | | | CoAtNet-0 | 2021-06-09 |
Pay Attention to MLPs | ✓ Link | 81.6% | 73M | 31.6 | | | | gMLP-B | 2021-05-17 |
Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs | | 81.5% | | 0.214 | | | | CoE-Large + CondConv | 2021-07-08 |
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation | ✓ Link | 81.5% | | 13.1 | | | | GTP-DeiT-B/P8 | 2023-11-06 |
From Xception to NEXcepTion: New Design Decisions and Neural Architecture Search | ✓ Link | 81.5% | | | | | | NEXcepTion-T | 2022-12-16 |
Graph Convolutions Enrich the Self-Attention in Transformers! | ✓ Link | 81.5% | | | | | | DeiT-S-24 + GFSA | 2023-12-07 |
Self-training with Noisy Student improves ImageNet classification | ✓ Link | 81.5% | 7.8M | | | | | NoisyStudent (EfficientNet-B1) | 2019-11-11 |
TinyViT: Fast Pretraining Distillation for Small Vision Transformers | ✓ Link | 81.5% | 11M | 2.0 | | | | TinyViT-11M | 2022-07-21 |
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding | ✓ Link | 81.5% | 17M | 5.8 | | | | Transformer local-attention (NesT-T) | 2021-05-26 |
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet | ✓ Link | 81.5% | 21.5M | 9.6 | | | | T2T-ViT-14 | 2021-01-28 |
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification | ✓ Link | 81.5% | 27.4M | 5.8 | | | | CrossViT-15 | 2021-03-27 |
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition | ✓ Link | 81.49% | 42.3M | | | | | PyConvResNet-101 | 2020-06-20 |
Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields | ✓ Link | 81.484% | | | | | | ViT-B/16 (RPE w/ GAB) | 2023-05-08 |
NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training | ✓ Link | 81.4% | | 0.591 | | | | NASViT-A4 | 2021-09-29 |
MobileOne: An Improved One millisecond Mobile Backbone | ✓ Link | 81.4% | | 2.9 | | | | MobileOne-S4 (distill) | 2022-06-08 |
Rethinking and Improving Relative Position Encoding for Vision Transformer | ✓ Link | 81.4% | | 9.770 | | | | DeiT-S with iRPE-QKV | 2021-07-29 |
DeiT III: Revenge of the ViT | ✓ Link | 81.4% | | | | | | ViT-S @224 (DeiT III) | 2022-04-14 |
BiFormer: Vision Transformer with Bi-Level Routing Attention | ✓ Link | 81.4% | | | | | | BiFormer-T (IN1k pretrain) | 2023-03-15 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 81.4% | 49.2M | | | | | SENet-101 | 2021-01-27 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 81.33% | | | | | | GFNet-S | 2023-08-18 |
Adversarial AutoAugment | | 81.32% | | | | | | ResNet-200 (Adversarial Autoaugment) | 2019-12-24 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 81.3% | | | | | | MultiGrain PNASNet (300px) | 2019-02-14 |
Parametric Contrastive Learning | ✓ Link | 81.3% | | | | | | ResNet-152 | 2021-07-26 |
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases | ✓ Link | 81.3% | 27M | 5.4 | | | | ConViT-S | 2021-03-19 |
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ✓ Link | 81.3% | 29M | 4.5 | | | | Swin-T | 2021-03-25 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 81.24% | 9.5M | | | | | SimpleNetV1-9m-correct-labels | 2016-08-22 |
Res2Net: A New Multi-scale Backbone Architecture | ✓ Link | 81.23% | | | | | | Res2Net-101 | 2019-04-02 |
Shape-Texture Debiased Neural Network Training | ✓ Link | 81.2% | | | | | | ResNeXt-101 (Debiased+CutMix) | 2020-10-12 |
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | ✓ Link | 81.2% | | | | | | PVT-S (+MixPro) | 2023-04-24 |
| | 81.16% | | | | | | Swin-T (PuzzleMix+DM) | |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 81.15% | 25.56M | | | | | TransBoost-ResNet50-StrikesBack | 2022-05-26 |
ResNeSt: Split-Attention Networks | ✓ Link | 81.13% | 27.5M | 5.39 | | | | ResNeSt-50 | 2020-04-19 |
| | 81.12% | | | | | | DeiT-S (SAMix+DM) | |
Rethinking and Improving Relative Position Encoding for Vision Transformer | ✓ Link | 81.1% | | 9.412 | | | | DeiT-S with iRPE-QK | 2021-07-29 |
Graph Convolutions Enrich the Self-Attention in Transformers! | ✓ Link | 81.1% | | | | | | DeiT-S-12 + GFSA | 2023-12-07 |
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | ✓ Link | 81.1% | 5.76M | 0.932 | | | | CAS-ViT-S | 2024-08-07 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 81.1% | 12M | | | | | EfficientNet-B3 | 2019-05-28 |
Visual Attention Network | ✓ Link | 81.1% | 13.9M | 2.5 | | | | VAN-B1 | 2022-02-20 |
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network | ✓ Link | 81.1% | 19.6M | 3.33 | | | | RevBiFPN-S3 | 2022-06-28 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 81.1% | 236M | | | | | ResNet-152x2-SAM | 2021-06-03 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 81.09% | | | | | | DynamicViT-S | 2023-08-18 |
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 81.08% | 44.6M | | | | | ResNet-101 (SAMix) | 2021-11-30 |
NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training | ✓ Link | 81.0% | | 0.528 | | | | NASViT-A3 | 2021-09-29 |
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias | ✓ Link | 81% | 13.2M | 6.8 | | | | ViTAE-13M | 2021-06-07 |
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 80.98% | 44.6M | | | | | ResNet-101 (AutoMix) | 2021-03-24 |
| | 80.91% | | | | | | DeiT-S (AutoMix+DM) | |
Parametric Contrastive Learning | ✓ Link | 80.9% | | | | | | ResNet-101 | 2021-07-26 |
Going deeper with Image Transformers | ✓ Link | 80.9% | 12M | 9.6 | | | | CAIT-XXS-24 | 2021-03-31 |
Rethinking and Improving Relative Position Encoding for Vision Transformer | ✓ Link | 80.9% | 22M | 9.318 | | | | DeiT-S with iRPE-K | 2021-07-29 |
Centroid Transformers: Learning to Abstract with Attention | | 80.9% | 22.3M | 9.4 | | | | CentroidViT-S | 2021-02-17 |
Aggregated Residual Transformations for Deep Neural Networks | ✓ Link | 80.9% | 83.6M | 31.5 | | 94.7 | | ResNeXt-101 64x4d | 2016-11-16 |
AlphaNet: Improved Training of Supernets with Alpha-Divergence | ✓ Link | 80.8% | | 0.709 | | | | AlphaNet-A6 | 2021-02-16 |
Supervised Contrastive Learning | ✓ Link | 80.8% | | | | | | ResNet-200 (Supervised Contrastive) | 2020-04-23 |
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP | ✓ Link | 80.8% | 11.5M | 0.555 | | | | UniNet-B0 | 2022-07-12 |
LocalViT: Bringing Locality to Vision Transformers | ✓ Link | 80.8% | 22.4M | 4.6 | | | | LocalViT-S | 2021-04-12 |
A Dot Product Attention Free Transformer | | 80.8% | 23M | | | | | DAFT-conv (384 heads, 300 epochs) | 2021-09-29 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 80.8% | 30M | 6 | | | | ResMLP-S24 | 2021-05-07 |
MobileNetV4 -- Universal Models for the Mobile Ecosystem | ✓ Link | 80.7% | | | | | | MNv4-Hybrid-M | 2024-04-16 |
TinyViT: Fast Pretraining Distillation for Small Vision Transformers | ✓ Link | 80.7% | 5.4M | 1.3 | | | | TinyViT-5M-distill (21k) | 2022-07-21 |
Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs | | 80.7% | 95.3M | 0.194 | | | | CoE-Large | 2021-07-08 |
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks | ✓ Link | 80.67% | | | | | | MEAL V2 (ResNet-50) (224 res) | 2020-09-17 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 80.66% | | | | | | TokenLearner-ViT-8 | 2023-08-18 |
ResNeSt: Split-Attention Networks | ✓ Link | 80.64% | 27.5M | 4.34 | | | | ResNeSt-50-fast | 2020-04-19 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 80.64% | 60.19M | | | | | TransBoost-ResNet152 | 2022-05-26 |
Fast AutoAugment | ✓ Link | 80.6% | | | | | | ResNet-200 (Fast AA) | 2019-05-01 |
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | ✓ Link | 80.6% | | | | | | CaiT-XXS (+MixPro) | 2023-04-24 |
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | ✓ Link | 80.6% | | | | | | FastViT-SA12 | 2023-03-24 |
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | ✓ Link | 80.53% | | | | | | ResNeXt-101 (CutMix) | 2019-05-13 |
NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training | ✓ Link | 80.5% | | 0.421 | | | | NASViT-A2 | 2021-09-29 |
Residual Attention Network for Image Classification | ✓ Link | 80.5% | | | | | | Attention-92 | 2017-04-23 |
Neural Architecture Transfer | ✓ Link | 80.5% | 9.1M | | | | | NAT-M4 | 2020-05-12 |
IncepFormer: Efficient Inception Transformer with Pyramid Pooling for Semantic Segmentation | ✓ Link | 80.5% | 14.0M | 2.3 | | | | IPT-T | 2022-12-06 |
GLiT: Neural Architecture Search for Global and Local Image Transformer | ✓ Link | 80.5% | 24.6M | 4.4 | | | | GLiT-Smalls | 2021-07-07 |
Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 80.5% | 42.2M | 7.1 | | | | HCGNet-C | 2019-08-26 |
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition | ✓ Link | 80.43% | | 1.7 | | | | DVT (T2T-ViT-12) | 2021-05-31 |
GhostNetV3: Exploring the Training Strategies for Compact Models | ✓ Link | 80.4% | | | | 95.2 | | GhostNetV3 1.6x | 2024-04-17 |
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP | | 80.4% | 14M | 0.99 | | | | UniNet-B1 | 2021-10-08 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 80.4% | 22M | | | | | DeiT-S (T2) | 2021-10-01 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 80.4% | 25M | | | | | ResNet50 (A1) | 2021-10-01 |
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window | | 80.32% | 15.5M | 2.3 | | | | gSwin-VT | 2022-08-24 |
AlphaNet: Improved Training of Supernets with Alpha-Divergence | ✓ Link | 80.3% | | 0.491 | | | | AlphaNet-A5 | 2021-02-16 |
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | ✓ Link | 80.3% | | | | | | ResNet-50+AutoDropout+RandAugment | 2021-01-05 |
Rethinking Channel Dimensions for Efficient Model Design | ✓ Link | 80.3% | 9.7M | 0.86 | | | | ReXNet_1.5 | 2020-07-02 |
| | 80.25% | | | | | | DeiT-S (PuzzleMix+DM) | |
Attentional Feature Fusion | ✓ Link | 80.22% | 34.7M | | | | | iAFF-ResNeXt-50-32x4d | 2020-09-29 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 80.2% | | | | | | UniRepLKNet-P | 2023-11-27 |
Fixing the train-test resolution discrepancy: FixEfficientNet | ✓ Link | 80.2% | 5.3M | 1.60 | | | | FixEfficientNet-B0 | 2020-03-18 |
A Dot Product Attention Free Transformer | | 80.2% | 20.3M | | | | | DAFT-conv (16 heads) | 2021-09-29 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 80.2% | 42.7M | | | | | ConvMLP-L | 2021-09-09 |
A Fast Knowledge Distillation Framework for Visual Recognition | ✓ Link | 80.1% | | | | | | ResNet-50 (224 res, Fast Knowledge Distillation) | 2021-12-02 |
HVT: A Comprehensive Vision Framework for Learning in Non-Euclidean Space | ✓ Link | 80.1% | | | | | | HVT Base | 2024-09-25 |
A Dot Product Attention Free Transformer | | 80.1% | 23M | | | | | DAFT-conv (384 heads, 200 epochs) | 2021-09-29 |
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | ✓ Link | 80.1% | 55.8M | | | | | Inception ResNet V2 | 2016-02-23 |
Exploring Randomly Wired Neural Networks for Image Recognition | ✓ Link | 80.1% | 61.5M | 7.9 | | | | RandWire-WS | 2019-04-02 |
Go Wider Instead of Deeper | ✓ Link | 80.09% | 63M | | | | | WideNet-H | 2021-07-25 |
Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs | | 80% | | 0.100 | | | | CoE-Small + CondConv + PWLU | 2021-07-08 |
BasisNet: Two-stage Model Synthesis for Efficient Inference | | 80% | | 0.198 | | | | BasisNet-MV3 | 2021-05-07 |
AlphaNet: Improved Training of Supernets with Alpha-Divergence | ✓ Link | 80.0% | | 0.444 | | | | AlphaNet-A4 | 2021-02-16 |
MogaNet: Multi-order Gated Aggregation Network | ✓ Link | 80% | 5.2M | 1.44 | | | | MogaNet-T (256res) | 2022-11-07 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 80% | 10.4M | 0.624 | | | | LeViT-192 | 2021-04-02 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 80% | 44.4M | | | | | ResNet-101 | 2021-01-27 |
Identity Mappings in Deep Residual Networks | ✓ Link | 79.9% | | | | | | ResNet-200 | 2016-03-16 |
MobileNetV4 -- Universal Models for the Mobile Ecosystem | ✓ Link | 79.9% | | | | | | MNv4-Conv-M | 2024-04-16 |
Designing Network Design Spaces | ✓ Link | 79.9% | 39.2M | 8 | | | | RegNetY-8.0GF | 2020-03-30 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 79.9% | 87M | | | | | ViT-B/16-SAM | 2021-06-03 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 79.86% | 44.55M | | | | | TransBoost-ResNet101 | 2022-05-26 |
Selective Kernel Networks | ✓ Link | 79.81% | 48.9M | 8.46 | | | | SKNet-101 | 2019-03-15 |
Fixing the train-test resolution discrepancy | ✓ Link | 79.8% | | | | | | FixResNet-50 CutMix | 2019-06-14 |
Mish: A Self Regularized Non-Monotonic Activation Function | ✓ Link | 79.8% | | | | | | CSPResNeXt-50 + Mish | 2019-08-23 |
Revisiting a kNN-based Image Classification System with High-capacity Storage | | 79.8% | | | | | | kNN-CLIP | 2022-04-03 |
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | ✓ Link | 79.8% | | | | | | FastViT-S12 | 2023-03-24 |
Rethinking Local Perception in Lightweight Vision Transformer | ✓ Link | 79.8% | 7.2M | 1.1 | | | | CloFormer-XS | 2023-03-31 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 79.8% | 9.2M | 1 | | | | EfficientNet-B2 | 2019-05-28 |
Global Context Vision Transformers | ✓ Link | 79.8% | 12M | 2.1 | | | | GC ViT-XXT | 2022-06-20 |
CSPNet: A New Backbone that can Enhance Learning Capability of CNN | ✓ Link | 79.8% | 20.5M | | | | | CSPResNeXt-50 (Mish+Aug) | 2019-11-27 |
A Dot Product Attention Free Transformer | | 79.8% | 22.6M | | | | | DAFT-full | 2021-09-29 |
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition | ✓ Link | 79.74% | | 0.7 | | | | DVT (T2T-ViT-10) | 2021-05-31 |
NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training | ✓ Link | 79.7% | | 0.309 | | | | NASViT-A1 | 2021-09-29 |
Generalized Parametric Contrastive Learning | ✓ Link | 79.7% | | | | | | GPaCo (ResNet-50) | 2022-09-26 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 79.7% | 45M | | | | | ResMLP-36 | 2021-05-07 |
Grafit: Learning fine-grained image representations with coarse labels | | 79.6% | | | | | | Grafit (ResNet-50) | 2020-11-25 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 79.6% | 8.8M | 0.376 | | | | LeViT-128 | 2021-04-02 |
ResT: An Efficient Transformer for Visual Recognition | ✓ Link | 79.6% | 13.66M | 1.9 | | | | ResT-Small | 2021-05-28 |
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation | ✓ Link | 79.5% | | 3.4 | | | | GTP-DeiT-S/P8 | 2023-11-06 |
Rethinking Channel Dimensions for Efficient Model Design | ✓ Link | 79.5% | 7.6M | 0.66 | | | | ReXNet_1.3 | 2020-07-02 |
Go Wider Instead of Deeper | ✓ Link | 79.49% | 40M | | | | | WideNet-L | 2021-07-25 |
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 79.41% | 25.6M | | | | | ResNet-50 (SAMix) | 2021-11-30 |
AlphaNet: Improved Training of Supernets with Alpha-Divergence | ✓ Link | 79.4% | | 0.357 | | | | AlphaNet-A3 | 2021-02-16 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 79.4% | | | | | | MultiGrain R50-AA-500 | 2019-02-14 |
Adversarial AutoAugment | | 79.4% | | | | | | ResNet-50 (Adversarial Autoaugment) | 2019-12-24 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 79.4% | | | | | | ResMLP-24 | 2021-05-07 |
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications | ✓ Link | 79.4% | 5.6M | 2.6 | | | | EdgeNeXt-S | 2022-06-21 |
Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets | ✓ Link | 79.4% | 11.9M | 0.591 | | | | TinyNet (GhostNet-A) | 2020-10-28 |
MobileOne: An Improved One millisecond Mobile Backbone | ✓ Link | 79.4% | 14.8M | 2.978 | | | | MobileOne-S4 | 2022-06-08 |
Designing Network Design Spaces | ✓ Link | 79.4% | 20.6M | 4 | | | | RegNetY-4.0GF | 2020-03-30 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 79.4% | 28.02M | | | | | SENet-50 | 2021-01-27 |
Data-Driven Neuron Allocation for Scale Aggregation Networks | ✓ Link | 79.38% | | 11.2 | | | | ScaleNet-152 | 2019-04-20 |
LIP: Local Importance-based Pooling | ✓ Link | 79.33% | 42.9M | | | | | LIP-ResNet-101 | 2019-08-12 |
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features | ✓ Link | 79.3% | 5.8M | 1.841 | | | | MobileViTv3-S | 2022-09-30 |
Involution: Inverting the Inherence of Convolution for Visual Recognition | ✓ Link | 79.3% | 34M | 6.8 | | | | RedNet-152 | 2021-03-10 |
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 79.25% | 25.6M | | | | | ResNet-50 (AutoMix) | 2021-03-24 |
Self-Knowledge Distillation with Progressive Refinement of Targets | ✓ Link | 79.24% | | | | | | PS-KD (ResNet-152 + CutMix) | 2020-06-22 |
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | ✓ Link | 79.2% | | | | | | ResNet-101 (JFT-300M Finetuning) | 2017-07-10 |
Towards Robust Vision Transformer | ✓ Link | 79.2% | 10.9M | 1.3 | | | | RVT-Ti* | 2021-05-17 |
Multiscale Deep Equilibrium Models | ✓ Link | 79.2% | 81M | | | | | Multiscale DEQ (MDEQ-XL) | 2020-06-15 |
How to Use Dropout Correctly on Residual Networks with Batch Normalization | ✓ Link | 79.152% | | | | | | DenseNet-169 (H4*) | 2023-02-13 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 79.12% | 5.7M | | | | | SimpleNetV1-5m-correct-labels | 2016-08-22 |
AlphaNet: Improved Training of Supernets with Alpha-Divergence | ✓ Link | 79.1% | | 0.317 | | | | AlphaNet-A2 | 2021-02-16 |
GhostNetV3: Exploring the Training Strategies for Compact Models | ✓ Link | 79.1% | | | | 94.5 | | GhostNetV3 1.3x | 2024-04-17 |
Attention Augmented Convolutional Networks | ✓ Link | 79.1% | | | | | | AA-ResNet-152 | 2019-04-22 |
Fixing the train-test resolution discrepancy | ✓ Link | 79.1% | | | | | | FixResNet-50 | 2019-06-14 |
MobileOne: An Improved One millisecond Mobile Backbone | ✓ Link | 79.1% | | | | | | MobileOne-S2 (distill) | 2022-06-08 |
Your Diffusion Model is Secretly a Zero-Shot Classifier | ✓ Link | 79.1% | | | | | | Diffusion Classifier | 2023-03-28 |
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | ✓ Link | 79.1% | | | | | | FastViT-T12 | 2023-03-24 |
TinyViT: Fast Pretraining Distillation for Small Vision Transformers | ✓ Link | 79.1% | 5.4M | 1.3 | | | | TinyViT-5M | 2022-07-21 |
Rethinking Spatial Dimensions of Vision Transformers | ✓ Link | 79.1% | 10.6M | 1.4 | | | | PiT-XS | 2021-03-30 |
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP | | 79.1% | 11.9M | 0.56 | | | | UniNet-B0 | 2021-10-08 |
Involution: Inverting the Inherence of Convolution for Visual Recognition | ✓ Link | 79.1% | 25.6M | 4.7 | | | | RedNet-101 | 2021-03-10 |
Kolmogorov-Arnold Transformer | ✓ Link | 79.1% | 86.6M | 16.87 | | | | ViT-B/16 | 2024-09-16 |
Unsupervised Data Augmentation for Consistency Training | ✓ Link | 79.04% | | | | | | ResNet-50 (UDA) | 2019-04-29 |
Data-Driven Neuron Allocation for Scale Aggregation Networks | ✓ Link | 79.03% | | 7.5 | | | | ScaleNet-101 | 2019-04-20 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 79.03% | | | | | | TransBoost-ResNet50 | 2022-05-26 |
Contextual Convolutional Neural Networks | ✓ Link | 79.03% | 60M | | | | | Co-ResNet-152 | 2021-08-17 |
Semi-Supervised Recognition under a Noisy and Fine-grained Dataset | ✓ Link | 79.0% | 5.47M | | | | | MobileNetV3_large_x1_0_ssld | 2020-06-18 |
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network | ✓ Link | 79% | 10.6M | 1.37 | | | | RevBiFPN-S2 | 2022-06-28 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 79% | 17.4M | | | | | ConvMLP-M | 2021-09-09 |
Xception: Deep Learning with Depthwise Separable Convolutions | ✓ Link | 79% | 22.9M | | 87G | | 0.838G | Xception | 2016-10-07 |
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization | ✓ Link | 79% | 60.5M | 9.1 | | | | SpineNet-143 | 2019-12-10 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 79% | 64M | | | | | Mixer-B/8-SAM | 2021-06-03 |
Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks | ✓ Link | 78.95% | | | | | | InceptionV3 (FRN layer) | 2019-11-21 |
Averaging Weights Leads to Wider Optima and Better Generalization | ✓ Link | 78.94% | | | | | | ResNet-152 + SWA | 2018-03-14 |
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks | ✓ Link | 78.92% | 57.40M | 10.83 | | | | ECA-Net (ResNet-152) | 2019-10-08 |
AlphaNet: Improved Training of Supernets with Alpha-Divergence | ✓ Link | 78.9% | | 0.279 | | | | AlphaNet-A1 | 2021-02-16 |
MixConv: Mixed Depthwise Convolutional Kernels | ✓ Link | 78.9% | 7.3M | 0.565 | | | | MixNet-L | 2019-07-22 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 78.8% | | 3.6 | | | | CeiT-T (384 finetune res) | 2021-03-22 |
| | 78.8% | | | | 94.4 | | Inception V3 | |
Self-training with Noisy Student improves ImageNet classification | ✓ Link | 78.8% | 5.3M | | | | | NoisyStudent (EfficientNet-B0) | 2019-11-11 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 78.8% | 7.8M | 0.7 | | | | EfficientNet-B1 | 2019-05-28 |
Bottleneck Transformers for Visual Recognition | ✓ Link | 78.8% | 25.5M | | | | | ResNet-50 | 2021-01-27 |
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks | ✓ Link | 78.798% | 44.55M | 7.858 | | | | SGE-ResNet101 | 2019-05-23 |
RepVGG: Making VGG-style ConvNets Great Again | ✓ Link | 78.78% | 80.31M | 18.4 | | | | RepVGG-B2 | 2021-01-11 |
Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup | ✓ Link | 78.76% | | | | | | ResNet-50 (PuzzleMix) | 2020-09-15 |
| | 78.75% | | | | | | SAMix+DM (ResNet-50 RSB A3) | |
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | ✓ Link | 78.7% | | | | | | ResNet-50 + AutoDropout | 2021-01-05 |
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications | ✓ Link | 78.7% | 3.2M | 0.56 | | | | CAS-ViT-XS | 2024-08-07 |
A Fast Knowledge Distillation Framework for Visual Recognition | ✓ Link | 78.7% | 5M | 1.2 | | | | SReT-LT (Fast Knowledge Distillation) | 2021-12-02 |
PVT v2: Improved Baselines with Pyramid Vision Transformer | ✓ Link | 78.7% | 13.1M | 2.1 | | | | PVTv2-B1 | 2021-06-25 |
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks | ✓ Link | 78.65% | 42.49M | 7.35 | | | | ECA-Net (ResNet-101) | 2019-10-08 |
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features | ✓ Link | 78.64% | | 1.876 | | | | MobileViTv3-1.0 | 2022-09-30 |
ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer | ✓ Link | 78.63% | 5M | 3.48 | | | | EdgeFormer-S | 2022-03-08 |
| | 78.62% | | | | | | AutoMix+DM (ResNet-50 RSB A3) | |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 78.6% | | | | | | UniRepLKNet-F | 2023-11-27 |
RCKD: Response-Based Cross-Task Knowledge Distillation for Pathological Image Analysis | | 78.6% | 3M | | | | | CSAT | 2023-10-29 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 78.60% | 5.29M | | | | | TransBoost-EfficientNetB0 | 2022-05-26 |
Visformer: The Vision-friendly Transformer | ✓ Link | 78.6% | 10.3M | 1.3 | | | | Visformer-Ti | 2021-04-26 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 78.6% | 17.7M | 3 | | | | ResMLP-12 (distilled, class-MLP) | 2021-05-07 |
RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition | ✓ Link | 78.60% | 52.77M | | | | | RepMLP-Res50 | 2021-05-05 |
Res2Net: A New Multi-scale Backbone Architecture | ✓ Link | 78.59% | | | | | | Res2Net-50-299 | 2019-04-02 |
Deep Residual Learning for Image Recognition | ✓ Link | 78.57% | | 11.3 | | | | ResNet-152 | 2015-12-10 |
Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation | ✓ Link | 78.5% | | | | | | ResNet-50-DW (Deformable Kernels) | 2019-10-07 |
HRFormer: High-Resolution Transformer for Dense Prediction | ✓ Link | 78.5% | 8.0M | 1.8 | | | | HRFormer-T | 2021-10-18 |
Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 78.5% | 12.9M | 2.0 | | | | HCGNet-B | 2019-08-26 |
RepVGG: Making VGG-style ConvNets Great Again | ✓ Link | 78.5% | 55.77M | 11.3 | | | | RepVGG-B2g4 | 2021-01-11 |
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition | ✓ Link | 78.48% | | 0.6 | | | | DVT (T2T-ViT-7) | 2021-05-31 |
SRM : A Style-based Recalibration Module for Convolutional Neural Networks | ✓ Link | 78.47% | | | | | | SRM-ResNet-101 | 2019-03-26 |
Averaging Weights Leads to Wider Optima and Better Generalization | ✓ Link | 78.44% | | | | | | DenseNet-161 + SWA | 2018-03-14 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 78.42% | | | | | | CoaT-Ti | 2023-08-18 |
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run | | 78.4% | | 0.280 | | | | FBNetV5-AC-CLS | 2021-11-19 |
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | ✓ Link | 78.4% | | | | | | ResNet-50 (CutMix) | 2019-05-13 |
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels | ✓ Link | 78.4% | 4.8M | | | | | ReXNet_1.0-relabel | 2021-01-13 |
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer | ✓ Link | 78.4% | 5.6M | | | | | MobileViT-S | 2021-10-05 |
Involution: Inverting the Inherence of Convolution for Visual Recognition | ✓ Link | 78.4% | 15.5M | 2.7 | | | | RedNet-50 | 2021-03-10 |
| | 78.36% | | | | | | ResNet-50 (SAMix+DM) | |
DropBlock: A regularization method for convolutional networks | ✓ Link | 78.35% | | | | | | ResNet-50 + DropBlock (0.9 kp, 0.1 label smoothing) | 2018-10-30 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 78.34% | | | | | | Poly-SA-ViT-S | 2023-08-18 |
CondConv: Conditionally Parameterized Convolutions for Efficient Inference | ✓ Link | 78.3% | | 0.826 | | | | EfficientNet-B0 (CondConv) | 2019-04-10 |
Deep Residual Learning for Image Recognition | ✓ Link | 78.25% | 40M | 7.6 | | | | ResNet-101 | 2015-12-10 |
NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training | ✓ Link | 78.2% | | 0.208 | | | | NASViT-A0 | 2021-09-29 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 78.2% | | | | | | MultiGrain R50-AA-224 | 2019-02-14 |
Vision GNN: An Image is Worth Graph of Nodes | ✓ Link | 78.2% | 10.7M | 1.7 | | | | Pyramid ViG-Ti | 2022-06-01 |
LocalViT: Bringing Locality to Vision Transformers | ✓ Link | 78.2% | 13.5M | 4.8 | | | | LocalViT-PVT | 2021-04-12 |
| | 78.15% | | | | | | ResNet-50 (AutoMix+DM) | |
| | 78.15% | | | | | | PuzzleMix+DM (ResNet-50 RSB A3) | |
LIP: Local Importance-based Pooling | ✓ Link | 78.15% | 25.8M | | | | | ResNet-50 (LIP Bottleneck-256) | 2019-08-12 |
Wide Residual Networks | ✓ Link | 78.1% | | | | | | WRN-50-2-bottleneck | 2016-05-23 |
Separable Self-attention for Mobile Vision Transformers | ✓ Link | 78.1% | 4.9M | 1.8 | | | | MobileViTv2-1.0 | 2022-06-06 |
MobileOne: An Improved One millisecond Mobile Backbone | ✓ Link | 78.1% | 10.1M | 1.896 | | | | MobileOne-S3 | 2022-06-08 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 78.1% | 25M | | | | | ResNet50 (A3) | 2021-10-01 |
Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition | ✓ Link | 78% | 5.7M | 0.820 | | | | ZenNet-400M-SE | 2021-02-01 |
Designing Network Design Spaces | ✓ Link | 78% | 11.2M | 1.6 | | | | RegNetY-1.6GF | 2020-03-30 |
Scalable Vision Transformers with Hierarchical Pooling | ✓ Link | 78.00% | 21.74M | 2.4 | | | | HVT-S-1 | 2021-03-19 |
Perceiver: General Perception with Iterative Attention | ✓ Link | 78% | 44.9M | 707.2 | | | | Perceiver (FF) | 2021-03-04 |
Rethinking Channel Dimensions for Efficient Model Design | ✓ Link | 77.9% | 4.8M | 0.40 | | | | ReXNet_1.0 | 2020-07-02 |
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias | ✓ Link | 77.9% | 6.5M | 4 | | | | ViTAE-6M | 2021-06-07 |
Densely Connected Convolutional Networks | ✓ Link | 77.85% | | | | | | DenseNet-264 | 2016-08-25 |
AlphaNet: Improved Training of Supernets with Alpha-Divergence | ✓ Link | 77.8% | | 0.203 | | | | AlphaNet-A0 | 2021-02-16 |
Data-Driven Neuron Allocation for Scale Aggregation Networks | ✓ Link | 77.8% | | 3.8 | | | | ScaleNet-50 | 2019-04-20 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 77.8% | 15.4M | | | | | ResMLP-S12 | 2021-05-07 |
| | 77.71% | | | | | | ResNet-50 (PuzzleMix+DM) | |
Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets | ✓ Link | 77.7% | 5.1M | 0.339 | | | | TinyNet-A + RA | 2020-10-28 |
Fast AutoAugment | ✓ Link | 77.6% | | | | | | ResNet-50 (Fast AA) | 2019-05-01 |
Sliced Recursive Transformer | ✓ Link | 77.6% | 4.8M | 1.1 | | | | SReT-T | 2021-11-09 |
Involution: Inverting the Inherence of Convolution for Visual Recognition | ✓ Link | 77.6% | 12.4M | 2.2 | | | | RedNet-38 | 2021-03-10 |
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks | ✓ Link | 77.584% | 25.56M | 4.127 | | | | SGE-ResNet50 | 2019-05-23 |
Go Wider Instead of Deeper | ✓ Link | 77.54% | 29M | | | | | WideNet-B | 2021-07-25 |
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | ✓ Link | 77.5% | | | | | | EfficientNet-B0 + AutoDropout | 2021-01-05 |
Adaptively Connected Neural Networks | ✓ Link | 77.5% | 29.38M | | | | | ACNet (ResNet-50) | 2019-04-07 |
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks | ✓ Link | 77.48% | 24.37M | 3.86 | | | | ECA-Net (ResNet-50) | 2019-10-08 |
Densely Connected Convolutional Networks | ✓ Link | 77.42% | | | | | | DenseNet-201 | 2016-08-25 |
MobileOne: An Improved One millisecond Mobile Backbone | ✓ Link | 77.4% | 7.8M | 1.299 | | | | MobileOne-S2 | 2022-06-08 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 77.39% | | | | | | R-Mix (ResNet-50) | 2022-12-09 |
Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks | ✓ Link | 77.21% | | | | | | ResnetV2 50 (FRN layer) | 2019-11-21 |
FBNetV5: Neural Architecture Search for Multiple Tasks in One Run | | 77.2% | | 0.215 | | | | FBNetV5-AR-CLS | 2021-11-19 |
MogaNet: Multi-order Gated Aggregation Network | ✓ Link | 77.2% | 3M | 1.04 | | | | MogaNet-XT (256res) | 2022-11-07 |
Rethinking Channel Dimensions for Efficient Model Design | ✓ Link | 77.2% | 4.1M | 0.35 | | | | ReXNet_0.9 | 2020-07-02 |
Deep Polynomial Neural Networks | ✓ Link | 77.17% | | | | | | Prodpoly | 2020-06-20 |
Bag of Tricks for Image Classification with Convolutional Neural Networks | ✓ Link | 77.16% | 25M | | | | | ResNet-50-D | 2018-12-04 |
What do Deep Networks Like to See? | ✓ Link | 77.12% | | | | | | Inception v3 | 2018-03-22 |
GhostNetV3: Exploring the Training Strategies for Compact Models | ✓ Link | 77.1% | | | | 93.3 | | GhostNetV3 1.0x | 2024-04-17 |
Meta Knowledge Distillation | | 77.1% | | | | | | MKD ViT-T | 2022-02-16 |
GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet | | 77.1% | 6.5M | 0.366 | | | | GreedyNAS-A | 2020-03-25 |
Bias Loss for Mobile Neural Networks | ✓ Link | 77.1% | 7.1M | 0.364 | | | | SkipblockNet-L | 2021-07-23 |
Compress image to patches for Vision Transformer | ✓ Link | 77% | | 6.442 | | | | CI2P-ViT | 2025-02-14 |
Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks | ✓ Link | 77.0% | | | | | | SSAL-Resnet50 | 2021-01-07 |
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition | ✓ Link | 77% | | | | | | UniRepLKNet-A | 2023-11-27 |
Rethinking Local Perception in Lightweight Vision Transformer | ✓ Link | 77% | 4.2M | 0.6 | | | | CloFormer-XXS | 2023-03-31 |
MixConv: Mixed Depthwise Convolutional Kernels | ✓ Link | 77% | 5.0M | 0.360 | | | | MixNet-M | 2019-07-22 |
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective | ✓ Link | 76.91% | | | | | | ResNet50 (FSGDM) | 2024-11-29 |
SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search | ✓ Link | 76.9% | 6.7M | 0.730 | | | | SCARLET-A | 2019-08-16 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 76.81% | 5.48M | | | | | TransBoost-MobileNetV3-L | 2022-05-26 |
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias | ✓ Link | 76.8% | 4.8M | 4.6 | | | | ViTAE-T-Stage | 2021-06-07 |
GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet | | 76.8% | 5.2M | 0.324 | | | | GreedyNAS-B | 2020-03-25 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 76.8% | 9M | | | | | ConvMLP-S | 2021-09-09 |
Learning Visual Representations for Transfer Learning by Suppressing Texture | ✓ Link | 76.71% | | | | | | Perona Malik (Perona and Malik, 1990) | 2020-11-03 |
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | ✓ Link | 76.7% | | | | | | PVT-T (+MixPro) | 2023-04-24 |
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features | ✓ Link | 76.7% | 2.5M | 0.927 | | | | MobileViTv3-XS | 2022-09-30 |
MnasNet: Platform-Aware Neural Architecture Search for Mobile | ✓ Link | 76.7% | 5.2M | 0.806 | | | 0.0403G | MnasNet-A3 | 2018-07-31 |
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding | ✓ Link | 76.7% | 6.7M | 1.3 | | | | ViL-Tiny-RPB | 2021-03-29 |
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases | ✓ Link | 76.7% | 10M | 2 | | | | ConViT-Ti+ | 2021-03-19 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 76.70% | 21.8M | | | | | TransBoost-ResNet34 | 2022-05-26 |
LIP: Local Importance-based Pooling | ✓ Link | 76.64% | 8.7M | | | | | LIP-DenseNet-BC-121 | 2019-08-12 |
X-volution: On the unification of convolution and self-attention | | 76.6% | | | | | | ResNet-50 (X-volution, stage3) | 2021-06-04 |
MUXConv: Information Multiplexing in Convolutional Neural Networks | ✓ Link | 76.6% | 4.0M | 0.636 | | | | MUXNet-l | 2020-03-31 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 76.6% | 5M | | | | | DeiT-Ti (distilled, 1000 epochs) | 2020-12-23 |
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features | ✓ Link | 76.55% | 3M | 1.064 | | | | MobileViTv3-0.75 | 2022-09-30 |
MLP-Mixer: An all-MLP Architecture for Vision | ✓ Link | 76.44% | 46M | | | | | Mixer-B/16 | 2021-05-04 |
Perceiver: General Perception with Iterative Attention | ✓ Link | 76.4% | | | | | | Perceiver | 2021-03-04 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 76.4% | 6.4M | 1.2 | | | | CeiT-T | 2021-03-22 |
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 76.35% | 21.8M | | | | | ResNet-34 (SAMix) | 2021-11-30 |
| | 76.3% | | | | 93.2 | | VGG | |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 76.3% | 5.3M | 0.39 | | | | EfficientNet-B0 | 2019-05-28 |
Designing Network Design Spaces | ✓ Link | 76.3% | 6.3M | 0.8 | | | | RegNetY-800MF | 2020-03-30 |
SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search | ✓ Link | 76.3% | 6.5M | 0.658 | | | | SCARLET-B | 2019-08-16 |
GLiT: Neural Architecture Search for Global and Local Image Transformer | ✓ Link | 76.3% | 7.2M | 1.4 | | | | GLiT-Tinys | 2021-07-07 |
Densely Connected Convolutional Networks | ✓ Link | 76.2% | | | | | | DenseNet-169 | 2016-08-25 |
GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet | | 76.2% | 4.7M | 0.284 | | | | GreedyNAS-C | 2020-03-25 |
Bias Loss for Mobile Neural Networks | ✓ Link | 76.2% | 5.5M | 0.246 | | | | SkipblockNet-M | 2021-07-23 |
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild | ✓ Link | 76.13% | | | | | | ELP (naive ResNet50) | 2022-01-01 |
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 76.1% | 21.8M | | | | | ResNet-34 (AutoMix) | 2021-03-24 |
A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes | | 75.92% | | | | | | ResNet-50 MLPerf v0.7 - 2512 steps | 2021-02-12 |
Densely Connected Search Space for More Flexible Neural Architecture Search | ✓ Link | 75.9% | | | | | | DenseNAS-A | 2019-06-23 |
MobileOne: An Improved One millisecond Mobile Backbone | ✓ Link | 75.9% | 4.8M | 0.825 | | | | MobileOne-S1 | 2022-06-08 |
MoGA: Searching Beyond MobileNetV3 | ✓ Link | 75.9% | 5.1M | 0.608 | | | 0.0304G | MoGA-A | 2019-08-04 |
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network | ✓ Link | 75.9% | 5.11M | 0.62 | | | | RevBiFPN-S1 | 2022-06-28 |
LocalViT: Bringing Locality to Vision Transformers | ✓ Link | 75.9% | 6.3M | 1.4 | | | | LocalViT-TNT | 2021-04-12 |
Semantic-Aware Local-Global Vision Transformer | | 75.9% | 6.5M | | | | | SALG-ST | 2022-11-27 |
Involution: Inverting the Inherence of Convolution for Visual Recognition | ✓ Link | 75.9% | 9.2M | 1.7 | | | | RedNet-26 | 2021-03-10 |
FractalNet: Ultra-Deep Neural Networks without Residuals | ✓ Link | 75.88% | | | | | | FractalNet-34 | 2016-05-24 |
MixConv: Mixed Depthwise Convolutional Kernels | ✓ Link | 75.8% | 4.1M | 0.256 | | | | MixNet-S | 2019-07-22 |
An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution | ✓ Link | 75.74% | | | | | | CoordConv ResNet-50 | 2018-07-09 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 75.7% | 4.7M | 0.288 | | | | LeViT-128S | 2021-04-02 |
GhostNet: More Features from Cheap Operations | ✓ Link | 75.7% | 7.3M | 0.226 | | | | GhostNet ×1.3 | 2019-11-27 |
Local Relation Networks for Image Recognition | ✓ Link | 75.7% | 14.7M | 2.6 | | | | LR-Net-26 | 2019-04-25 |
Spatial-Channel Token Distillation for Vision MLPs | ✓ Link | 75.7% | 22.2M | 4.3 | | | | Mixer-S16 + STD | 2022-07-23 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 75.66% | 3M | | | | | SimpleNetV1-small-075-correct-labels | 2016-08-22 |
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | ✓ Link | 75.6% | | | | | | FastViT-T8 | 2023-03-24 |
Separable Self-attention for Mobile Vision Transformers | ✓ Link | 75.6% | 2.9M | 1.0 | | | | MobileViTv2-0.75 | 2022-06-06 |
MnasNet: Platform-Aware Neural Architecture Search for Mobile | ✓ Link | 75.6% | 4.8M | 0.680 | | | | MnasNet-A2 | 2018-07-31 |
SCARLET-NAS: Bridging the Gap between Stability and Scalability in Weight-sharing Neural Architecture Search | ✓ Link | 75.6% | 6M | 0.560 | | | | SCARLET-C | 2019-08-16 |
Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples | ✓ Link | 75.5% | | | | | | PAWS (ResNet-50, 10% labels) | 2021-04-28 |
Designing Network Design Spaces | ✓ Link | 75.5% | 6.1M | 0.6 | | | | RegNetY-600MF | 2020-03-30 |
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design | ✓ Link | 75.4% | | 0.597 | | | | ShuffleNet V2 | 2018-07-30 |
Visual Attention Network | ✓ Link | 75.4% | 4.1M | 0.9 | | | | VAN-B0 | 2022-02-20 |
AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks | ✓ Link | 75.4% | 5.99M | 0.4338 | | | | AsymmNet-Large ×1.0 | 2021-04-15 |
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search | ✓ Link | 75.34% | 4.6M | 0.776 | | | | FairNAS-A | 2019-07-03 |
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias | ✓ Link | 75.3% | | 3.0 | | | | ViTAE-T | 2021-06-07 |
MUXConv: Information Multiplexing in Convolutional Neural Networks | ✓ Link | 75.3% | 3.4M | 0.436 | | | | MUXNet-m | 2020-03-31 |
Deep Residual Learning for Image Recognition | ✓ Link | 75.3% | 25M | 3.8 | | | | ResNet-50 | 2015-12-10 |
MnasNet: Platform-Aware Neural Architecture Search for Mobile | ✓ Link | 75.2% | 3.9M | 0.624 | | | | MnasNet-A1 | 2018-07-31 |
Searching for MobileNetV3 | ✓ Link | 75.2% | 5.4M | 0.438 | | | | MobileNet V3-Large 1.0 | 2019-05-06 |
DiCENet: Dimension-wise Convolutions for Efficient Networks | ✓ Link | 75.1% | | 0.553 | | | | DiCENet | 2019-06-08 |
MultiGrain: a unified image embedding for classes and instances | ✓ Link | 75.1% | | | | | | MultiGrain NASNet-A-Mobile (350px) | 2019-02-14 |
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search | ✓ Link | 75.10% | 4.5M | 0.690 | | | | FairNAS-B | 2019-07-03 |
X-volution: On the unification of convolution and self-attention | | 75% | | | | | | ResNet-34 (X-volution, stage3) | 2021-06-04 |
GhostNet: More Features from Cheap Operations | ✓ Link | 75% | 13M | 2.2 | | | | Ghost-ResNet-50 (s=2) | 2019-11-27 |
Densely Connected Convolutional Networks | ✓ Link | 74.98% | | | | | | DenseNet-121 | 2016-08-25 |
Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours | ✓ Link | 74.96% | | | | | | Single-Path NAS | 2019-04-05 |
WaveMix: A Resource-efficient Neural Network for Image Analysis | ✓ Link | 74.93% | | | | | | WaveMix-192/16 (level 3) | 2022-05-28 |
Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet | ✓ Link | 74.9% | | | | | | FF | 2021-05-06 |
FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search | ✓ Link | 74.9% | 5.5M | 0.375 | | | | FBNet-C | 2018-12-09 |
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network | ✓ Link | 74.9% | 5.9M | 0.602 | | | | ESPNetv2 | 2018-11-28 |
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer | ✓ Link | 74.8% | 2.3M | 0.7 | | | | MobileViT-XS | 2021-10-05 |
LocalViT: Bringing Locality to Vision Transformers | ✓ Link | 74.8% | 5.9M | 1.3 | | | | LocalViT-T | 2021-04-12 |
Exploring Randomly Wired Neural Networks for Image Recognition | ✓ Link | 74.7% | 5.6M | 0.583 | | | | RandWire-WS (small) | 2019-04-02 |
AutoFormer: Searching Transformers for Visual Recognition | ✓ Link | 74.7% | 5.7M | 1.3 | | | | AutoFormer-tiny | 2021-07-01 |
MobileNetV2: Inverted Residuals and Linear Bottlenecks | ✓ Link | 74.7% | 6.9M | 1.170 | | | | MobileNetV2 (1.4) | 2018-01-13 |
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search | ✓ Link | 74.69% | 4.4M | 0.642 | | | | FairNAS-C | 2019-07-03 |
Rethinking Channel Dimensions for Efficient Model Design | ✓ Link | 74.6% | 2.7M | | | | | ReXNet_0.6 | 2020-07-02 |
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware | ✓ Link | 74.6% | 4.0M | | | | | Proxyless | 2018-12-02 |
Rethinking Spatial Dimensions of Vision Transformers | ✓ Link | 74.6% | 4.9M | 0.7 | | | | PiT-Ti | 2021-03-30 |
Dynamic Convolution: Attention over Convolution Kernels | ✓ Link | 74.4% | 11.1M | 0.626 | | | | DY-MobileNetV2 ×1.0 | 2019-12-07 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 74.17% | 9.5M | | | | | SimpleNetV1-9m | 2016-08-22 |
Designing Network Design Spaces | ✓ Link | 74.1% | 4.3M | 0.4 | | | | RegNetY-400MF | 2020-03-30 |
GhostNet: More Features from Cheap Operations | ✓ Link | 74.1% | 6.5M | 1.2 | | | | Ghost-ResNet-50 (s=4) | 2019-11-27 |
Sliced Recursive Transformer | ✓ Link | 74.0% | 4M | 0.7 | | | | SReT-ExT | 2021-11-09 |
GhostNet: More Features from Cheap Operations | ✓ Link | 73.9% | 5.2M | 0.141 | | | | GhostNet ×1.0 | 2019-11-27 |
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer | ✓ Link | 73.8% | | | | | | DeiT-T (+MixPro) | 2023-04-24 |
MobileNetV4 -- Universal Models for the Mobile Ecosystem | ✓ Link | 73.8% | | | | | | MNv4-Conv-S | 2024-04-16 |
Rethinking and Improving Relative Position Encoding for Vision Transformer | ✓ Link | 73.7% | 6M | 2.568 | | | | DeiT-Ti with iRPE-K | 2021-07-29 |
Distilled Gradual Pruning with Pruned Fine-tuning | ✓ Link | 73.66% | 2.56M | 0.4 | | | | DGPPF-ResNet50 | 2024-02-15 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 73.36% | 11.69M | | | | | TransBoost-ResNet18 | 2022-05-26 |
What's Hidden in a Randomly Weighted Neural Network? | ✓ Link | 73.3% | 20.6M | | | | | Wide ResNet-50 (edge-popup) | 2019-11-29 |
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks | ✓ Link | 73.19% | | | | | | ResNet-18 (MEAL V2) | 2020-09-17 |
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases | ✓ Link | 73.1% | 6M | 1 | | | | ConViT-Ti | 2021-03-19 |
RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network | ✓ Link | 72.8% | 3.42M | 0.31 | | | | RevBiFPN-S0 | 2022-06-28 |
Dynamic Convolution: Attention over Convolution Kernels | ✓ Link | 72.8% | 7M | 0.435 | | | | DY-MobileNetV2 ×0.75 | 2019-12-07 |
Dynamic Convolution: Attention over Convolution Kernels | ✓ Link | 72.7% | 42.7M | 3.7 | | | | DY-ResNet-18 | 2019-12-07 |
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks | ✓ Link | 72.56% | 3.34M | 0.320 | | | | ECA-Net (MobileNetV2) | 2019-10-08 |
Compact Global Descriptor for Neural Networks | ✓ Link | 72.56% | 4.26M | 1.198 | | | | MobileNet-224 (CGD) | 2019-07-23 |
MobileOne: An Improved One millisecond Mobile Backbone | ✓ Link | 72.5% | 2.1M | 0.275 | | | | MobileOne-S0 (distill) | 2022-06-08 |
LocalViT: Bringing Locality to Vision Transformers | ✓ Link | 72.5% | 4.3M | 1.2 | | | | LocalViT-T2T | 2021-04-12 |
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features | ✓ Link | 72.33% | 1.4M | 0.481 | | | | MobileViTv3-0.5 | 2022-09-30 |
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 72.33% | 11.7M | | | | | ResNet-18 (SAMix) | 2021-11-30 |
On the adequacy of untuned warmup for adaptive optimization | ✓ Link | 72.1% | | | | | | ResNet-50 | 2019-10-09 |
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 72.05% | 11.7M | | | | | ResNet-18 (AutoMix) | 2021-03-24 |
MobileNetV2: Inverted Residuals and Linear Bottlenecks | ✓ Link | 72% | 3.4M | 0.600 | | | | MobileNetV2 | 2018-01-13 |
QuantNet: Learning to Quantize by Learning within Fully Differentiable Framework | | 71.97% | | | | | | QuantNet | 2020-09-10 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 71.94% | 5.7M | | | | | SimpleNetV1-5m | 2016-08-22 |
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ Link | 71.71% | | | | | | ResNet-18 (PAD-L2 w/ ResNet-34 teacher) | 2020-11-25 |
MUXConv: Information Multiplexing in Convolutional Neural Networks | ✓ Link | 71.6% | 2.4M | 0.234 | | | | MUXNet-s | 2020-03-31 |
Augmenting Deep Classifiers with Polynomial Neural Networks | ✓ Link | 71.6% | 11.51M | | | | | PDC | 2021-04-16 |
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ Link | 71.56% | | | | | | ResNet-18 (FT w/ ResNet-34 teacher) | 2020-11-25 |
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers | ✓ Link | 71.53% | | | | | | EfficientFormer-V2-S0 | 2023-08-18 |
MobileOne: An Improved One millisecond Mobile Backbone | ✓ Link | 71.4% | 2.1M | | | | | MobileOne-S0 | 2022-06-08 |
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ Link | 71.37% | | | | | | ResNet-18 (KD w/ ResNet-34 teacher) | 2020-11-25 |
Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks | | 71.24% | | | | | | Dspike (VGG-16) | 2021-12-01 |
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications | ✓ Link | 71.2% | 1.3M | 0.522 | | | | EdgeNeXt-XXS | 2022-06-21 |
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ Link | 71.08% | | | | | | ResNet-18 (L2 w/ ResNet-34 teacher) | 2020-11-25 |
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features | ✓ Link | 70.98% | 1.2M | 0.289 | | | | MobileViTv3-XXS | 2022-09-30 |
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ Link | 70.93% | | | | | | ResNet-18 (CRD w/ ResNet-34 teacher) | 2020-11-25 |
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices | ✓ Link | 70.9% | | | | | | ShuffleNet | 2017-07-04 |
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | ✓ Link | 70.6% | | 1.138 | | | | MobileNet-224 ×1.25 | 2017-04-17 |
| | 70.54% | | | | | | PSN (SEW ResNet-34) | |
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ Link | 70.52% | | | | | | ResNet-18 (tf-KD w/ ResNet-18 teacher) | 2020-11-25 |
PVT v2: Improved Baselines with Pyramid Vision Transformer | ✓ Link | 70.5% | 3.4M | 0.6 | | | | PVTv2-B0 | 2021-06-25 |
Gated Attention Coding for Training High-performance and Efficient Spiking Neural Networks | ✓ Link | 70.42% | | | | | | GAC-SNN MS-ResNet-34 | 2023-08-12 |
Separable Self-attention for Mobile Vision Transformers | ✓ Link | 70.2% | 1.4M | 0.5 | | | | MobileViTv2-0.5 | 2022-06-06 |
torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation | ✓ Link | 70.09% | | | | | | ResNet-18 (SSKD w/ ResNet-34 teacher) | 2020-11-25 |
Dynamic Convolution: Attention over Convolution Kernels | ✓ Link | 69.7% | 4.8M | 0.137 | | | | DY-MobileNetV3-Small | 2019-12-07 |
Scalable Vision Transformers with Hierarchical Pooling | ✓ Link | 69.64% | 5.74M | 0.64 | | | | HVT-Ti-1 | 2021-03-19 |
GhostNetV3: Exploring the Training Strategies for Compact Models | ✓ Link | 69.4% | | | | 88.5 | | GhostNetV3 0.5x | 2024-04-17 |
Dynamic Convolution: Attention over Convolution Kernels | ✓ Link | 69.4% | 4M | 0.203 | | | | DY-MobileNetV2 ×0.5 | 2019-12-07 |
AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks | ✓ Link | 69.2% | 2.8M | 0.1344 | | | | AsymmNet-Large ×0.5 | 2021-04-15 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 69.11% | 1.5M | | | | | SimpleNetV1-small-05-correct-labels | 2016-08-22 |
Correlated Input-Dependent Label Noise in Large-Scale Image Classification | | 68.6% | | | | | | Heteroscedastic (InceptionResNet-v2) | 2021-05-19 |
AsymmNet: Towards ultralight convolution neural networks using asymmetrical bottlenecks | ✓ Link | 68.4% | 3.1M | 0.1154 | | | | AsymmNet-Small ×1.0 | 2021-04-15 |
FireCaffe: near-linear acceleration of deep neural network training on compute clusters | | 68.3% | | | | | | FireCaffe (GoogLeNet) | 2015-10-31 |
Graph-RISE: Graph-Regularized Image Semantic Embedding | ✓ Link | 68.29% | | | | | | Graph-RISE (40M) | 2019-02-14 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 68.15% | 3M | | | | | SimpleNetV1-small-075 | 2016-08-22 |
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization | ✓ Link | 68.0% | | | | | | ReActNet-A (BN-Free) | 2021-04-16 |
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective | ✓ Link | 67.74% | | | | | | ResNet34 (FSGDM) | 2024-11-29 |
Dynamic Convolution: Attention over Convolution Kernels | ✓ Link | 67.7% | 18.6M | 1.82 | | | | DY-ResNet-10 | 2019-12-07 |
WaveMix-Lite: A Resource-efficient Neural Network for Image Analysis | ✓ Link | 67.7% | 32.4M | | | | | WaveMixLite-256/24 | 2022-10-13 |
| | 67.63% | | | | | | PSN (SEW ResNet-18) | |
MUXConv: Information Multiplexing in Convolutional Neural Networks | ✓ Link | 66.7% | 1.8M | 0.132 | | | | MUXNet-xs | 2020-03-31 |
Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples | ✓ Link | 66.5% | | | | | | PAWS (ResNet-50, 1% labels) | 2021-04-28 |
GhostNet: More Features from Cheap Operations | ✓ Link | 66.2% | 2.6M | 0.042 | | | | GhostNet ×0.5 | 2019-11-27 |
| | 66.04% | | | | 86.76 | | OverFeat | |
Distilled Gradual Pruning with Pruned Fine-tuning | ✓ Link | 65.59% | 1.03M | 0.1 | | | | DGPPF-MobileNetV2 | 2024-02-15 |
Distilled Gradual Pruning with Pruned Fine-tuning | ✓ Link | 65.22% | 1.15M | 0.2 | | | | DGPPF-ResNet18 | 2024-02-15 |
Online Training Through Time for Spiking Neural Networks | ✓ Link | 65.15% | | | | | | OTTT | 2022-10-09 |
Dynamic Convolution: Attention over Convolution Kernels | ✓ Link | 64.9% | 2.8M | 0.124 | | | | DY-MobileNetV2 ×0.35 | 2019-12-07 |
| | 63.3% | 62M | | | 84.6 | | AlexNet | |
Balanced Binary Neural Networks with Gated Residual | ✓ Link | 62.6% | | | | | | BBG (ResNet-34) | 2019-09-26 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 61.52% | 1.5M | | | | | SimpleNetV1-small-05 | 2016-08-22 |
Balanced Binary Neural Networks with Gated Residual | ✓ Link | 59.4% | | | | | | BBG (ResNet-18) | 2019-09-26 |
FireCaffe: near-linear acceleration of deep neural network training on compute clusters | | 58.9% | | | | | | FireCaffe (AlexNet) | 2015-10-31 |
0/1 Deep Neural Networks via Block Coordinate Descent | | 38.3% | | | | | | HMAX | 2022-06-19 |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ✓ Link | 24% | | | | | | ViT-Large | 2020-10-22 |
Escaping the Big Data Paradigm with Compact Transformers | ✓ Link | | 22.36M | 11.06 | | | | CCT-14/7x2 | 2021-04-12 |
MambaVision: A Hybrid Mamba-Transformer Vision Backbone | ✓ Link | | 241.5M | | | | | MambaVision-L2 | 2024-07-10 |
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | ✓ Link | | 1520M | | | | | ONE-PEACE | 2023-05-18 |
Multimodal Autoregressive Pre-training of Large Vision Encoders | ✓ Link | | 2700M | | | | | AIMv2-2B | 2024-11-21 |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | | 3000M | | | | | InternImage-DCNv3-G (M3I Pre-training) | 2022-11-10 |
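
For reference, the top-1 and top-5 columns above are conventionally computed by checking whether the ground-truth label appears among a model's k highest-scoring classes on the ImageNet validation set. The sketch below is illustrative only and is not taken from any of the papers listed; it assumes PyTorch, and the tensor shapes are the usual `[N, 1000]` logits and `[N]` integer labels.

```python
# Minimal sketch: top-k accuracy as reported in the table's top-1/top-5 columns.
# Assumes PyTorch; `logits` and `targets` are placeholders for real model outputs
# and ImageNet validation labels.
import torch

@torch.no_grad()
def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, ks=(1, 5)):
    """Return {k: accuracy} for logits of shape [N, C] and targets of shape [N]."""
    maxk = max(ks)
    # Indices of the maxk highest-scoring classes per example: shape [N, maxk]
    _, pred = logits.topk(maxk, dim=1)
    # Boolean [N, maxk]: does any of the top-k predictions match the label?
    correct = pred.eq(targets.unsqueeze(1))
    return {k: correct[:, :k].any(dim=1).float().mean().item() for k in ks}

# Toy usage with random scores (4 examples, 10 classes):
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
print(topk_accuracy(logits, targets))  # e.g. {1: 0.25, 5: 0.5}
```

Accumulating these counts over the full 50,000-image validation set (rather than averaging per-batch percentages) gives the exact figures leaderboards report; the params and GFLOPs columns, by contrast, are taken from each paper's own accounting.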