| Paper | Code | Percentage correct | Params | Top-1 Accuracy | Accuracy | Model | Date |
|---|---|---|---|---|---|---|---|
| Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 96.08 | | | | EffNet-L2 (SAM) | 2020-10-03 |
| ML-Decoder: Scalable and Versatile Classification Head | ✓ Link | 95.1 | | | | Swin-L + ML-Decoder | 2021-11-25 |
| An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems | ✓ Link | 94.95 | | | | µ2Net (ViT-L/16) | 2022-05-25 |
| ImageNet-21K Pretraining for the Masses | ✓ Link | 94.2 | | | | ViT-B-16 (ImageNet-21K-P pretrain) | 2021-04-22 |
| CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 94.09 | | | | CvT-W24 | 2021-03-29 |
| Perturbated Gradients Updating within Unit Space for Deep Learning | ✓ Link | 93.95 | | | | ViT-B/16 (PUGD) | 2021-10-01 |
| An Algorithm for Routing Vectors in Sequences | ✓ Link | 93.8 | 309.8M | | | Heinsen Routing + BEiT-large 16 224 | 2022-11-20 |
| Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 93.51 | | | | BiT-L (ResNet) | 2019-12-24 |
| Reduction of Class Activation Uncertainty with Background Information | ✓ Link | 93.31 | | | | ViT-L/16 (Spinal FC, Background) | 2023-05-05 |
| Going deeper with Image Transformers | ✓ Link | 93.1 | | | | CaiT-M-36 U 224 | 2021-03-31 |
| Three things everyone should know about Vision Transformers | ✓ Link | 93.0 | | | | ViT-L (attn fine-tune) | 2022-03-18 |
| TResNet: High Performance GPU-Dedicated Architecture | ✓ Link | 92.6 | | | | TResNet-L-V2 | 2020-03-30 |
| EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 92.3 | | | | EfficientNetV2-L | 2021-04-01 |
| EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 92.2 | | | | EfficientNetV2-M | 2021-04-01 |
| Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 92.17 | | | | BiT-M (ResNet) | 2019-12-24 |
| Incorporating Convolution Designs into Visual Transformers | ✓ Link | 91.8 | | | | CeiT-S | 2021-03-22 |
| Incorporating Convolution Designs into Visual Transformers | ✓ Link | 91.8 | | | | CeiT-S (384 finetune resolution) | 2021-03-22 |
| EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 91.7 | 64M | | | EfficientNet-B7 | 2019-05-28 |
| EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 91.5 | | | | EfficientNetV2-S | 2021-04-01 |
| GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | ✓ Link | 91.3 | | | | GPIPE | 2018-11-16 |
| Transformer in Transformer | ✓ Link | 91.1 | 65.6M | | | TNT-B | 2021-02-27 |
| Training data-efficient image transformers & distillation through attention | ✓ Link | 90.8 | 86M | | | DeiT-B | 2020-12-23 |
| Global Filter Networks for Image Classification | ✓ Link | 90.3 | 54M | | | GFNet-H-B | 2021-07-01 |
| Rethinking Recurrent Neural Networks and Other Improvements for Image Classification | ✓ Link | 90.27 | | | | E2E-3M | 2020-07-30 |
| Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | ✓ Link | 90.2 | | | | Bamboo (ViT-B/16) | 2022-03-15 |
| ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks | ✓ Link | 89.90 | | | | PyramidNet-272 (ASAM) | 2021-02-23 |
| Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 89.7 | | | | PyramidNet (SAM) | 2020-10-03 |
| Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition | ✓ Link | 89.63 | | | | DVT (T2T-ViT-24) | 2021-05-31 |
| ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 89.5 | | | | ResMLP-24 | 2021-05-07 |
| Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 89.46 | 32.8M | | | PyramidNet-272, S=4 | 2020-11-30 |
| Incorporating Convolution Designs into Visual Transformers | ✓ Link | 89.4 | | | | CeiT-T | 2021-03-22 |
| AutoAugment: Learning Augmentation Policies from Data | ✓ Link | 89.3 | | | | PyramidNet+ShakeDrop | 2018-05-24 |
| When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 89.1 | | | | ViT-B/16-SAM | 2021-06-03 |
| ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 89.1 | | | | ConvMLP-M | 2021-09-09 |
| ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 88.6 | | | | ConvMLP-L | 2021-09-09 |
| Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images | ✓ Link | 88.54 | | | | ResNet-152x4-AGC (ImageNet-21K) | 2021-05-31 |
| ColorNet: Investigating the importance of color spaces for image classification | ✓ Link | 88.4 | 19.0M | | | ColorNet | 2019-02-01 |
| Fast AutoAugment | ✓ Link | 88.3 | | | | PyramidNet+ShakeDrop (Fast AA) | 2019-05-01 |
| Neural Architecture Transfer | ✓ Link | 88.3 | 9.0M | | | NAT-M4 | 2020-05-12 |
| Incorporating Convolution Designs into Visual Transformers | ✓ Link | 88 | | | | CeiT-T (384 finetune resolution) | 2021-03-22 |
| Neural Architecture Transfer | ✓ Link | 87.7 | 7.8M | | | NAT-M3 | 2020-05-12 |
| When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 87.6 | | | | ViT-S/16-SAM | 2021-06-03 |
| Neural Architecture Transfer | ✓ Link | 87.5 | 6.4M | | | NAT-M2 | 2020-05-12 |
| PSO-Convolutional Neural Networks with Heterogeneous Learning Rate | ✓ Link | 87.48 | | | | Dynamics 1 | 2022-05-20 |
| Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 87.44 | 26.3M | | | DenseNet-BC-190, S=4 | 2020-11-30 |
| ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 87.4 | | | | ConvMLP-S | 2021-09-09 |
| ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 87.0 | | | | ResMLP-12 | 2021-05-07 |
| Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 86.90 | | | | WRN-40-10, S=4 | 2020-11-30 |
| ResNet strikes back: An improved training procedure in timm | ✓ Link | 86.9 | 25M | | | ResNet50 (A1) | 2021-10-01 |
| MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks | ✓ Link | 86.81 | | | | WRN-28-10 * 3 | 2021-03-10 |
| Regularizing Neural Networks via Adversarial Model Perturbation | ✓ Link | 86.64 | | | | PyramidNet + AA (AMP) | 2020-10-10 |
| Self-Knowledge Distillation with Progressive Refinement of Targets | ✓ Link | 86.41 | | | | PyramidNet-200 + Shakedrop + Cutmix + PS-KD | 2020-06-22 |
| When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 86.4 | | | | Mixer-B/16-SAM | 2021-06-03 |
| Deep Feature Response Discriminative Calibration | ✓ Link | 86.31 | | | | ResCNet-50 | 2024-11-16 |
| CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | ✓ Link | 86.19 | | | | PyramidNet-200 + Shakedrop + Cutmix | 2019-05-13 |
| MUXConv: Information Multiplexing in Convolutional Neural Networks | ✓ Link | 86.1 | 2.1M | | | MUXNet-m | 2020-03-31 |
| Neural Architecture Transfer | ✓ Link | 86.0 | 3.8M | | | NAT-M1 | 2020-05-12 |
| MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks | ✓ Link | 85.77 | | | | WRN-28-10 | 2021-03-10 |
| Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 85.74 | | | | WRN-28-10, S=4 | 2020-11-30 |
| | | 85.59 | | | | WRN-28-8 (SAMix+DM) | |
| Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 85.50 | | | | WRN-28-8 +SAMix | 2021-11-30 |
| Improving Neural Architecture Search Image Classifiers via Ensemble Learning | ✓ Link | 85.42 | | | | ASANas | 2019-03-14 |
| | | 85.38 | | | | WRN-28-8 (AutoMix+DM) | |
| SparseSwin: Swin Transformer with Sparse Transformer Block | ✓ Link | 85.35 | 17.58M | | | SparseSwin | 2023-09-11 |
| | | 85.25 | | | | WRN-28-8 (PuzzleMix+DM) | |
| When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 85.2 | | | | ResNet-50-SAM | 2021-06-03 |
| AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 85.16 | | | | WRN-28-8 +AutoMix | 2021-03-24 |
| WaveMix: A Resource-efficient Neural Network for Image Analysis | ✓ Link | 85.09 | | | | WaveMixLite-256/7 | 2022-05-28 |
| Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics | ✓ Link | 85.08 | | | | MANO-tiny | 2025-07-03 |
| Neural networks with late-phase weights | ✓ Link | 85.00 | | | | WRN 28-14 | 2020-07-25 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 85 | | | | R-Mix (WideResNet 28-10) | 2022-12-09 |
| EEEA-Net: An Early Exit Evolutionary Neural Architecture Search | ✓ Link | 84.98 | | | | EEEA-Net-C (b=5)+ CO | 2021-08-13 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 84.9 | | | | RL-Mix (WideResNet 28-10) | 2022-12-09 |
| Automatic Data Augmentation via Invariance-Constrained Learning | ✓ Link | 84.89 | | | | Wide-ResNet-28-10 | 2022-09-29 |
| Squeeze-and-Excitation Networks | ✓ Link | 84.59 | | | | SENet + ShakeEven + Cutout | 2017-09-05 |
| Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 84.42 | | | | ResNeXt-50(32x4d) + SAMix | 2021-11-30 |
| Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 84.38 | | | | WRN-28-10 with reSGHMC | 2020-08-12 |
| Averaging Weights Leads to Wider Optima and Better Generalization | ✓ Link | 84.16 | | | | PyramidNet-272 + SWA | 2018-03-14 |
| Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup | ✓ Link | 84.05 | | | | WRN28-10 | 2020-09-15 |
| Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 84.04 | 11.4M | | | HCGNet-A3 | 2019-08-26 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 83.97 | | | | WideResNet 28-10 + CutMix (OneCycleLR scheduler) | 2022-12-09 |
| FMix: Enhancing Mixed Sample Data Augmentation | ✓ Link | 83.95 | | | | DenseNet-BC-190 + FMix | 2020-02-27 |
| Oriented Response Networks | ✓ Link | 83.85 | | | | ORN | 2017-01-07 |
| Grafit: Learning fine-grained image representations with coarse labels | | 83.7 | | | | Grafit (ResNet-50) | 2020-11-25 |
| AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 83.64 | | | | ResNeXt-50(32x4d) + AutoMix | 2021-03-24 |
| TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers | ✓ Link | 83.57 | | | | CCT-7/3x1+HTM+VTM | 2022-10-14 |
| Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 83.46 | 3.1M | | | HCGNet-A2 | 2019-08-26 |
| Res2Net: A New Multi-scale Backbone Architecture | ✓ Link | 83.44 | | | | Res2NeXt-29 | 2019-04-02 |
| mixup: Beyond Empirical Risk Minimization | ✓ Link | 83.20 | | | | DenseNet-BC-190 + Mixup | 2017-10-25 |
| Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks | ✓ Link | 83.2 | | | | SSAL-DenseNet 190-40 | 2021-01-07 |
| EnAET: A Self-Trained framework for Semi-Supervised and Supervised Learning with Ensemble Transformations | ✓ Link | 83.13 | | | | EnAET | 2019-11-21 |
| Neural networks with late-phase weights | ✓ Link | 83.06 | | | | WRN 28-10 | 2020-07-25 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 83.02 | | | | R-Mix (ResNeXt 29-4-24) | 2022-12-09 |
| Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems | ✓ Link | 82.95 | | | | Wide ResNet+Cutout+no BN scale/offset learning | 2019-07-16 |
| Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 82.95 | | | | WRN-16-8 with reSGHMC | 2020-08-12 |
| Densely Connected Convolutional Networks | ✓ Link | 82.82 | | | | DenseNet-BC | 2016-08-25 |
| ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 82.784 | | | | ABNet-2G-R3-Combined | 2024-11-28 |
| Escaping the Big Data Paradigm with Compact Transformers | ✓ Link | 82.72 | | | | CCT-7/3x1* | 2021-04-12 |
| EXACT: How to Train Your Accuracy | ✓ Link | 82.68 | | | | EXACT (WRN-28-10) | 2022-05-19 |
| Selective Kernel Networks | ✓ Link | 82.67 | | | | SKNet-29 (ResNeXt-29, 16×32d) | 2019-03-15 |
| Densely Connected Convolutional Networks | ✓ Link | 82.62 | | | | DenseNet | 2016-08-25 |
| Learning Implicitly Recurrent CNNs Through Parameter Sharing | ✓ Link | 82.57 | | | | Shared WRN | 2019-02-26 |
| Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding | ✓ Link | 82.56 | | | | Transformer local-attention (NesT-B) | 2021-05-26 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 82.43 | | | | RL-Mix (ResNeXt 29-4-24) | 2022-12-09 |
| When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 82.4 | | | | Mixer-S/16-SAM | 2021-06-03 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 82.32 | | | | R-Mix (WideResNet 16-8) | 2022-12-09 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 82.3 | | | | ResNeXt 29-4-24 + CutMix (OneCycleLR scheduler) | 2022-12-09 |
| Attend and Rectify: a Gated Attention Mechanism for Fine-Grained Recovery | ✓ Link | 82.18 | | | | WARN | 2018-07-19 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 82.16 | | | | RL-Mix (WideResNet 16-8) | 2022-12-09 |
| Averaging Weights Leads to Wider Optima and Better Generalization | ✓ Link | 82.15 | | | | WRN+SWA | 2018-03-14 |
| Manifold Mixup: Better Representations by Interpolating Hidden States | ✓ Link | 81.96 | | | | Manifold Mixup | 2018-06-13 |
| Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 81.87 | 1.1M | | | HCGNet-A1 | 2019-08-26 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 81.79 | | | | WideResNet 16-8 + CutMix (OneCycleLR scheduler) | 2022-12-09 |
| Learning Identity Mappings with Residual Gates | | 81.73 | | | | Residual Gates + WRN | 2016-11-04 |
| Revisiting a kNN-based Image Classification System with High-capacity Storage | | 81.7 | | | | kNN-CLIP | 2022-04-03 |
| Attention Augmented Convolutional Networks | ✓ Link | 81.6 | | | | AA-Wide-ResNet | 2019-04-22 |
| PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 81.6 | | | | PDO-eConv (p8, 4.6M) | 2020-07-20 |
| Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 81.53 | | | | SEER (RegNet10B) | 2022-02-16 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 81.49 | | | | R-Mix (PreActResNet-18) | 2022-12-09 |
| On the Performance Analysis of Momentum Method: A Frequency Domain Perspective | ✓ Link | 81.44 | | | | ResNet50 (FSGDM) | 2024-11-29 |
| Automatic Data Augmentation via Invariance-Constrained Learning | ✓ Link | 81.19 | | | | Wide-ResNet-40-2 | 2022-09-29 |
| Wide Residual Networks | ✓ Link | 81.15 | | | | Wide ResNet | 2016-05-23 |
| Deep Competitive Pathway Networks | ✓ Link | 81.10 | | | | CoPaNet-R-164 | 2017-09-29 |
| ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 80.830 | | | | ABNet-2G-R3 | 2024-11-28 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 80.75 | | | | RL-Mix (PreActResNet-18) | 2022-12-09 |
| Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 80.6 | | | | PreActResNet-18 + CutMix (OneCycleLR scheduler) | 2022-12-09 |
| Gated Attention Coding for Training High-performance and Efficient Spiking Neural Networks | ✓ Link | 80.45 | | | | GAC-SNN | 2023-08-12 |
| ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 80.354 | | | | ABNet-2G-R2 | 2024-11-28 |
| Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet | ✓ Link | 80.29 | | | | SimpleNetv2 | 2018-02-17 |
| UPANets: Learning from the Universal Pixel Attention Networks | ✓ Link | 80.29 | | | | UPANets | 2021-03-15 |
| SageMix: Saliency-Guided Mixup for Point Clouds | ✓ Link | 80.16 | | | | PreActResNet-18 + SageMix | 2022-10-13 |
| Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 80.14 | | | | ResNet56 with reSGHMC | 2020-08-12 |
| PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 79.99 | | | | PDO-eConv (p8, 2.62M) | 2020-07-20 |
| Training Neural Networks with Local Error Signals | ✓ Link | 79.9 | | | | VGG11B(3x) + LocalLearning | 2019-01-20 |
| With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations | ✓ Link | 79 | | | | NNCLR | 2021-04-29 |
| ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 78.792 | | | | ABNet-2G-R1 | 2024-11-28 |
| Regularizing Neural Networks via Adversarial Model Perturbation | ✓ Link | 78.49 | | | | PreActResNet18 (AMP) | 2020-10-10 |
| Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 78.37 | | | | SimpleNetv1 | 2016-08-22 |
| Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images | | 78.27 | 3.64M | | | ViT (lightweight, MAE pre-trained) | 2024-02-06 |
| Augmenting Deep Classifiers with Polynomial Neural Networks | ✓ Link | 77.9 | | | | PDC | 2021-04-16 |
| Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets | ✓ Link | 77.7 | | | | MobileNetV3-large x1.0 (BSConv-U) | 2020-03-30 |
| Escaping the Big Data Paradigm with Compact Transformers | ✓ Link | 77.31 | 3.17M | | | CCT-6/3x1 | 2021-04-12 |
| Identity Mappings in Deep Residual Networks | ✓ Link | 77.3 | | | | ResNet-1001 | 2016-03-16 |
| Large-Scale Evolution of Image Classifiers | ✓ Link | 77 | | | | Evolution | 2017-03-03 |
| DIANet: Dense-and-Implicit Attention Network | ✓ Link | 76.98 | | | | DIANet | 2019-05-25 |
| Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification | ✓ Link | 76.85 | | | | LP-BNN (ours) + cutout | 2020-12-04 |
| Learning Class Unique Features in Fine-Grained Visual Classification | | 76.64 | | | | ResNet-18+MM+FRL | 2020-11-22 |
| Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 76.55 | | | | ResNet32 with reSGHMC | 2020-08-12 |
| Momentum Residual Neural Networks | ✓ Link | 76.38 | | | | MomentumNet | 2021-02-15 |
| Spatially-sparse convolutional neural networks | ✓ Link | 75.7 | | | | SSCNN | 2014-09-22 |
| Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) | ✓ Link | 75.7 | | | | Exponential Linear Units | 2015-11-23 |
| CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters | ✓ Link | 75.59 | | | | ResNet-9 | 2022-03-29 |
| Deep Networks with Stochastic Depth | ✓ Link | 75.42 | | | | Stochastic Depth | 2016-03-30 |
| Mish: A Self Regularized Non-Monotonic Activation Function | ✓ Link | 74.41 | | | | ResNet v2-110 (Mish activation) | 2019-08-23 |
| Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks | | 74.24 | | | | Dspike (ResNet-18) | 2021-12-01 |
| Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 74.14 | | | | ResNet20 with reSGHMC | 2020-08-12 |
| MixMatch: A Holistic Approach to Semi-Supervised Learning | ✓ Link | 74.1 | | | | MixMatch | 2019-05-06 |
| Beta-Rank: A Robust Convolutional Filter Pruning Method For Imbalanced Medical Image Analysis | ✓ Link | 74.01 | | 74.01 | | Beta-Rank | 2023-04-15 |
| How to Use Dropout Correctly on Residual Networks with Batch Normalization | ✓ Link | 73.98 | | | | PreResNet-110 | 2023-02-13 |
| ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 73.930 | | | | ABNet-2G-R0 | 2024-11-28 |
| Fractional Max-Pooling | ✓ Link | 73.6 | | | | Fractional MP | 2014-12-18 |
| Deep Residual Networks with Exponential Linear Unit | ✓ Link | 73.5 | | | | ResNet+ELU | 2016-04-14 |
| PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 73 | | | | PDO-eConv (p6m,0.37M) | 2020-07-20 |
| Stochastic Optimization of Plain Convolutional Neural Networks with Simple methods | ✓ Link | 72.96 | 4.25M | | | SOPCNN | 2020-01-24 |
| PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 72.87 | | | | PDO-eConv (p6,0.36M) | 2020-07-20 |
| Scalable Bayesian Optimization Using Deep Neural Networks | ✓ Link | 72.6 | | | | Tuned CNN | 2015-02-19 |
| Stochastic Subsampling With Average Pooling | | 72.537 | | | | ResNet-110 (SAP) | 2024-09-25 |
| Competitive Multi-scale Convolution | | 72.4 | | | | CMsC | 2015-11-18 |
| All you need is a good init | ✓ Link | 72.3 | | | | Fitnet4-LSUV | 2015-11-19 |
| How transfer learning is used in generative models for image classification: improved accuracy | ✓ Link | 71.52 | | | | GAN+ResNet | 2024-12-09 |
| Grouped Pointwise Convolutions Reduce Parameters in Convolutional Neural Networks | ✓ Link | 71.36 | 0.52M | | | kMobileNet V3 Large 16ch | 2022-06-30 |
| Batch-normalized Maxout Network in Network | ✓ Link | 71.1 | | | | BNM NiN | 2015-11-09 |
| Online Training Through Time for Spiking Neural Networks | ✓ Link | 71.05 | | | | OTTT | 2022-10-09 |
| On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units | | 70.8 | | | | MIM | 2015-08-03 |
| WaveMix: A Resource-efficient Neural Network for Image Analysis | ✓ Link | 70.20 | | | | WaveMix-Lite-256/7 | 2022-05-28 |
| IM-Loss: Information Maximization Loss for Spiking Neural Networks | | 70.18 | | | | IM-Loss (VGG-16) | 2022-10-31 |
| Learning Activation Functions to Improve Deep Neural Networks | ✓ Link | 69.2 | | | | NiN+APL | 2014-12-21 |
| Stacked What-Where Auto-encoders | ✓ Link | 69.1 | | | | SWWAE | 2015-06-08 |
| Deep Convolutional Decision Jungle for Image Classification | | 69 | | | | NiN+Superclass+CDJ | 2017-06-06 |
| Spectral Representations for Convolutional Neural Networks | | 68.4 | | | | Spectral Representations for Convolutional Neural Networks | 2015-06-11 |
| "BNN - BN = ?": Training Binary Neural Networks without Batch Normalization | ✓ Link | 68.34 | | | | ReActNet-18 | 2021-04-16 |
| Training Very Deep Networks | ✓ Link | 67.8 | | | | VDN | 2015-07-22 |
| Deep Convolutional Neural Networks as Generic Feature Extractors | | 67.7 | | | | DCNN+GFE | 2017-10-06 |
| Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree | ✓ Link | 67.6 | | | | Tree+Max-Avg pooling | 2015-09-30 |
| HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition | ✓ Link | 67.4 | | | | HD-CNN | 2014-10-03 |
| Universum Prescription: Regularization using Unlabeled Data | | 67.2 | | | | Universum Prescription | 2015-11-11 |
| ResNet50_on_Cifar_100_Without_Transfer_Learning | ✓ Link | 67.060 | | | | ResNet50 Without Transfer Learning | 2020-08-03 |
| Learning the Connections in Direct Feedback Alignment | | 66.78 | | | | AlexNet (KP) | 2021-01-01 |
| Striving for Simplicity: The All Convolutional Net | ✓ Link | 66.3 | | | | ACN | 2014-12-21 |
| DLME: Deep Local-flatness Manifold Embedding | ✓ Link | 66.1 | | | | DLME (ResNet-18, linear) | 2022-07-07 |
| FatNet: High Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks | ✓ Link | 66 | | | | ResNet-18 (modified) | 2022-10-30 |
| Deeply-Supervised Nets | ✓ Link | 65.4 | | | | DSN | 2014-09-18 |
| Network In Network | ✓ Link | 64.3 | | | | NiN | 2013-12-16 |
| Discriminative Transfer Learning with Tree-based Priors | | 63.2 | | | | Tree Priors | 2013-12-01 |
| Improving Deep Neural Networks with Probabilistic Maxout Units | | 61.9 | | | | DNN+Probabilistic Maxout | 2013-12-20 |
| Maxout Networks | ✓ Link | 61.43 | | | | Maxout Network (k=2) | 2013-02-18 |
| Unsharp Masking Layer: Injecting Prior Knowledge in Convolutional Networks for Image Classification | ✓ Link | 60.36 | | | | ResNet20+UnsharpMaskLayer | 2019-09-29 |
| Convolutional Xformers for Vision | ✓ Link | 60.11 | | | | Convolutional Linear Transformer for Vision (CLTV) | 2022-01-25 |
| FatNet: High Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks | ✓ Link | 60 | | | | FatNet of ResNet-18 | 2022-10-30 |
| FatNet: High Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks | ✓ Link | 60 | | | | Optical Simulation of FatNet | 2022-10-30 |
| Empirical Evaluation of Rectified Activations in Convolutional Network | ✓ Link | 59.8 | | | | RReLU | 2015-05-05 |
| Stochastic Pooling for Regularization of Deep Convolutional Neural Networks | ✓ Link | 57.5 | | | | Stochastic Pooling | 2013-01-16 |
| How Important is Weight Symmetry in Backpropagation? | ✓ Link | 48.75 | | | | Sign-symmetry | 2015-10-17 |
| Learning the Connections in Direct Feedback Alignment | | 48.03 | | | | AlexNet (DFA) | 2021-01-01 |
| Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 42.64 | | | | CNN39 | 2020-10-03 |
| Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 36.07 | | | | CNN36 | 2020-10-03 |
| Sharpness-aware Quantization for Deep Neural Networks | ✓ Link | 35.05 | | | | CNN37 | 2021-11-24 |
| Learning the Connections in Direct Feedback Alignment | | 19.49 | | | | AlexNet (FA) | 2021-01-01 |
| Efficient Adaptive Ensembling for Image Classification | | | | 96.808 | | efficient adaptive ensembling | 2022-06-15 |
| Label Ranker: Self-Aware Preference for Classification Label Position in Visual Masked Self-Supervised Pre-Trained Model | ✓ Link | | | 90.82 | | Label-Ranker | 2025-03-03 |
| Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces | ✓ Link | | | | 91.2 | DGMMC-S | 2024-10-17 |
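Several of the top entries above (EffNet-L2 (SAM), ViT-B/16-SAM, PyramidNet (SAM)) use Sharpness-Aware Minimization. A minimal NumPy sketch of a single SAM update on a toy quadratic loss, not the papers' implementation; the function name `sam_step` and the `lr`/`rho` values here are illustrative choices:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) update:
    first ascend to an approximate worst-case point within an
    L2 ball of radius rho, then descend using the gradient
    evaluated at that perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction
    g_adv = grad_fn(w + eps)                     # gradient at perturbed weights
    return w - lr * g_adv

# Toy example: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, lambda w: 2.0 * w)
print(np.linalg.norm(w))  # norm shrinks toward ~0
```

In the full method the two gradient evaluations are two forward/backward passes per batch, which is why SAM roughly doubles training cost relative to plain SGD.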