Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 96.08 | | | | EffNet-L2 (SAM) | 2020-10-03 |
ML-Decoder: Scalable and Versatile Classification Head | ✓ Link | 95.1 | | | | Swin-L + ML-Decoder | 2021-11-25 |
An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems | ✓ Link | 94.95 | | | | µ2Net (ViT-L/16) | 2022-05-25 |
ImageNet-21K Pretraining for the Masses | ✓ Link | 94.2 | | | | ViT-B-16 (ImageNet-21K-P pretrain) | 2021-04-22 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 94.09 | | | | CvT-W24 | 2021-03-29 |
Perturbated Gradients Updating within Unit Space for Deep Learning | ✓ Link | 93.95 | | | | ViT-B/16 (PUGD) | 2021-10-01 |
An Algorithm for Routing Vectors in Sequences | ✓ Link | 93.8 | 309.8M | | | Heinsen Routing + BEiT-large 16 224 | 2022-11-20 |
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 93.51 | | | | BiT-L (ResNet) | 2019-12-24 |
Reduction of Class Activation Uncertainty with Background Information | ✓ Link | 93.31 | | | | VIT-L/16 (Spinal FC, Background) | 2023-05-05 |
Going deeper with Image Transformers | ✓ Link | 93.1 | | | | CaiT-M-36 U 224 | 2021-03-31 |
Three things everyone should know about Vision Transformers | ✓ Link | 93.0 | | | | ViT-L (attn fine-tune) | 2022-03-18 |
TResNet: High Performance GPU-Dedicated Architecture | ✓ Link | 92.6 | | | | TResNet-L-V2 | 2020-03-30 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 92.3 | | | | EfficientNetV2-L | 2021-04-01 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 92.2 | | | | EfficientNetV2-M | 2021-04-01 |
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 92.17 | | | | BiT-M (ResNet) | 2019-12-24 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 91.8 | | | | CeiT-S | 2021-03-22 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 91.8 | | | | CeiT-S (384 finetune resolution) | 2021-03-22 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 91.7 | 64M | | | EfficientNet-B7 | 2019-05-28 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 91.5 | | | | EfficientNetV2-S | 2021-04-01 |
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | ✓ Link | 91.3 | | | | GPIPE | 2018-11-16 |
Transformer in Transformer | ✓ Link | 91.1 | 65.6M | | | TNT-B | 2021-02-27 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 90.8 | 86M | | | DeiT-B | 2020-12-23 |
Global Filter Networks for Image Classification | ✓ Link | 90.3 | 54M | | | GFNet-H-B | 2021-07-01 |
Rethinking Recurrent Neural Networks and Other Improvements for Image Classification | ✓ Link | 90.27 | | | | E2E-3M | 2020-07-30 |
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | ✓ Link | 90.2 | | | | Bamboo (ViT-B/16) | 2022-03-15 |
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks | ✓ Link | 89.90 | | | | PyramidNet-272 (ASAM) | 2021-02-23 |
Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 89.7 | | | | PyramidNet (SAM) | 2020-10-03 |
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition | ✓ Link | 89.63 | | | | DVT (T2T-ViT-24) | 2021-05-31 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 89.5 | | | | ResMLP-24 | 2021-05-07 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 89.46 | 32.8M | | | PyramidNet-272, S=4 | 2020-11-30 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 89.4 | | | | CeiT-T | 2021-03-22 |
AutoAugment: Learning Augmentation Policies from Data | ✓ Link | 89.3 | | | | PyramidNet+ShakeDrop | 2018-05-24 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 89.1 | | | | ViT-B/16-SAM | 2021-06-03 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 89.1 | | | | ConvMLP-M | 2021-09-09 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 88.6 | | | | ConvMLP-L | 2021-09-09 |
Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images | ✓ Link | 88.54 | | | | ResNet-152x4-AGC (ImageNet-21K) | 2021-05-31 |
ColorNet: Investigating the importance of color spaces for image classification | ✓ Link | 88.4 | 19.0M | | | ColorNet | 2019-02-01 |
Fast AutoAugment | ✓ Link | 88.3 | | | | PyramidNet+ShakeDrop (Fast AA) | 2019-05-01 |
Neural Architecture Transfer | ✓ Link | 88.3 | 9.0M | | | NAT-M4 | 2020-05-12 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 88 | | | | CeiT-T (384 finetune resolution) | 2021-03-22 |
Neural Architecture Transfer | ✓ Link | 87.7 | 7.8M | | | NAT-M3 | 2020-05-12 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 87.6 | | | | ViT-S/16-SAM | 2021-06-03 |
Neural Architecture Transfer | ✓ Link | 87.5 | 6.4M | | | NAT-M2 | 2020-05-12 |
PSO-Convolutional Neural Networks with Heterogeneous Learning Rate | ✓ Link | 87.48 | | | | Dynamics 1 | 2022-05-20 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 87.44 | 26.3M | | | DenseNet-BC-190, S=4 | 2020-11-30 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 87.4 | | | | ConvMLP-S | 2021-09-09 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 87.0 | | | | ResMLP-12 | 2021-05-07 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 86.90 | | | | WRN-40-10, S=4 | 2020-11-30 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 86.9 | 25M | | | ResNet50 (A1) | 2021-10-01 |
MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks | ✓ Link | 86.81 | | | | WRN-28-10 * 3 | 2021-03-10 |
Regularizing Neural Networks via Adversarial Model Perturbation | ✓ Link | 86.64 | | | | PyramidNet + AA (AMP) | 2020-10-10 |
Self-Knowledge Distillation with Progressive Refinement of Targets | ✓ Link | 86.41 | | | | PyramidNet-200 + Shakedrop + Cutmix + PS-KD | 2020-06-22 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 86.4 | | | | Mixer-B/16-SAM | 2021-06-03 |
Deep Feature Response Discriminative Calibration | ✓ Link | 86.31 | | | | ResCNet-50 | 2024-11-16 |
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | ✓ Link | 86.19 | | | | PyramidNet-200 + Shakedrop + Cutmix | 2019-05-13 |
MUXConv: Information Multiplexing in Convolutional Neural Networks | ✓ Link | 86.1 | 2.1M | | | MUXNet-m | 2020-03-31 |
Neural Architecture Transfer | ✓ Link | 86.0 | 3.8M | | | NAT-M1 | 2020-05-12 |
MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks | ✓ Link | 85.77 | | | | WRN-28-10 | 2021-03-10 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 85.74 | | | | WRN-28-10, S=4 | 2020-11-30 |
| | 85.59 | | | | WRN-28-8 (SAMix+DM) | |
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 85.50 | | | | WRN-28-8 +SAMix | 2021-11-30 |
Improving Neural Architecture Search Image Classifiers via Ensemble Learning | ✓ Link | 85.42 | | | | ASANas | 2019-03-14 |
| | 85.38 | | | | WRN-28-8 (AutoMix+DM) | |
SparseSwin: Swin Transformer with Sparse Transformer Block | ✓ Link | 85.35 | 17.58M | | | SparseSwin | 2023-09-11 |
| | 85.25 | | | | WRN-28-8 (PuzzleMix+DM) | |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 85.2 | | | | ResNet-50-SAM | 2021-06-03 |
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 85.16 | | | | WRN-28-8 +AutoMix | 2021-03-24 |
WaveMix: A Resource-efficient Neural Network for Image Analysis | ✓ Link | 85.09 | | | | WaveMix-Lite-256/7 | 2022-05-28 |
Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics | ✓ Link | 85.08 | | | | MANO-tiny | 2025-07-03 |
Neural networks with late-phase weights | ✓ Link | 85.00 | | | | WRN 28-14 | 2020-07-25 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 85 | | | | R-Mix (WideResNet 28-10) | 2022-12-09 |
EEEA-Net: An Early Exit Evolutionary Neural Architecture Search | ✓ Link | 84.98 | | | | EEEA-Net-C (b=5)+ CO | 2021-08-13 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 84.9 | | | | RL-Mix (WideResNet 28-10) | 2022-12-09 |
Automatic Data Augmentation via Invariance-Constrained Learning | ✓ Link | 84.89 | | | | Wide-ResNet-28-10 | 2022-09-29 |
Squeeze-and-Excitation Networks | ✓ Link | 84.59 | | | | SENet + ShakeEven + Cutout | 2017-09-05 |
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup | ✓ Link | 84.42 | | | | ResNeXt-50(32x4d) + SAMix | 2021-11-30 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 84.38 | | | | WRN-28-10 with reSGHMC | 2020-08-12 |
Averaging Weights Leads to Wider Optima and Better Generalization | ✓ Link | 84.16 | | | | PyramidNet-272 + SWA | 2018-03-14 |
Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup | ✓ Link | 84.05 | | | | WRN28-10 | 2020-09-15 |
Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 84.04 | 11.4M | | | HCGNet-A3 | 2019-08-26 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 83.97 | | | | WideResNet 28-10 + CutMix (OneCycleLR scheduler) | 2022-12-09 |
FMix: Enhancing Mixed Sample Data Augmentation | ✓ Link | 83.95 | | | | DenseNet-BC-190 + FMix | 2020-02-27 |
Oriented Response Networks | ✓ Link | 83.85 | | | | ORN | 2017-01-07 |
Grafit: Learning fine-grained image representations with coarse labels | | 83.7 | | | | Grafit (ResNet-50) | 2020-11-25 |
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 83.64 | | | | ResNeXt-50(32x4d) + AutoMix | 2021-03-24 |
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers | ✓ Link | 83.57 | | | | CCT-7/3x1+HTM+VTM | 2022-10-14 |
Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 83.46 | 3.1M | | | HCGNet-A2 | 2019-08-26 |
Res2Net: A New Multi-scale Backbone Architecture | ✓ Link | 83.44 | | | | Res2NeXt-29 | 2019-04-02 |
mixup: Beyond Empirical Risk Minimization | ✓ Link | 83.20 | | | | DenseNet-BC-190 + Mixup | 2017-10-25 |
Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks | ✓ Link | 83.2 | | | | SSAL-DenseNet 190-40 | 2021-01-07 |
EnAET: A Self-Trained framework for Semi-Supervised and Supervised Learning with Ensemble Transformations | ✓ Link | 83.13 | | | | EnAET | 2019-11-21 |
Neural networks with late-phase weights | ✓ Link | 83.06 | | | | WRN 28-10 | 2020-07-25 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 83.02 | | | | R-Mix (ResNeXt 29-4-24) | 2022-12-09 |
Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems | ✓ Link | 82.95 | | | | Wide ResNet+Cutout+no BN scale/offset learning | 2019-07-16 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 82.95 | | | | WRN-16-8 with reSGHMC | 2020-08-12 |
Densely Connected Convolutional Networks | ✓ Link | 82.82 | | | | DenseNet-BC | 2016-08-25 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 82.784 | | | | ABNet-2G-R3-Combined | 2024-11-28 |
Escaping the Big Data Paradigm with Compact Transformers | ✓ Link | 82.72 | | | | CCT-7/3x1* | 2021-04-12 |
EXACT: How to Train Your Accuracy | ✓ Link | 82.68 | | | | EXACT (WRN-28-10) | 2022-05-19 |
Selective Kernel Networks | ✓ Link | 82.67 | | | | SKNet-29 (ResNeXt-29, 16×32d) | 2019-03-15 |
Densely Connected Convolutional Networks | ✓ Link | 82.62 | | | | DenseNet | 2016-08-25 |
Learning Implicitly Recurrent CNNs Through Parameter Sharing | ✓ Link | 82.57 | | | | Shared WRN | 2019-02-26 |
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding | ✓ Link | 82.56 | | | | Transformer local-attention (NesT-B) | 2021-05-26 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 82.43 | | | | RL-Mix (ResNeXt 29-4-24) | 2022-12-09 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 82.4 | | | | Mixer-S/16-SAM | 2021-06-03 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 82.32 | | | | R-Mix (WideResNet 16-8) | 2022-12-09 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 82.3 | | | | ResNeXt 29-4-24 + CutMix (OneCycleLR scheduler) | 2022-12-09 |
Attend and Rectify: a Gated Attention Mechanism for Fine-Grained Recovery | ✓ Link | 82.18 | | | | WARN | 2018-07-19 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 82.16 | | | | RL-Mix (WideResNet 16-8) | 2022-12-09 |
Averaging Weights Leads to Wider Optima and Better Generalization | ✓ Link | 82.15 | | | | WRN+SWA | 2018-03-14 |
Manifold Mixup: Better Representations by Interpolating Hidden States | ✓ Link | 81.96 | | | | Manifold Mixup | 2018-06-13 |
Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 81.87 | 1.1M | | | HCGNet-A1 | 2019-08-26 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 81.79 | | | | WideResNet 16-8 + CutMix (OneCycleLR scheduler) | 2022-12-09 |
Learning Identity Mappings with Residual Gates | | 81.73 | | | | Residual Gates + WRN | 2016-11-04 |
Revisiting a kNN-based Image Classification System with High-capacity Storage | | 81.7 | | | | kNN-CLIP | 2022-04-03 |
Attention Augmented Convolutional Networks | ✓ Link | 81.6 | | | | AA-Wide-ResNet | 2019-04-22 |
PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 81.6 | | | | PDO-eConv (p8, 4.6M) | 2020-07-20 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 81.53 | | | | SEER (RegNet10B) | 2022-02-16 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 81.49 | | | | R-Mix (PreActResNet-18) | 2022-12-09 |
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective | ✓ Link | 81.44 | | | | ResNet50 (FSGDM) | 2024-11-29 |
Automatic Data Augmentation via Invariance-Constrained Learning | ✓ Link | 81.19 | | | | Wide-ResNet-40-2 | 2022-09-29 |
Wide Residual Networks | ✓ Link | 81.15 | | | | Wide ResNet | 2016-05-23 |
Deep Competitive Pathway Networks | ✓ Link | 81.10 | | | | CoPaNet-R-164 | 2017-09-29 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 80.830 | | | | ABNet-2G-R3 | 2024-11-28 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 80.75 | | | | RL-Mix (PreActResNet-18) | 2022-12-09 |
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding | ✓ Link | 80.6 | | | | PreActResNet-18 + CutMix (OneCycleLR scheduler) | 2022-12-09 |
Gated Attention Coding for Training High-performance and Efficient Spiking Neural Networks | ✓ Link | 80.45 | | | | GAC-SNN | 2023-08-12 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 80.354 | | | | ABNet-2G-R2 | 2024-11-28 |
Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet | ✓ Link | 80.29 | | | | SimpleNetv2 | 2018-02-17 |
UPANets: Learning from the Universal Pixel Attention Networks | ✓ Link | 80.29 | | | | UPANets | 2021-03-15 |
SageMix: Saliency-Guided Mixup for Point Clouds | ✓ Link | 80.16 | | | | PreActResNet-18 + SageMix | 2022-10-13 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 80.14 | | | | ResNet56 with reSGHMC | 2020-08-12 |
PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 79.99 | | | | PDO-eConv (p8, 2.62M) | 2020-07-20 |
Training Neural Networks with Local Error Signals | ✓ Link | 79.9 | | | | VGG11B(3x) + LocalLearning | 2019-01-20 |
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations | ✓ Link | 79 | | | | NNCLR | 2021-04-29 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 78.792 | | | | ABNet-2G-R1 | 2024-11-28 |
Regularizing Neural Networks via Adversarial Model Perturbation | ✓ Link | 78.49 | | | | PreActResNet18 (AMP) | 2020-10-10 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 78.37 | | | | SimpleNetv1 | 2016-08-22 |
Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images | | 78.27 | 3.64M | | | ViT (lightweight, MAE pre-trained) | 2024-02-06 |
Augmenting Deep Classifiers with Polynomial Neural Networks | ✓ Link | 77.9 | | | | PDC | 2021-04-16 |
Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets | ✓ Link | 77.7 | | | | MobileNetV3-large x1.0 (BSConv-U) | 2020-03-30 |
Escaping the Big Data Paradigm with Compact Transformers | ✓ Link | 77.31 | 3.17M | | | CCT-6/3x1 | 2021-04-12 |
Identity Mappings in Deep Residual Networks | ✓ Link | 77.3 | | | | ResNet-1001 | 2016-03-16 |
Large-Scale Evolution of Image Classifiers | ✓ Link | 77 | | | | Evolution | 2017-03-03 |
DIANet: Dense-and-Implicit Attention Network | ✓ Link | 76.98 | | | | DIANet | 2019-05-25 |
Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification | ✓ Link | 76.85 | | | | LP-BNN (ours) + cutout | 2020-12-04 |
Learning Class Unique Features in Fine-Grained Visual Classification | | 76.64 | | | | ResNet-18+MM+FRL | 2020-11-22 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 76.55 | | | | ResNet32 with reSGHMC | 2020-08-12 |
Momentum Residual Neural Networks | ✓ Link | 76.38 | | | | MomentumNet | 2021-02-15 |
Spatially-sparse convolutional neural networks | ✓ Link | 75.7 | | | | SSCNN | 2014-09-22 |
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) | ✓ Link | 75.7 | | | | Exponential Linear Units | 2015-11-23 |
CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters | ✓ Link | 75.59 | | | | ResNet-9 | 2022-03-29 |
Deep Networks with Stochastic Depth | ✓ Link | 75.42 | | | | Stochastic Depth | 2016-03-30 |
Mish: A Self Regularized Non-Monotonic Activation Function | ✓ Link | 74.41 | | | | ResNet v2-110 (Mish activation) | 2019-08-23 |
Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks | | 74.24 | | | | Dspike (ResNet-18) | 2021-12-01 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 74.14 | | | | ResNet20 with reSGHMC | 2020-08-12 |
MixMatch: A Holistic Approach to Semi-Supervised Learning | ✓ Link | 74.1 | | | | MixMatch | 2019-05-06 |
Beta-Rank: A Robust Convolutional Filter Pruning Method For Imbalanced Medical Image Analysis | ✓ Link | 74.01 | | 74.01 | | Beta-Rank | 2023-04-15 |
How to Use Dropout Correctly on Residual Networks with Batch Normalization | ✓ Link | 73.98 | | | | PreResNet-110 | 2023-02-13 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 73.930 | | | | ABNet-2G-R0 | 2024-11-28 |
Fractional Max-Pooling | ✓ Link | 73.6 | | | | Fractional MP | 2014-12-18 |
Deep Residual Networks with Exponential Linear Unit | ✓ Link | 73.5 | | | | ResNet+ELU | 2016-04-14 |
PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 73 | | | | PDO-eConv (p6m,0.37M) | 2020-07-20 |
Stochastic Optimization of Plain Convolutional Neural Networks with Simple methods | ✓ Link | 72.96 | 4.25M | | | SOPCNN | 2020-01-24 |
PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 72.87 | | | | PDO-eConv (p6,0.36M) | 2020-07-20 |
Scalable Bayesian Optimization Using Deep Neural Networks | ✓ Link | 72.6 | | | | Tuned CNN | 2015-02-19 |
Stochastic Subsampling With Average Pooling | | 72.537 | | | | ResNet-110 (SAP) | 2024-09-25 |
Competitive Multi-scale Convolution | | 72.4 | | | | CMsC | 2015-11-18 |
All you need is a good init | ✓ Link | 72.3 | | | | Fitnet4-LSUV | 2015-11-19 |
How transfer learning is used in generative models for image classification: improved accuracy | ✓ Link | 71.52 | | | | GAN+ResNet | 2024-12-09 |
Grouped Pointwise Convolutions Reduce Parameters in Convolutional Neural Networks | ✓ Link | 71.36 | 0.52M | | | kMobileNet V3 Large 16ch | 2022-06-30 |
Batch-normalized Maxout Network in Network | ✓ Link | 71.1 | | | | BNM NiN | 2015-11-09 |
Online Training Through Time for Spiking Neural Networks | ✓ Link | 71.05 | | | | OTTT | 2022-10-09 |
On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units | | 70.8 | | | | MIM | 2015-08-03 |
WaveMix: A Resource-efficient Neural Network for Image Analysis | ✓ Link | 70.20 | | | | WaveMix-Lite-256/7 | 2022-05-28 |
IM-Loss: Information Maximization Loss for Spiking Neural Networks | | 70.18 | | | | IM-Loss (VGG-16) | 2022-10-31 |
Learning Activation Functions to Improve Deep Neural Networks | ✓ Link | 69.2 | | | | NiN+APL | 2014-12-21 |
Stacked What-Where Auto-encoders | ✓ Link | 69.1 | | | | SWWAE | 2015-06-08 |
Deep Convolutional Decision Jungle for Image Classification | | 69 | | | | NiN+Superclass+CDJ | 2017-06-06 |
Spectral Representations for Convolutional Neural Networks | | 68.4 | | | | Spectral Representations for Convolutional Neural Networks | 2015-06-11 |
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization | ✓ Link | 68.34 | | | | ReActNet-18 | 2021-04-16 |
Training Very Deep Networks | ✓ Link | 67.8 | | | | VDN | 2015-07-22 |
Deep Convolutional Neural Networks as Generic Feature Extractors | | 67.7 | | | | DCNN+GFE | 2017-10-06 |
Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree | ✓ Link | 67.6 | | | | Tree+Max-Avg pooling | 2015-09-30 |
HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition | ✓ Link | 67.4 | | | | HD-CNN | 2014-10-03 |
Universum Prescription: Regularization using Unlabeled Data | | 67.2 | | | | Universum Prescription | 2015-11-11 |
ResNet50_on_Cifar_100_Without_Transfer_Learning | ✓ Link | 67.060 | | | | ResNet50 Without Transfer Learning | 2020-08-03 |
Learning the Connections in Direct Feedback Alignment | | 66.78 | | | | AlexNet (KP) | 2021-01-01 |
Striving for Simplicity: The All Convolutional Net | ✓ Link | 66.3 | | | | ACN | 2014-12-21 |
DLME: Deep Local-flatness Manifold Embedding | ✓ Link | 66.1 | | | | DLME (ResNet-18, linear) | 2022-07-07 |
FatNet: High Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks | ✓ Link | 66 | | | | ResNet-18 (modified) | 2022-10-30 |
Deeply-Supervised Nets | ✓ Link | 65.4 | | | | DSN | 2014-09-18 |
Network In Network | ✓ Link | 64.3 | | | | NiN | 2013-12-16 |
Discriminative Transfer Learning with Tree-based Priors | | 63.2 | | | | Tree Priors | 2013-12-01 |
Improving Deep Neural Networks with Probabilistic Maxout Units | | 61.9 | | | | DNN+Probabilistic Maxout | 2013-12-20 |
Maxout Networks | ✓ Link | 61.43 | | | | Maxout Network (k=2) | 2013-02-18 |
Unsharp Masking Layer: Injecting Prior Knowledge in Convolutional Networks for Image Classification | ✓ Link | 60.36 | | | | ResNet20+UnsharpMaskLayer | 2019-09-29 |
Convolutional Xformers for Vision | ✓ Link | 60.11 | | | | Convolutional Linear Transformer for Vision (CLTV) | 2022-01-25 |
FatNet: High Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks | ✓ Link | 60 | | | | FatNet of ResNet-18 | 2022-10-30 |
FatNet: High Resolution Kernels for Classification Using Fully Convolutional Optical Neural Networks | ✓ Link | 60 | | | | Optical Simulation of FatNet | 2022-10-30 |
Empirical Evaluation of Rectified Activations in Convolutional Network | ✓ Link | 59.8 | | | | RReLU | 2015-05-05 |
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks | ✓ Link | 57.5 | | | | Stochastic Pooling | 2013-01-16 |
How Important is Weight Symmetry in Backpropagation? | ✓ Link | 48.75 | | | | Sign-symmetry | 2015-10-17 |
Learning the Connections in Direct Feedback Alignment | | 48.03 | | | | AlexNet (DFA) | 2021-01-01 |
Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 42.64 | | | | CNN39 | 2020-10-03 |
Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 36.07 | | | | CNN36 | 2020-10-03 |
Sharpness-aware Quantization for Deep Neural Networks | ✓ Link | 35.05 | | | | CNN37 | 2021-11-24 |
Learning the Connections in Direct Feedback Alignment | | 19.49 | | | | AlexNet (FA) | 2021-01-01 |
Efficient Adaptive Ensembling for Image Classification | | | | 96.808 | | efficient adaptive ensembling | 2022-06-15 |
Label Ranker: Self-Aware Preference for Classification Label Position in Visual Masked Self-Supervised Pre-Trained Model | ✓ Link | | | 90.82 | | Label-Ranker | 2025-03-03 |
Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces | ✓ Link | | | | 91.2 | DGMMC-S | 2024-10-17 |
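For readers who want to work with these results programmatically, the sketch below parses rows in the pipe-delimited format used above (paper, code availability, accuracy, parameter count, two auxiliary metric columns, model, date) and ranks entries by the main accuracy column. The file name `leaderboard_rows.txt` and the column interpretation are assumptions made for illustration, not part of the table source.

```python
# Minimal sketch: parse the pipe-delimited leaderboard rows above and rank
# them by the main accuracy column. The input file name and the exact column
# meanings are assumptions made for this example.
def parse_rows(path="leaderboard_rows.txt"):
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Drop the trailing pipe, then split into the eight expected cells.
            cells = [c.strip() for c in line.rstrip().rstrip("|").split("|")]
            if len(cells) < 8:
                continue  # skip malformed or empty lines
            paper, code, acc, params, _aux1, _aux2, model, date = cells[:8]
            if not acc:
                continue  # keep only rows that report the main accuracy metric
            records.append({
                "paper": paper,
                "has_code": code.startswith("✓"),
                "accuracy": float(acc.rstrip("%")),
                "params": params or None,
                "model": model,
                "date": date or None,
            })
    return records

if __name__ == "__main__":
    rows = sorted(parse_rows(), key=lambda r: r["accuracy"], reverse=True)
    for r in rows[:10]:
        print(f'{r["accuracy"]:6.2f}  {r["model"]:<40}  {r["paper"][:60]}')
```

Rows that leave the main accuracy column empty (entries reported only under an auxiliary metric) are skipped here; extending the record with the auxiliary columns is straightforward if they are needed.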