An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ✓ Link | 99.5 | | | | | | | ViT-H/14 | 2020-10-22 |
DINOv2: Learning Robust Visual Features without Supervision | ✓ Link | 99.5 | | | | | | | DINOv2 (ViT-g/14, frozen model, linear eval) | 2023-04-14 |
An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems | ✓ Link | 99.49 | | | | | | | µ2Net (ViT-L/16) | 2022-05-25 |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ✓ Link | 99.42 | | | | | | | ViT-L/16 | 2020-10-22 |
Going deeper with Image Transformers | ✓ Link | 99.4 | | | | | | | CaiT-M-36 U 224 | 2021-03-31 |
CvT: Introducing Convolutions to Vision Transformers | ✓ Link | 99.39 | | | | | | | CvT-W24 | 2021-03-29 |
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 99.37 | | | | | | | BiT-L (ResNet) | 2019-12-24 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 99.31 | | | | | | | RDNet-L (224 res, IN-1K pretrained) | 2024-03-28 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 99.31 | | | | | | | RDNet-B (224 res, IN-1K pretrained) | 2024-03-28 |
Three things everyone should know about Vision Transformers | ✓ Link | 99.3 | | | | | | | ViT-B (attn fine-tune) | 2022-03-18 |
An Algorithm for Routing Vectors in Sequences | ✓ Link | 99.2 | | | | | | | Heinsen Routing + BEiT-large 16 224 | 2022-11-20 |
Perturbated Gradients Updating within Unit Space for Deep Learning | ✓ Link | 99.13 | | | | | | | ViT-B/16 (PUGD) | 2021-10-01 |
Astroformer: More Data Might not be all you need for Classification | ✓ Link | 99.12 | 99.12 | | | | | | Astroformer | 2023-04-03 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 99.1 | | | | | | | DeiT-B | 2020-12-23 |
Transformer in Transformer | ✓ Link | 99.1 | | | | | | | TNT-B | 2021-02-27 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 99.1 | | | | | | | CeiT-S (384 finetune resolution) | 2021-03-22 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 99.1 | | | | | | | EfficientNetV2-L | 2021-04-01 |
AutoFormer: Searching Transformers for Visual Recognition | ✓ Link | 99.1 | | | | | | | AutoFormer-S (384) | 2021-07-01 |
Reduction of Class Activation Uncertainty with Background Information | ✓ Link | 99.05 | | | | | | | ViT-L/16 (Spinal FC, Background) | 2023-05-05 |
Sample-Efficient Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search | ✓ Link | 99.03 | | | | | | | LaNet | 2019-01-01 |
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | ✓ Link | 99 | | | | | | | GPIPE + transfer learning | 2018-11-16 |
TResNet: High Performance GPU-Dedicated Architecture | ✓ Link | 99 | | | | | | | TResNet-XL | 2020-03-30 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 99 | | | | | | | CeiT-S | 2021-03-22 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 99.0 | | | | | | | EfficientNetV2-M | 2021-04-01 |
Global Filter Networks for Image Classification | ✓ Link | 99.0 | | | | | | | GFNet-H-B | 2021-07-01 |
Big Transfer (BiT): General Visual Representation Learning | ✓ Link | 98.91 | | | | | | | BiT-M (ResNet) | 2019-12-24 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 98.9 | | | | | | | EfficientNet-B7 | 2019-05-28 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 98.88 | | | | | | | RDNet-T (224 res, IN-1K pretrained) | 2024-03-28 |
Adaptive Split-Fusion Transformer | ✓ Link | 98.8 | | | | | | | ASF-former-B | 2022-04-26 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 98.71 | | | | | | | PyramidNet-272, S=4 | 2020-11-30 |
EfficientNetV2: Smaller Models and Faster Training | ✓ Link | 98.7 | | | | | | | EfficientNetV2-S | 2021-04-01 |
Adaptive Split-Fusion Transformer | ✓ Link | 98.7 | | | | | | | ASF-former-S | 2022-04-26 |
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks | ✓ Link | 98.68 | | | | | | | PyramidNet-272 (ASAM) | 2021-02-23 |
FMix: Enhancing Mixed Sample Data Augmentation | ✓ Link | 98.64 | | | | | | | PyramidNet + ShakeDrop + Fast AA + FMix | 2020-02-27 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 98.6 | | | | | | | ViT-B/16-SAM | 2021-06-03 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 98.6 | | | | | | | ConvMLP-M | 2021-09-09 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 98.6 | | | | | | | ConvMLP-L | 2021-09-09 |
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition | ✓ Link | 98.53 | | | | | | | DVT (T2T-ViT-24) | 2021-05-31 |
Rethinking Recurrent Neural Networks and Other Improvements for Image Classification | ✓ Link | 98.52 | | | | | | | E2E-3M | 2020-07-30 |
Incorporating Convolution Designs into Visual Transformers | ✓ Link | 98.5 | | | | | | | CeiT-T | 2021-03-22 |
Neural Architecture Transfer | ✓ Link | 98.4 | 98.4 | | 6.9M | | | | NAT-M4 | 2020-05-12 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 98.38 | | | | | | | WRN-40-10, S=4 | 2020-11-30 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 98.32 | | | | | | | WRN-28-10, S=4 | 2020-11-30 |
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training | ✓ Link | 98.31 | | | | | | | Shake-Shake 26 2x96d, S=4 | 2020-11-30 |
PSO-Convolutional Neural Networks with Heterogeneous Learning Rate | ✓ Link | 98.31 | | | | | | | Dynamics 2 | 2022-05-20 |
Fast AutoAugment | ✓ Link | 98.3 | | | | | | | PyramidNet+ShakeDrop (Fast AA) | 2019-05-01 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 98.3 | | | | | | | ResNet50 (A1) | 2021-10-01 |
Noisy Differentiable Architecture Search | ✓ Link | 98.28 | | | | | | | NoisyDARTS-A-t | 2020-05-07 |
Neural Architecture Transfer | ✓ Link | 98.2 | 98.2 | | 6.2M | | | | NAT-M3 | 2020-05-12 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 98.2 | | | | | | | LeViT-192 | 2021-04-02 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 98.2 | | | | | | | ResNet-152-SAM | 2021-06-03 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 98.2 | | | | | | | ViT-S/16-SAM | 2021-06-03 |
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | ✓ Link | 98.2 | | | | | | | Bamboo (ViT-B/16) | 2022-03-15 |
Learning Hyperparameters via a Data-Emphasized Variational Objective | ✓ Link | 98.2 | | | | | | | DE ELBo (ViT-B/16) | 2025-02-03 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 98.1 | | | | | | | LeViT-256 | 2021-04-02 |
Regularizing Neural Networks via Adversarial Model Perturbation | ✓ Link | 98.02 | | | | | | | PyramidNet + AA (AMP) | 2020-10-10 |
EnAET: A Self-Trained framework for Semi-Supervised and Supervised Learning with Ensemble Transformations | ✓ Link | 98.01 | | | | | | | EnAET | 2019-11-21 |
MUXConv: Information Multiplexing in Convolutional Neural Networks | ✓ Link | 98.0 | 98.0 | | | | | | MUXNet-m | 2020-03-31 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 98 | | | | | | | LeViT-384 | 2021-04-02 |
Escaping the Big Data Paradigm with Compact Transformers | ✓ Link | 98 | | | | | | | CCT-7/3x1* | 2021-04-12 |
ConvMLP: Hierarchical Convolutional MLPs for Vision | ✓ Link | 98 | | | | | | | ConvMLP-S | 2021-09-09 |
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware | ✓ Link | 97.92 | | | | | | | Proxyless-G + c/o | 2018-12-02 |
Neural Architecture Transfer | ✓ Link | 97.9 | 97.9 | | 4.6M | | | | NAT-M2 | 2020-05-12 |
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | ✓ Link | 97.9 | | | | | | | WRN-28-10+AutoDropout+RandAugment | 2021-01-05 |
Squeeze-and-Excitation Networks | ✓ Link | 97.88 | | | | | | | SENet + ShakeShake + Cutout | 2017-09-05 |
Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 97.86 | | | | | | | HCGNet-A3 | 2019-08-26 |
Automatic Data Augmentation via Invariance-Constrained Learning | ✓ Link | 97.85 | | | | | | | Wide-ResNet-28-10 | 2022-09-29 |
AutoMix: Unveiling the Power of Mixup for Stronger Classifiers | ✓ Link | 97.84 | | | | | | | ResNeXt-50 (AutoMix) | 2021-03-24 |
Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images | ✓ Link | 97.82 | | | | | | | ResNet-152x4-AGC (ImageNet-21K) | 2021-05-31 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 97.8 | | | | | | | Mixer-B/16-SAM | 2021-06-03 |
TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers | ✓ Link | 97.78 | | | | | | | CCT-7/3x1+VTM | 2022-10-14 |
MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks | ✓ Link | 97.73 | | | | | | | WRN-28-10 | 2021-03-10 |
Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 97.71 | | | | | | | HCGNet-A2 | 2019-08-26 |
Fixup Initialization: Residual Learning Without Normalization | ✓ Link | 97.7 | | | | | | | WRN + fixup init + mixup + cutout | 2019-01-27 |
Noisy Differentiable Architecture Search | ✓ Link | 97.61 | | | | | | | NoisyDARTS-a | 2020-05-07 |
TransBoost: Improving the Best ImageNet Performance using Deep Transduction | ✓ Link | 97.61 | | | | | | | TransBoost-ResNet50 | 2022-05-26 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 97.6 | | | | | | | LeViT-128 | 2021-04-02 |
batchboost: regularization for stabilizing training with resistance to underfitting & overfitting | ✓ Link | 97.54 | | | | | | | DenseNet-BC-190 + batchboost | 2020-01-21 |
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference | ✓ Link | 97.5 | | | | | | | LeViT-128S | 2021-04-02 |
Learning Implicitly Recurrent CNNs Through Parameter Sharing | ✓ Link | 97.47 | | | | | | | Shared WRN | 2019-02-26 |
Manifold Mixup: Better Representations by Interpolating Hidden States | ✓ Link | 97.45 | | | | | | | Manifold Mixup WRN 28-10 | 2018-06-13 |
Neural networks with late-phase weights | ✓ Link | 97.45 | | | | | | | WRN 28-14 | 2020-07-25 |
SparseSwin: Swin Transformer with Sparse Transformer Block | ✓ Link | 97.43 | | | | | | | SparseSwin | 2023-09-11 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 97.42 | | | | | | | WRN-28-10 with reSGHMC | 2020-08-12 |
Neural Architecture Transfer | ✓ Link | 97.4 | 97.4 | | 4.3M | | | | NAT-M1 | 2020-05-12 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 97.4 | | | | | | | ResNet-50-SAM | 2021-06-03 |
mixup: Beyond Empirical Risk Minimization | ✓ Link | 97.3 | | | | | | | DenseNet-BC-190 + Mixup | 2017-10-25 |
Revisiting a kNN-based Image Classification System with High-capacity Storage | | 97.3 | | | | | | | kNN-CLIP | 2022-04-03 |
WaveMix: A Resource-efficient Neural Network for Image Analysis | ✓ Link | 97.29 | | | | | | | WaveMixLite-144/7 | 2022-05-28 |
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding | ✓ Link | 97.2 | | | | | | | Transformer local-attention (NesT-B) | 2021-05-26 |
Averaging Weights Leads to Wider Optima and Better Generalization | ✓ Link | 97.12 | | | | | | | ShakeShake-2x64d + SWA | 2018-03-14 |
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features | ✓ Link | 97.12 | | | | | | | PyramidNet-200 + CutMix | 2019-05-13 |
Automatic Data Augmentation via Invariance-Constrained Learning | ✓ Link | 97.05 | | | | | | | Wide-ResNet-40-2 | 2022-09-29 |
Oriented Response Networks | ✓ Link | 97.02 | | | | | | | ORN | 2017-01-07 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 96.87 | | | | | | | WRN-16-8 with reSGHMC | 2020-08-12 |
XnODR and XnIDR: Two Accurate and Fast Fully Connected Layers For Convolutional Neural Networks | ✓ Link | 96.87 | | | | | | | ResNet_XnIDR | 2021-11-21 |
Gated Convolutional Networks with Hybrid Connectivity for Image Classification | ✓ Link | 96.85 | | | | | | | HCGNet-A1 | 2019-08-26 |
Neural networks with late-phase weights | ✓ Link | 96.81 | | | | | | | WRN 28-10 | 2020-07-25 |
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks | ✓ Link | 96.8 | | | | | | | AutoDropout | 2021-01-05 |
Averaging Weights Leads to Wider Optima and Better Generalization | ✓ Link | 96.79 | | | | | | | WRN-28-10 + SWA | 2018-03-14 |
Patches Are All You Need? | ✓ Link | 96.74 | | | | | | | ConvMixer-256/16 | 2022-01-24 |
EXACT: How to Train Your Accuracy | ✓ Link | 96.73 | | | | | | | EXACT (WRN-28-10) | 2022-05-19 |
Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems | ✓ Link | 96.71 | | | | | | | Wide ResNet+cutout | 2019-07-16 |
Deep Pyramidal Residual Networks | ✓ Link | 96.69 | | | | | | | Deep pyramidal residual network | 2016-10-10 |
Deep Competitive Pathway Networks | ✓ Link | 96.62 | | | | | | | CoPaNet-R-164 | 2017-09-29 |
Densely Connected Convolutional Networks | ✓ Link | 96.54 | | | | | | | DenseNet (DenseNet-BC-190) | 2016-08-25 |
Selective Kernel Networks | ✓ Link | 96.53 | | | | | | | SKNet-29 (ResNeXt-29, 16×32d) | 2019-03-15 |
Fractional Max-Pooling | ✓ Link | 96.5 | | | | | | | Fractional MP | 2014-12-18 |
PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 96.5 | | | | | | | PDO-eConv (p8, 4.6M) | 2020-07-20 |
UPANets: Learning from the Universal Pixel Attention Networks | ✓ Link | 96.47 | | | | | | | UPANets | 2021-03-15 |
Gated Attention Coding for Training High-performance and Efficient Spiking Neural Networks | ✓ Link | 96.46 | | | | | | | GAC-SNN | 2023-08-12 |
Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images | | 96.41 | | | | | | | ViT (lightweight, MAE pretrained) | 2024-02-06 |
Neural Architecture Search with Reinforcement Learning | ✓ Link | 96.4 | | | | | | | NAS-RL | 2016-11-05 |
Training Neural Networks with Local Error Signals | ✓ Link | 96.4 | | | | | | | VGG11B(2x) + LocalLearning + CO | 2019-01-20 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 96.378 | | | | | | | ABNet-2G-R3-Combined | 2024-11-28 |
Learning Identity Mappings with Residual Gates | | 96.35 | | | | | | | Residual Gates + WRN | 2016-11-04 |
PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 96.32 | | | | | | | PDO-eConv (p8, 2.62M) | 2020-07-20 |
Towards Principled Design of Deep Convolutional Networks: Introducing SimpNet | ✓ Link | 96.29 | | | | | | | SimpleNetv2 | 2018-02-17 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 96.12 | | | | | | | ResNet56 with reSGHMC | 2020-08-12 |
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations | ✓ Link | 96.1 | | | | | | | Mixer-S/16-SAM | 2021-06-03 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 96.088 | | | | | | | ABNet-2G-R3 | 2024-11-28 |
Regularizing Neural Networks via Adversarial Model Perturbation | ✓ Link | 96.03 | | | | | | | PreActResNet18 (AMP) | 2020-10-10 |
Patches Are All You Need? | ✓ Link | 96.03 | | | | | | | ConvMixer-256/8 | 2022-01-24 |
Preventing Manifold Intrusion with Locality: Local Mixup | ✓ Link | 95.97 | | | | | | | Local Mixup Resnet18 | 2022-01-12 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 95.900 | | | | | | | ABNet-2G-R2 | 2024-11-28 |
Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images | ✓ Link | 95.78 | | | | | | | ResNet-50x1-AGC (ImageNet-21K) | 2021-05-31 |
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective | ✓ Link | 95.66 | | | | | | | ResNet18 (FSGDM) | 2024-11-29 |
Striving for Simplicity: The All Convolutional Net | ✓ Link | 95.6 | | | | | | | ACN | 2014-12-21 |
Large-Scale Evolution of Image Classifiers | ✓ Link | 95.6 | | | | | | | Evolution ensemble | 2017-03-03 |
Benchopt: Reproducible, efficient and collaborative optimization benchmarks | ✓ Link | 95.55 | | | | | | | ResNet-18 | 2022-06-27 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 95.536 | | | | | | | ABNet-2G-R1 | 2024-11-28 |
Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures | ✓ Link | 95.51 | | | | | | | SimpleNetv1 | 2016-08-22 |
MobileNetV2: Inverted Residuals and Linear Bottlenecks | ✓ Link | 95.50 | | | | | | | Mobile Net_Sam | 2018-01-13 |
IM-Loss: Information Maximization Loss for Spiking Neural Networks | | 95.49 | | | | | | | IM-Loss (ResNet-19) | 2022-10-31 |
Identity Mappings in Deep Residual Networks | ✓ Link | 95.4 | | | | | | | ResNet-1001 | 2016-03-16 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 95.35 | | | | | | | ResNet32 with reSGHMC | 2020-08-12 |
Learning Class Unique Features in Fine-Grained Visual Classification | | 95.33 | | | | | | | ResNet-18+MM+FRL | 2020-11-22 |
| | 95.32 | | | | | | | PSN (Modified PLIF Net) | |
Escaping the Big Data Paradigm with Compact Transformers | ✓ Link | 95.29 | | | | | | | CCT-6/3x1 | 2021-04-12 |
Momentum Residual Neural Networks | ✓ Link | 95.18 | | | | | | | MomentumNet | 2021-02-15 |
Context-Aware Compilation of DNN Training Pipelines across Edge and Cloud | ✓ Link | 95.16 | | | | | | | Context-Aware Pipeline | 2021-12-30 |
SRM : A Style-based Recalibration Module for Convolutional Neural Networks | ✓ Link | 95.05 | 95.05 | | | | | | SRM-ResNet-56 | 2019-03-26 |
MixMatch: A Holistic Approach to Semi-Supervised Learning | ✓ Link | 95.05 | | | | | | | MixMatch | 2019-05-06 |
Sparse Networks from Scratch: Faster Training without Losing Performance | ✓ Link | 95.04 | | | | | | | WRN-22-8 (Sparse Momentum) | 2019-07-10 |
Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification | ✓ Link | 95.02 | | | | | | | LP-BNN (ours) + cutout | 2020-12-04 |
An Enhanced Scheme for Reducing the Complexity of Pointwise Convolutions in CNNs for Image Classification Based on Interleaved Grouped Filters without Divisibility Constraints | ✓ Link | 94.95 | | | | | | | kEffNet-B0 V2 32ch + H Flip | 2022-09-08 |
Deep Polynomial Neural Networks | ✓ Link | 94.9 | | | | | | | Prodpoly | 2020-06-20 |
CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters | ✓ Link | 94.79 | | | | | | | ResNet-9 | 2022-03-29 |
Deep Networks with Stochastic Depth | ✓ Link | 94.77 | | | | | | | Stochastic Depth | 2016-03-30 |
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training | ✓ Link | 94.71 | | | | | | | VGG-19 with GradInit | 2021-02-16 |
Non-convex Learning via Replica Exchange Stochastic Gradient MCMC | ✓ Link | 94.62 | | | | | | | ResNet20 with reSGHMC | 2020-08-12 |
PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 94.62 | | | | | | | PDO-eConv (p6m,0.37M) | 2020-07-20 |
Large-Scale Evolution of Image Classifiers | ✓ Link | 94.6 | | | | | | | Evolution | 2017-03-03 |
Efficient Architecture Search by Network Transformation | ✓ Link | 94.6 | | | | | | | RL+NT | 2017-07-16 |
Convolutional Xformers for Vision | ✓ Link | 94.46 | | | | | | | Convolutional Performer for Vision (CPV) | 2022-01-25 |
How to Use Dropout Correctly on Residual Networks with Batch Normalization | ✓ Link | 94.4367 | | | | | | | PreResNet-110 | 2023-02-13 |
Deep Residual Networks with Exponential Linear Unit | ✓ Link | 94.4 | | | | | | | ResNet+ELU | 2016-04-14 |
Deep Complex Networks | ✓ Link | 94.4 | | | | | | | Deep Complex | 2017-05-27 |
PDO-eConvs: Partial Differential Operator Based Equivariant Convolutions | ✓ Link | 94.35 | | | | | | | PDO-eConv (p6,0.36M) | 2020-07-20 |
Stochastic Optimization of Plain Convolutional Neural Networks with Simple methods | ✓ Link | 94.29 | | | | | | | Stochastic Optimization of Plain Convolutional Neural Networks with Simple methods | 2020-01-24 |
All you need is a good init | ✓ Link | 94.2 | | | | | | | Fitnet4-LSUV | 2015-11-19 |
Learning local discrete features in explainable-by-design convolutional neural networks | ✓ Link | 94.15 | | | 0.89 M | | | | R-ExplaiNet-26 | 2024-10-31 |
ANDHRA Bandersnatch: Training Neural Networks to Predict Parallel Realities | ✓ Link | 94.118 | | | | | | | ABNet-2G-R0 | 2024-11-28 |
Mish: A Self Regularized Non-Monotonic Activation Function | ✓ Link | 94.05 | | | | | | | ResNet 9 + Mish | 2019-08-23 |
Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree | ✓ Link | 94.0 | | | | | | | Tree+Max-Avg pooling | 2015-09-30 |
Beta-Rank: A Robust Convolutional Filter Pruning Method For Imbalanced Medical Image Analysis | ✓ Link | 93.97 | | | | | | | Beta-Rank | 2023-04-15 |
Stochastic Subsampling With Average Pooling | | 93.861 | | | | | | | ResNet-110 (SAP) | 2024-09-25 |
On the Relationship between Self-Attention and Convolutional Layers | ✓ Link | 93.8 | | | | | | | SA quadratic embedding | 2019-11-08 |
Grouped Pointwise Convolutions Reduce Parameters in Convolutional Neural Networks | ✓ Link | 93.75 | | | | | | | kEffNet-B0 32ch | 2022-06-30 |
Online Training Through Time for Spiking Neural Networks | ✓ Link | 93.73 | | | | | | | OTTT | 2022-10-09 |
Spatially-sparse convolutional neural networks | ✓ Link | 93.7 | | | | | | | SSCNN | 2014-09-22 |
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations | ✓ Link | 93.7 | | | | | | | NNCLR | 2021-04-29 |
Scalable Bayesian Optimization Using Deep Neural Networks | ✓ Link | 93.6 | | | | | | | Tuned CNN | 2015-02-19 |
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) | ✓ Link | 93.5 | | | | | | | Exponential Linear Units | 2015-11-23 |
Batch-normalized Maxout Network in Network | ✓ Link | 93.3 | | | | | | | BNM NiN | 2015-11-09 |
Universum Prescription: Regularization using Unlabeled Data | | 93.3 | | | | | | | Universum Prescription | 2015-11-11 |
Competitive Multi-scale Convolution | | 93.1 | | | | | | | CMsC | 2015-11-18 |
Distilled Gradual Pruning with Pruned Fine-tuning | ✓ Link | 92.90 | | | | | | | DGPPF-ResNet18 | 2024-02-15 |
Grouped Pointwise Convolutions Reduce Parameters in Convolutional Neural Networks | ✓ Link | 92.74 | | | | | | | kMobileNet V3 Large 16ch | 2022-06-30 |
Learning Activation Functions to Improve Deep Neural Networks | ✓ Link | 92.5 | | | | | | | NiN+APL | 2014-12-21 |
Training Very Deep Networks | ✓ Link | 92.4 | | | | | | | VDN | 2015-07-22 |
A Bregman Learning Framework for Sparse Neural Networks | ✓ Link | 92.3 | | | | | | | ResNet | 2021-05-10 |
Stacked What-Where Auto-encoders | ✓ Link | 92.2 | | | | | | | SWWAE | 2015-06-08 |
FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes | ✓ Link | 92.2 | | | | | | | FlexTCN-7 | 2021-10-15 |
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization | ✓ Link | 92.08 | | | | | | | ReActNet-18 | 2021-04-16 |
Mish: A Self Regularized Non-Monotonic Activation Function | ✓ Link | 92.02 | | | | | | | ResNet v2-20 (Mish activation) | 2019-08-23 |
Context-aware deep model compression for edge cloud computing | | 92.01 | | | | | | | Context-Aware DNN tree | 2020-11-29 |
Deeply-Supervised Nets | ✓ Link | 91.8 | | | | | | | DSN | 2014-09-18 |
BinaryConnect: Training Deep Neural Networks with binary weights during propagations | ✓ Link | 91.7 | | | | | | | BinaryConnect | 2015-11-02 |
Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities | ✓ Link | 91.7 | | | | | | | CLS-GAN | 2017-01-23 |
On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units | | 91.5 | | | | | | | MIM | 2015-08-03 |
Spectral Representations for Convolutional Neural Networks | | 91.4 | | | | | | | Spectral Representations for Convolutional Neural Networks | 2015-06-11 |
DLME: Deep Local-flatness Manifold Embedding | ✓ Link | 91.3 | | | | | | | DLME (ResNet-18, linear) | 2022-07-07 |
RMDL: Random Multimodel Deep Learning for Classification | ✓ Link | 91.21 | | | | | | | RMDL (30 RDLs) | 2018-05-03 |
Network In Network | ✓ Link | 91.2 | | | | | | | Network in Network | 2013-12-16 |
Trainable Activations for Image Classification | ✓ Link | 91.1 | | | | | | | ResNet-26 (Trainable Activations) | 2023-01-26 |
Trainable Activations for Image Classification | ✓ Link | 90.9 | | | | | | | ResNet-32 (Trainable Activations) | 2023-01-26 |
Grouped Pointwise Convolutions Reduce Parameters in Convolutional Neural Networks | ✓ Link | 90.83 | | | | | | | kDenseNet-BC L100 12ch | 2022-06-30 |
Deep Networks with Internal Selective Attention through Feedback Connections | | 90.8 | | | | | | | Deep Networks with Internal Selective Attention through Feedback Connections | 2014-07-11 |
Maxout Networks | ✓ Link | 90.65 | | | | | | | Maxout Network (k=2) | 2013-02-18 |
Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation | | 90.65 | | | | | | | ResNet-18 | 2019-11-13 |
Improving Deep Neural Networks with Probabilistic Maxout Units | | 90.6 | | | | | | | DNN+Probabilistic Maxout | 2013-12-20 |
Practical Bayesian Optimization of Machine Learning Algorithms | ✓ Link | 90.5 | | | | | | | GP EI | 2012-06-13 |
Trainable Activations for Image Classification | ✓ Link | 90.5 | | | | | | | ResNet-44 (Trainable Activations) | 2023-01-26 |
Trainable Activations for Image Classification | ✓ Link | 90.4 | | | | | | | ResNet-20 (Trainable Activations) | 2023-01-26 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 90 | | | | | | | SEER (RegNet10B) | 2022-02-16 |
Grouped Pointwise Convolutions Reduce Parameters in Convolutional Neural Networks | ✓ Link | 89.81 | | | | | | | kMobileNet 16ch | 2022-06-30 |
APAC: Augmented PAttern Classification with Neural Networks | | 89.7 | | | | | | | APAC | 2015-05-13 |
Dynamic Routing Between Capsules | ✓ Link | 89.4 | | | | | | | ensemble of 7 models | 2017-10-26 |
Deep Convolutional Neural Networks as Generic Feature Extractors | | 89.1 | | | | | | | DCNN+GFE | 2017-10-06 |
ImageNet Classification with Deep Convolutional Neural Networks | ✓ Link | 89 | | | | | | | DCNN | 2012-12-01 |
Trainable Activations for Image Classification | ✓ Link | 89.0 | | | | | | | ResNet-14 (Trainable Activations) | 2023-01-26 |
Multi-column Deep Neural Networks for Image Classification | ✓ Link | 88.8 | | | | | | | MCDNN | 2012-02-13 |
Empirical Evaluation of Rectified Activations in Convolutional Network | ✓ Link | 88.8 | | | | | | | RReLU | 2015-05-05 |
Trainable Activations for Image Classification | ✓ Link | 88.8 | | | | | | | ResNet-56 (Trainable Activations) | 2023-01-26 |
Fast-DENSER++: Evolving Fully-Trained Deep Artificial Neural Networks | | 88.73 | | | | | | | F-DENSER++ | 2019-05-08 |
Your Diffusion Model is Secretly a Zero-Shot Classifier | ✓ Link | 88.5 | | | | | | | Diffusion Classifier (zero-shot) | 2023-03-28 |
ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks | ✓ Link | 87.7 | | | | | | | ReNet | 2015-05-03 |
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning | | 87.65 | 87.65 | | 0.95M | | | | OnDev-LCT-8/3 | 2024-01-22 |
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning | | 87.03 | 87.03 | | 0.55M | | | | OnDev-LCT-4/3 | 2024-01-22 |
Efficient Convolutional Neural Networks on Raspberry Pi for Image Classification | ✓ Link | 87.03 | | | | | | | TripleNet-B | 2022-04-02 |
An Analysis of Unsupervised Pre-training in Light of Recent Advances | ✓ Link | 86.7 | | | | | | | An Analysis of Unsupervised Pre-training in Light of Recent Advances | 2014-12-20 |
ThreshNet: An Efficient DenseNet Using Threshold Mechanism to Reduce Connections | ✓ Link | 86.69 | | | | | | | ThreshNet95 | 2022-01-09 |
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning | | 86.64 | 86.64 | | 0.91M | | | | OnDev-LCT-8/1 | 2024-01-22 |
Connection Reduction of DenseNet for Image Recognition | ✓ Link | 86.64 | | | | | | | ShortNet1-53 | 2022-08-02 |
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning | | 86.61 | 86.61 | | 0.51M | | | | OnDev-LCT-4/1 | 2024-01-22 |
Learning in Wilson-Cowan model for metapopulation | ✓ Link | 86.59 | | | | | | | CNN+ Wilson-Cowan model RNN | 2024-06-24 |
Trainable Activations for Image Classification | ✓ Link | 86.5 | | | | | | | ResNet-8 (Trainable Activations) | 2023-01-26 |
New Pruning Method Based on DenseNet Network for Image Classification | | 86.34 | | | | | | | ThresholdNet | 2021-08-28 |
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning | | 86.27 | 86.27 | | 0.31M | | | | OnDev-LCT-2/1 | 2024-01-22 |
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning | | 86.04 | 86.04 | | 0.35M | | | | OnDev-LCT-2/3 | 2024-01-22 |
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning | | 85.73 | 85.73 | | 0.25M | | | | OnDev-LCT-1/3 | 2024-01-22 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 85.28 | | | | | | | cvpr_class | 2021-10-01 |
WaveMix: Multi-Resolution Token Mixing for Images | ✓ Link | 85.21 | | | | | | | WaveMix | 2021-09-29 |
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks | ✓ Link | 84.9 | | | | | | | Stochastic Pooling | 2013-01-16 |
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning | | 84.55 | 84.55 | | 0.21M | | | | OnDev-LCT-1/1 | 2024-01-22 |
Improving neural networks by preventing co-adaptation of feature detectors | ✓ Link | 84.4 | | | | | | | Improving neural networks by preventing co-adaptation of feature detectors | 2012-07-03 |
Vision Xformers: Efficient Attention for Image Classification | ✓ Link | 83.36 | | | | | | | CCN | 2021-07-05 |
Vision Xformers: Efficient Attention for Image Classification | ✓ Link | 83.26 | | | | | | | CvN | 2021-07-05 |
Unsupervised Learning using Pretrained CNN and Associative Memory Bank | | 83.1 | | | | | | | UL-Hopfield (ULH) | 2018-05-02 |
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks | ✓ Link | 82.8 | | | | | | | DCGAN | 2015-11-19 |
An Optimized Toolbox for Advanced Image Processing with Tsetlin Machine Composites | ✓ Link | 82.8 | | | | | | | TM Composites Toolbox | 2024-06-02 |
Convolutional Kernel Networks | | 82.2 | | | | | | | CKN | 2014-06-12 |
Evaluating the Performance of TAAF for image classification models | ✓ Link | 82.06 | | | 545100 | 82.06 | | 0.5551 | The Analog Activation Function | 2025-02-13 |
Discriminative Unsupervised Feature Learning with Convolutional Neural Networks | ✓ Link | 82 | | | | | | | Discriminative Unsupervised Feature Learning with Convolutional Neural Networks | 2014-12-01 |
How Important is Weight Symmetry in Backpropagation? | ✓ Link | 80.98 | | | | | | | Sign-symmetry | 2015-10-17 |
Personalized Federated Learning with Hidden Information on Personalized Prior | | 80.63 | | | | | | | pFedBreD_ns_mg | 2022-11-19 |
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks | ✓ Link | 80.6 | | | | | | | 1 Layer K-means | 2015-11-19 |
Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions | | 80.45 | | | | | | | APVT | 2022-03-02 |
Learning with Recursive Perceptual Representations | | 79.7 | | | | | | | Learning with Recursive Perceptual Representations | 2012-12-01 |
Vision Xformers: Efficient Attention for Image Classification | ✓ Link | 79.50 | | | | | | | LeViP | 2021-07-05 |
| | 78.9 | | | | | | | Convolutional Deep Belief Network | |
PCANet: A Simple Deep Learning Baseline for Image Classification? | ✓ Link | 78.7 | | | | | | | PCANet | 2014-04-14 |
Vision Xformers: Efficient Attention for Image Classification | ✓ Link | 76.9 | | | | | | | Hybrid ViT+RoPE | 2021-07-05 |
Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network | | 75.9 | | | | | | | FLSCNN | 2015-03-16 |
Vision Xformers: Efficient Attention for Image Classification | ✓ Link | 75.26 | | | | | | | Hybrid Vision Nystromformer (ViN) | 2021-07-05 |
Drop Clause: Enhancing Performance, Interpretability and Robustness of the Tsetlin Machine | ✓ Link | 75.1 | | | | | | | CTM Drop Clause | 2021-05-30 |
Vision Xformers: Efficient Attention for Image Classification | ✓ Link | 74 | | | | | | | Hybrid PiN | 2021-07-05 |
SmoothNets: Optimizing CNN architecture design for differentially private deep learning | ✓ Link | 73.5 | | | | | | | SmoothNetV1 | 2022-05-09 |
Sneaky Spikes: Uncovering Stealthy Backdoor Attacks in Spiking Neural Networks with Neuromorphic Data | ✓ Link | 68.3 | | | | | | | SNN | 2023-02-13 |
Vision Xformers: Efficient Attention for Image Classification | ✓ Link | 65.06 | | | | | | | Vision Nystromformer (ViN) | 2021-07-05 |
Augmented Neural ODEs | ✓ Link | 60.6 | | | | | | | ANODE | 2019-04-02 |
Efficient Adaptive Ensembling for Image Classification | | | | 99.612 | | | | | efficient adaptive ensembling | 2022-06-15 |
Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces | ✓ Link | | | | | 98.8 | | | DGMMC-S | 2024-10-17 |
SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers | ✓ Link | | | | | | 95.74 | | SAG-ViT | 2024-11-14 |
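
For readers who want to work with these rows programmatically, the following is a minimal Python sketch. It assumes each row has eleven pipe-separated cells: paper title, code-link marker, seven metric cells, model name, and date. The table carries no header, so the field names used here (`paper`, `has_code`, `score`, `model`, `date`) are hypothetical labels, and treating the first metric cell as the headline accuracy in percent is an assumption, not something the table states.

```python
from typing import Optional

EXPECTED_CELLS = 11  # assumed layout: title, code, 7 metric cells, model, date


def to_float(text: str) -> Optional[float]:
    """Convert a metric cell to float; empty cells become None."""
    text = text.rstrip("%").strip()  # tolerate a stray percent sign
    return float(text) if text else None


def parse_row(line: str) -> dict:
    """Split one pipe-delimited leaderboard row into named fields."""
    cells = [c.strip() for c in line.strip().split("|")]
    # Each row ends with a trailing "|", which yields one extra empty cell.
    if cells and cells[-1] == "":
        cells.pop()
    if len(cells) != EXPECTED_CELLS:
        raise ValueError(f"expected {EXPECTED_CELLS} cells, got {len(cells)}")
    return {
        "paper": cells[0],
        "has_code": cells[1].startswith("✓"),
        "score": to_float(cells[2]),  # first metric cell (assumed headline accuracy)
        "model": cells[9],
        "date": cells[10] or None,
    }


if __name__ == "__main__":
    sample = (
        "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
        " | ✓ Link | 99.5 | | | | | | | ViT-H/14 | 2020-10-22 |"
    )
    print(parse_row(sample))
```

Raising on an unexpected cell count, rather than padding or truncating, makes rows whose model name itself contains a pipe character surface immediately instead of silently shifting columns.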