Progressive Multi-task Anti-Noise Learning and Distilling Frameworks for Fine-grained Vehicle Recognition | ✓ Link | 97.3% | | | TResNet-L + PMD | 2024-01-25 |
Learn from Each Other to Classify Better: Cross-layer Mutual Attention Learning for Fine-grained Visual Classification | ✓ Link | 97.1% | | | CMAL-Net | 2023-03-22 |
Interweaving Insights: High-Order Feature Interaction for Fine-Grained Visual Recognition | ✓ Link | 96.92% | | | I2-HOFI | 2024-10-20 |
ML-Decoder: Scalable and Versatile Classification Head | ✓ Link | 96.41% | | | TResNet-L + ML-Decoder | 2021-11-25 |
Domain Adaptive Transfer Learning with Specialist Models | | 96.2% | | | DAT | 2018-11-16 |
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | ✓ Link | 96.13% | | | ALIGN | 2021-02-11 |
SR-GNN: Spatial Relation-aware Graph Neural Network for Fine-Grained Image Categorization | ✓ Link | 96.1% | 9.8 | 30.9 | SR-GNN | 2022-09-05 |
Sharpness-Aware Minimization for Efficiently Improving Generalization | ✓ Link | 95.96% | | | EffNet-L2 (SAM) | 2020-10-03 |
Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation | ✓ Link | 95.72% | | | SaSPA + CAL | 2024-06-20 |
Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification | ✓ Link | 95.7% | | | CAP | 2021-01-17 |
Fine-Grained Visual Classification with Efficient End-to-end Localization | | 95.6% | | | AttNet & AffNet | 2020-05-11 |
Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization | | 95.6% | | | CSQA-Net | 2024-03-15 |
Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification | ✓ Link | 95.5% | | | CAL | 2021-08-19 |
Re-rank Coarse Classification with Local Region Enhanced Features for Fine-Grained Image Recognition | | 95.5% | | | CCFR | 2021-02-19 |
Multi-Granularity Part Sampling Attention for Fine-Grained Visual Classification | ✓ Link | 95.4% | | | MPSA | 2024-08-16 |
Non-binary deep transfer learning for image classification | ✓ Link | 95.35% | | | Inceptionv4 | 2021-07-19 |
Learning Attentive Pairwise Interaction for Fine-Grained Classification | ✓ Link | 95.3% | | | API-Net | 2020-02-24 |
Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification | | 95.3% | | | DCAL | 2022-05-04 |
Part-guided Relational Transformers for Fine-grained Visual Recognition | ✓ Link | 95.3% | | | PART | 2022-12-28 |
Learning Class Unique Features in Fine-Grained Visual Classification | | 95.2% | | | DenseNet161+MM+FRL | 2020-11-22 |
Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches | ✓ Link | 95.1% | | | PMG | 2020-03-08 |
Your "Flamingo" is My "Bird": Fine-Grained, or Not | ✓ Link | 95.1% | | | Multi Granularity | 2020-11-18 |
Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization | ✓ Link | 95.0% | | | TBMSL-Net | 2020-03-20 |
ELoPE: Fine-Grained Visual Classification with Efficient Localization, Pooling and Embedding | ✓ Link | 95.0% | | | ELoPE | 2019-11-17 |
ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder | ✓ Link | 95.0% | | | ViT-NeT (SwinV2-B) | 2022-07-17 |
A free lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition | | 95.0% | | | AFTrans | 2021-10-04 |
Fine-grained Recognition: Accounting for Subtle Differences between Similar Classes | | 94.9% | | | DB | 2019-12-14 |
Attribute Mix: Semantic Data Augmentation for Fine Grained Recognition | ✓ Link | 94.9% | | | Attribute Mix+ | 2020-04-06 |
AutoAugment: Learning Augmentation Policies from Data | ✓ Link | 94.8% | | | AutoAugment | 2018-05-24 |
Fine-Grained Visual Classification with Batch Confusion Norm | | 94.8% | | | BCN | 2019-10-28 |
Weakly Supervised Fine-Grained Image Classification via Guassian Mixture Model Oriented Discriminative Learning | | 94.8% | | | DF-GMM | 2020-06-01 |
TransFG: A Transformer Architecture for Fine-grained Recognition | ✓ Link | 94.8% | | | TransFG | 2021-03-14 |
Contrastively-reinforced Attention Convolutional Neural Network for Fine-grained Image Recognition | ✓ Link | 94.8% | | | CRA-CNN | 2020-09-08 |
Selective Sparse Sampling for Fine-Grained Image Recognition | ✓ Link | 94.7% | | | S3N | 2019-10-01 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | ✓ Link | 94.7% | | | EfficientNet-B7 | 2019-05-28 |
Grafit: Learning fine-grained image representations with coarse labels | | 94.7% | | | Grafit (RegNet-8GF) | 2020-11-25 |
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | ✓ Link | 94.6% | | | GPipe | 2018-11-16 |
Cross-X Learning for Fine-Grained Visual Categorization | | 94.6% | | | Cross-X | 2019-09-10 |
Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization | ✓ Link | 94.6% | | | ACNet | 2019-09-25 |
On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition | ✓ Link | 94.6% | | | SEB+EfficientNet-B5 | 2022-05-26 |
Progressive Co-Attention Network for Fine-grained Visual Classification | ✓ Link | 94.6% | | | PCA | 2021-01-21 |
See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification | ✓ Link | 94.5% | | | WS-DAN | 2019-01-26 |
Channel Interaction Networks for Fine-Grained Image Categorization | | 94.5% | | | CIN | 2020-03-11 |
Look-into-Object: Self-supervised Structure Modeling for Object Recognition | ✓ Link | 94.5% | | | LIO/ResNet-50 (multi-stage) | 2020-03-31 |
Grad-CAM guided channel-spatial attention module for fine-grained visual classification | | 94.41% | | | Grad-CAM | 2021-01-24 |
Fixing the train-test resolution discrepancy | ✓ Link | 94.4% | | | FixSENet-154 | 2019-06-14 |
Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network | ✓ Link | 94.4% | | | Assemble-ResNet-FGVC-50 | 2020-01-17 |
The Devil is in the Channels: Mutual-Channel Loss for Fine-Grained Image Classification | ✓ Link | 94.4% | | | MC Loss (B-CNN) | 2020-02-11 |
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild | ✓ Link | 94.2% | | | ELP | 2022-01-01 |
Penalizing the Hard Example But Not Too Much: A Strong Baseline for Fine-Grained Visual Classification | ✓ Link | 94.2% | | | MHEM (strong ResNet50 baseline) | 2022-11-21 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 94.2% | 8.7G | 50M | RDNet-S (224 res, IN-1K pretrained) | 2024-03-28 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 94.2% | 34.7G | 186M | RDNet-L (224 res, IN-1K pretrained) | 2024-03-28 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 94.1% | 15.4G | 87M | RDNet-B (224 res, IN-1K pretrained) | 2024-03-28 |
Graph-propagation based Correlation Learning for Weakly Supervised Fine-grained Image Classification | | 94.0% | | | GCL | 2020-02-14 |
Learning Semantically Enhanced Feature for Fine-Grained Image Classification | ✓ Link | 94.0% | | | SEF | 2020-06-24 |
Alignment Enhancement Network for Fine-grained Visual Categorization | | 94.0% | | | AENet | 2021-03-01 |
Learning to Navigate for Fine-grained Classification | ✓ Link | 93.9% | | | NTS-Net (K=4) | 2018-09-02 |
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy | ✓ Link | 93.9% | | | Bamboo (ViT-B/16) | 2022-03-15 |
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | ✓ Link | 93.9% | 5.0G | 24M | RDNet-T (224 res, IN-1K pretrained) | 2024-03-28 |
Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition | ✓ Link | 93.8% | | | DFL-CNN | 2016-11-29 |
Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-grained Image Recognition | ✓ Link | 93.8% | | | TASN | 2019-03-14 |
Three things everyone should know about Vision Transformers | ✓ Link | 93.8% | | | ViT-L (attn finetune) | 2022-03-18 |
AutoFormer: Searching Transformers for Visual Recognition | ✓ Link | 93.4% | | | AutoFormer-S (384) | 2021-07-01 |
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization | ✓ Link | 93.3% | | | MPN-COV | 2017-12-04 |
Training data-efficient image transformers & distillation through attention | ✓ Link | 93.3% | | 86M | DeiT-B | 2020-12-23 |
Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition | ✓ Link | 93.1% | | | ResNet101-swp | 2017-04-04 |
Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition | ✓ Link | 93.0% | | | MAMC | 2018-06-14 |
Neural Architecture Transfer | ✓ Link | 92.9% | 369M | 3.7M | NAT-M4 | 2020-05-12 |
Pairwise Confusion for Fine-Grained Visual Classification | ✓ Link | 92.86% | | | PC-DenseNet-161 | 2017-05-22 |
Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition | ✓ Link | 92.8% | | | MACNN | 2017-10-01 |
ResNet strikes back: An improved training procedure in timm | ✓ Link | 92.7% | 4.1B | 24M | ResNet50 (A1) | 2021-10-01 |
Neural Architecture Transfer | ✓ Link | 92.6% | 289M | 3.5M | NAT-M3 | 2020-05-12 |
Classification-Specific Parts for Improving Fine-Grained Visual Categorization | ✓ Link | 92.5% | | | CS-Parts | 2019-09-16 |
Classification-Specific Parts for Improving Fine-Grained Visual Categorization | ✓ Link | 92.5% | | | CS-Part | 2019-09-16 |
Neural Architecture Transfer | ✓ Link | 92.2% | 222M | 2.7M | NAT-M2 | 2020-05-12 |
PCNN: Probable-Class Nearest-Neighbor Explanations Improve Fine-Grained Image Classification Accuracy for AIs and Humans | ✓ Link | 91.06% | | | ResNet-50 | 2023-08-25 |
Neural Architecture Transfer | ✓ Link | 90.9% | 165M | 2.4M | NAT-M1 | 2020-05-12 |
Exploring Localization for Self-supervised Fine-grained Contrastive Learning | ✓ Link | 89.76% | | | BYOL+CVSA (ResNet-50) | 2021-06-30 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 89.5% | | | ResMLP-24 | 2021-05-07 |
Multiscale patch-based feature graphs for image classification | ✓ Link | 86.79% | | | MPFG + CLIP | 2023-08-08 |
ResMLP: Feedforward networks for image classification with data-efficient training | ✓ Link | 84.6% | | | ResMLP-12 | 2021-05-07 |
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | ✓ Link | 68.03% | | | SEER (RegNet10B) | 2022-02-16 |
Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics | ✓ Link | 65.68% | | | MANO-tiny | 2025-07-03 |