VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer | | 71.1 | | | VPNeXt | 2025-02-23 |
Minimalist and High-Performance Semantic Segmentation with Plain Vision Transformers | ✓ Link | 71.0 | | | PlainSeg (EVA-02-L) | 2023-10-19 |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | 70.3 | | | InternImage-H | 2022-11-10 |
Representation Separation for Semantic Segmentation with Vision Transformers | | 68.9 | | | RSSeg-ViT-L (BEiT pretrain) | 2022-12-28 |
Vision Transformer Adapter for Dense Predictions | ✓ Link | 68.2 | | | ViT-Adapter-L (Mask2Former, BEiT pretrain) | 2022-05-17 |
Vision Transformer Adapter for Dense Predictions | ✓ Link | 67.5 | | | ViT-Adapter-L (UperNet, BEiT pretrain) | 2022-05-17 |
Representation Separation for Semantic Segmentation with Vision Transformers | | 67.5 | | | RSSeg-ViT-L | 2022-12-28 |
SegViT: Semantic Segmentation with Plain Vision Transformers | ✓ Link | 65.3 | | | SegViT | 2022-10-12 |
CAR: Class-aware Regularizations for Semantic Segmentation | ✓ Link | 64.1 | | | CAA + CAR (ConvNeXt-Large + JPU) | 2022-03-14 |
Efficient Self-Ensemble for Semantic Segmentation | ✓ Link | 64.0 | | | SenFormer (Swin-L) | 2021-11-26 |
Sequential Ensembling for Semantic Segmentation | | 62.1 | | | Sequential Ensemble (SegFormer + HRNet) | 2022-10-08 |
Channelized Axial Attention for Semantic Segmentation -- Considering Channel Relation within Spatial Attention for Semantic Segmentation | ✓ Link | 60.5 | | | CAA + Simple decoder (EfficientNet-B7) | 2021-01-19 |
Vision Transformers for Dense Prediction | ✓ Link | 60.46 | | | DPT-Hybrid | 2021-03-24 |
Channelized Axial Attention for Semantic Segmentation -- Considering Channel Relation within Spatial Attention for Semantic Segmentation | ✓ Link | 60.1 | | | CAA (EfficientNet-B7) | 2021-01-19 |
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ Link | 59.6 | | | HRNetV2 + OCR + RMI (PaddleClas pretrained) | 2019-09-24 |
Segmenter: Transformer for Semantic Segmentation | ✓ Link | 59.0 | | | Seg-L-Mask/16 | 2021-05-12 |
ResNeSt: Split-Attention Networks | ✓ Link | 58.9 | | | ResNeSt-269 | 2020-04-19 |
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective | ✓ Link | 58.6 | | | DEPICT-SA (ViT-L multi-scale) | 2024-11-05 |
ResNeSt: Split-Attention Networks | ✓ Link | 58.4 | | | ResNeSt-200 | 2020-04-19 |
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective | ✓ Link | 57.9 | | | DEPICT-SA (ViT-L single-scale) | 2024-11-05 |
CondNet: Conditional Classifier for Scene Segmentation | ✓ Link | 57 | | | CondNet(ResNeSt-101) | 2021-09-21 |
Efficient Self-Ensemble for Semantic Segmentation | ✓ Link | 56.6 | | | SenFormer (ResNet-101) | 2021-11-26 |
ResNeSt: Split-Attention Networks | ✓ Link | 56.5 | | | ResNeSt-101 | 2020-04-19 |
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ Link | 56.2 | | | OCR (HRNetV2-W48) | 2019-09-24 |
Generalized Parametric Contrastive Learning | ✓ Link | 56.2 | | | GPaCo (ResNet101) | 2022-09-26 |
CondNet: Conditional Classifier for Scene Segmentation | ✓ Link | 56.0 | | | CondNet(ResNet-101) | 2021-09-21 |
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers | ✓ Link | 55.83 | | | SETR-MLA (16, 80k, MS) | 2020-12-31 |
DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation | | 55.6 | | | DCNAS | 2020-03-26 |
Scene Segmentation with Dual Relation-aware Attention Network | ✓ Link | 55.4 | | | DRAN(ResNet-101) | 2020-08-05 |
Disentangled Non-Local Neural Networks | ✓ Link | 55.3 | | | DNL | 2020-06-11 |
Is Attention Better Than Matrix Decomposition? | ✓ Link | 55.2 | | | HamNet (ResNet-101) | 2021-09-09 |
Channelized Axial Attention for Semantic Segmentation -- Considering Channel Relation within Spatial Attention for Semantic Segmentation | ✓ Link | 55.0 | | | CAA (ResNet-101) | 2021-01-19 |
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ Link | 54.8 | | | OCR (ResNet-101) | 2019-09-24 |
Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings | | 54.2 | | | SIW(SegFormer-B5) | 2022-02-04 |
Co-Occurrent Features in Semantic Segmentation | ✓ Link | 54.0 | | | CFNet (ResNet-101) | 2019-06-01 |
Deep High-Resolution Representation Learning for Visual Recognition | ✓ Link | 54.0 | | | CFNet (ResNet-101) | 2019-08-20 |
Deep High-Resolution Representation Learning for Visual Recognition | ✓ Link | 54 | | | HRNetV2 (HRNetV2-W48) | 2019-08-20 |
Context Prior for Scene Segmentation | ✓ Link | 53.9 | | | CPN(ResNet-101) | 2020-04-03 |
Location-aware Upsampling for Semantic Segmentation | ✓ Link | 53.9 | | | LaU-regression-loss (ResNet-101) | 2019-11-13 |
Dual Graph Convolutional Network for Semantic Segmentation | ✓ Link | 53.7 | | | DGCNet (MS, ResNet-101) | 2019-09-13 |
Boundary-Aware Feature Propagation for Scene Segmentation | ✓ Link | 53.6 | | | BFP | 2019-08-31 |
Semantic Correlation Promoted Shape-Variant Context for Segmentation | ✓ Link | 53.2 | | | SVCNet (ResNet-101) | 2019-09-05 |
FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation | ✓ Link | 53.1 | | | Joint Pyramid Upsampling + EncNet | 2019-03-28 |
Expectation-Maximization Attention Networks for Semantic Segmentation | ✓ Link | 53.1 | | | EMANet | 2019-07-31 |
Asymmetric Non-local Neural Networks for Semantic Segmentation | ✓ Link | 52.8 | | | Asymmetric ALNN | 2019-08-21 |
CASSOD-Net: Cascaded and Separable Structures of Dilated Convolution for Embedded Vision Systems and Applications | | 52.76 | | | CASSOD | 2021-04-29 |
Dual Attention Network for Scene Segmentation | ✓ Link | 52.6 | | | DANet (ResNet-101) | 2018-09-09 |
Scene Parsing via Integrated Classification Model and Variance-Based Regularization | ✓ Link | 52.60 | | | ICM | 2019-06-01 |
Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation | | 52.5 | | | DUpsampling | 2019-03-05 |
Context Encoding for Semantic Segmentation | ✓ Link | 51.7 | | | EncNet (ResNet-101) | 2018-03-23 |
Co-Occurrent Features in Semantic Segmentation | ✓ Link | 51.5 | | | CFNet (ResNet-50) | 2019-06-01 |
Wider or Deeper: Revisiting the ResNet Model for Visual Recognition | ✓ Link | 48.1 | | | ResNet-38 | 2016-11-30 |
Pyramid Scene Parsing Network | ✓ Link | 47.8 | | | PSPNet (ResNet-101) | 2016-12-04 |
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation | ✓ Link | 47.3 | | | RefineNet | 2016-11-20 |
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs | ✓ Link | 45.7 | | | DeepLabV2 | 2016-06-02 |
Bridging Category-level and Instance-level Semantic Image Segmentation | | 44.5 | | | VeryDeep | 2016-05-23 |
Efficient piecewise training of deep structured models for semantic segmentation | | 43.3 | | | Piecewise | 2015-04-04 |
Efficient Yet Deep Convolutional Neural Networks for Semantic Segmentation | ✓ Link | 42.6 | | | Dilated-FCN2s | 2017-07-26 |
Higher Order Conditional Random Fields in Deep Neural Networks | ✓ Link | 41.3 | | | HO CRF | 2015-11-25 |
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation | | 40.5 | | | BoxSup | 2015-03-05 |
ParseNet: Looking Wider to See Better | ✓ Link | 40.4 | | | ParseNet | 2015-06-15 |
Conditional Random Fields as Recurrent Neural Networks | ✓ Link | 39.3 | | | CRF-RNN | 2015-02-11 |
Fully Convolutional Networks for Semantic Segmentation | ✓ Link | 37.8 | | | FCN-8s | 2014-11-14 |
Convolutional Feature Masking for Joint Object and Stuff Segmentation | ✓ Link | 34.4 | | | CFM | 2014-12-03 |
Region-based semantic segmentation with end-to-end training | ✓ Link | 32.5 | 49.9 | 62.4 | RBE2E | 2016-07-26 |
SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | ✓ Link | 24.7 | | | SegCLIP | 2022-11-27 |