OpenCodePapers

semantic-segmentation-on-ade20k-val

Semantic Segmentation
Leaderboard
| Paper | Code | mIoU (%) | Pixel Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|---|
| Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | ✓ | 62.8 | | BEiT-3 | 2022-08-22 |
| ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions | ✓ | 62.1 | | ViT-CoMer | 2024-03-13 |
| EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | ✓ | 61.5 | | EVA | 2022-11-14 |
| Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation | ✓ | 61.4 | | FD-SwinV2-G | 2022-05-27 |
| Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation | ✓ | 60.8 | | MaskDINO-SwinL | 2022-06-06 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 60.8 | | OneFormer (InternImage-H, emb_dim=256, multi-scale, 896x896) | 2022-11-10 |
| Vision Transformer Adapter for Dense Predictions | ✓ | 60.5 | | ViT-Adapter-L (Mask2Former, BEiT pretrain) | 2022-05-17 |
| SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks | ✓ | 59.35 | | SERNet-Former_v2 | 2024-01-28 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 58.6 | | OneFormer (DiNAT-L, multi-scale, 896x896) | 2022-11-10 |
| Vision Transformer Adapter for Dense Predictions | ✓ | 58.4 | | ViT-Adapter-L (UperNet, BEiT pretrain) | 2022-05-17 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 58.4 | | OneFormer (DiNAT-L, multi-scale, 640x640) | 2022-11-10 |
| Representation Separation for Semantic Segmentation with Vision Transformers | | 58.4 | | RSSeg-ViT-L(BEiT pretrain) | 2022-12-28 |
| Your ViT is Secretly an Image Segmentation Model | ✓ | 58.4 | | EoMT (DINOv2-L, single-scale, 512x512) | 2025-03-24 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 58.3 | | OneFormer (Swin-L, multi-scale, 896x896) | 2022-11-10 |
| SeMask: Semantically Masked Transformers for Semantic Segmentation | ✓ | 58.2 | | SeMask (SeMask Swin-L FaPN-Mask2Former) | 2021-12-23 |
| SeMask: Semantically Masked Transformers for Semantic Segmentation | ✓ | 58.2 | | SeMask (SeMask Swin-L MSFaPN-Mask2Former) | 2021-12-23 |
| Dilated Neighborhood Attention Transformer | ✓ | 58.1 | | DiNAT-L (Mask2Former) | 2022-09-29 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 57.7 | | Mask2Former (Swin-L-FaPN, multiscale) | 2021-12-02 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 57.7 | | OneFormer (Swin-L, multi-scale, 640x640) | 2022-11-10 |
| SeMask: Semantically Masked Transformers for Semantic Segmentation | ✓ | 57.5 | | SeMask (SeMask Swin-L Mask2Former) | 2021-12-23 |
| Efficient Self-Ensemble for Semantic Segmentation | ✓ | 57.1 | | SenFormer (BEiT-L) | 2021-11-26 |
| BEiT: BERT Pre-Training of Image Transformers | ✓ | 57.0 | | BEiT-L (ViT+UperNet, ImageNet-22k pretrain) | 2021-06-15 |
| SeMask: Semantically Masked Transformers for Semantic Segmentation | ✓ | 57.0 | | SeMask (SeMask Swin-L MSFaPN-Mask2Former, single-scale) | 2021-12-23 |
| FaPN: Feature-aligned Pyramid Network for Dense Image Prediction | ✓ | 56.7 | | FaPN (MaskFormer, Swin-L, ImageNet-22k pretrain) | 2021-08-16 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 56.4 | | Mask2Former (Swin-L-FaPN) | 2021-12-02 |
| SeMask: Semantically Masked Transformers for Semantic Segmentation | ✓ | 56.2 | | SeMask (SeMask Swin-L MaskFormer) | 2021-12-23 |
| CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows | ✓ | 55.7 | | CSWin-L (UperNet, ImageNet-22k pretrain) | 2021-07-01 |
| Per-Pixel Classification is Not All You Need for Semantic Segmentation | ✓ | 55.6 | | MaskFormer (Swin-L, ImageNet-22k pretrain) | 2021-07-13 |
| DeiT III: Revenge of the ViT | ✓ | 55.6 | | DeiT-L | 2022-04-14 |
| Focal Self-attention for Local-Global Interactions in Vision Transformers | ✓ | 55.4 | | Focal-L (UperNet, ImageNet-22k pretrain) | 2021-07-01 |
| SegViT: Semantic Segmentation with Plain Vision Transformers | ✓ | 55.2 | | SegViT ViT-Large | 2022-10-12 |
| Vision Transformers with Patch Diversification | ✓ | 54.4 | | PatchDiverse + Swin-L (multi-scale test, upernet, ImageNet22k pretrain) | 2021-04-26 |
| K-Net: Towards Unified Image Segmentation | ✓ | 54.3 | | K-Net | 2021-06-28 |
| Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective | ✓ | 54.3 | | DEPICT-SA (ViT-L 640x640 multi-scale) | 2024-11-05 |
| Efficient Self-Ensemble for Semantic Segmentation | ✓ | 54.2 | | SenFormer (Swin-L) | 2021-11-26 |
| DeiT III: Revenge of the ViT | ✓ | 54.1 | | DeiT-B | 2022-04-14 |
| MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ | 53.8 | | MixMIM-L | 2022-05-26 |
| Segmenter: Transformer for Semantic Segmentation | ✓ | 53.63 | | Seg-L-Mask/16 (MS, ViT-L) | 2021-05-12 |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ✓ | 53.5 | | Swin-L (UperNet, ImageNet-22k pretrain) | 2021-03-25 |
| SeMask: Semantically Masked Transformers for Semantic Segmentation | ✓ | 53.5 | | SeMask (SeMask Swin-L FPN) | 2021-12-23 |
| Augmenting Convolutional networks with attention-based aggregation | ✓ | 52.9 | | PatchConvNet-L120 (UperNet) | 2021-12-27 |
| Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective | ✓ | 52.9 | | DEPICT-SA (ViT-L 640x640 single-scale) | 2024-11-05 |
| Augmenting Convolutional networks with attention-based aggregation | ✓ | 52.8 | | PatchConvNet-B120 (UperNet) | 2021-12-27 |
| SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | ✓ | 51.8 | | SegFormer-B5(MS, 87M #Params, ImageNet-1K pretrain) | 2021-05-31 |
| Is Attention Better Than Matrix Decomposition? | ✓ | 51.5 | | Light-Ham (VAN-Huge, 61M, IN-1k, MS) | 2021-09-09 |
| CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention | ✓ | 51.4 | 84.0 | CrossFormer (ImageNet1k-pretrain, UPerNet, multi-scale test) | 2021-07-31 |
| Augmenting Convolutional networks with attention-based aggregation | ✓ | 51.1 | | PatchConvNet-B60 (UperNet) | 2021-12-27 |
| Is Attention Better Than Matrix Decomposition? | ✓ | 51.0 | | Light-Ham (VAN-Large, 46M, IN-1k, MS) | 2021-09-09 |
| Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer | ✓ | 50.5 | | UperNet Shuffle-B | 2021-06-07 |
| ELSA: Enhanced Local Self-Attention for Vision Transformer | ✓ | 50.3 | | ELSA-Swin-S | 2021-12-23 |
| MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers | ✓ | 50.3 | | MixMIM-B | 2022-05-26 |
| Twins: Revisiting the Design of Spatial Attention in Vision Transformers | ✓ | 50.2 | | Twins-SVT-L (UperNet, ImageNet-1k pretrain) | 2021-04-28 |
| Segmenter: Transformer for Semantic Segmentation | ✓ | 50.0 | | Seg-B-Mask/16 (MS, ViT-B) | 2021-05-12 |
| Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | ✓ | 49.7 | | Swin-B (UperNet, ImageNet-1k pretrain) | 2021-03-25 |
| gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window | | 49.69 | 83.43 | gSwin-S | 2022-08-24 |
| Segmenter: Transformer for Semantic Segmentation | ✓ | 49.61 | 83.37 | Seg-B/8 (MS, ViT-B) | 2021-05-12 |
| Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer | ✓ | 49.6 | | UperNet Shuffle-S | 2021-06-07 |
| Is Attention Better Than Matrix Decomposition? | ✓ | 49.6 | | Light-Ham (VAN-Base, 27M, IN-1k, MS) | 2021-09-09 |
| Augmenting Convolutional networks with attention-based aggregation | ✓ | 49.3 | | PatchConvNet-S60 (UperNet) | 2021-12-27 |
| Vision Transformers for Dense Prediction | ✓ | 49.02 | 83.11 | DPT-Hybrid | 2021-03-24 |
| DaViT: Dual Attention Vision Transformers | ✓ | 48.8 | | DaViT-S (UperNet) | 2022-04-07 |
| ResNeSt: Split-Attention Networks | ✓ | 48.36 | | ResNeSt-200 | 2020-04-19 |
| Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ | 47.98 | | HRNetV2 + OCR + RMI (PaddleClas pretrained) | 2019-09-24 |
| gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window | | 47.63 | 82.60 | gSwin-T | 2022-08-24 |
| ResNeSt: Split-Attention Networks | ✓ | 47.60 | | ResNeSt-269 | 2020-04-19 |
| Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer | ✓ | 47.6 | | UperNet Shuffle-T | 2021-06-07 |
| DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation | | 47.12 | | DCNAS | 2020-03-26 |
| ResNeSt: Split-Attention Networks | ✓ | 46.91 | | ResNeSt-101 | 2020-04-19 |
| | | 46.9 | | Seg-S-Mask/16 (MS, ViT-S) | |
| Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields | ✓ | 46.41 | | Swin-S (RPE w/ GAB) | 2023-05-08 |
| DaViT: Dual Attention Vision Transformers | ✓ | 46.3 | | DaViT-B (UperNet) | 2022-04-07 |
| Context Prior for Scene Segmentation | ✓ | 46.27 | | CPN(ResNet-101) | 2020-04-03 |
| MultiMAE: Multi-modal Multi-task Masked Autoencoders | ✓ | 46.2 | | MultiMAE (ViT-B) | 2022-04-04 |
| Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition | ✓ | 45.99 | 82.49 | PyConvSegNet-152 | 2020-06-20 |
| Disentangled Non-Local Neural Networks | ✓ | 45.97 | | DNL | 2020-06-11 |
| CTNet: Context-based Tandem Network for Semantic Segmentation | ✓ | 45.94 | | CTNet | 2021-04-20 |
| Adaptive Context Network for Scene Parsing | | 45.90 | | ACNet (ResNet-101) | 2019-11-05 |
| Adaptive Context Network for Scene Parsing | | 45.90 | | ACNet(ResNet-101) | 2019-11-05 |
| Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ | 45.66 | | OCR (HRNetV2-W48) | 2019-09-24 |
| Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks | ✓ | 45.33 | | EANet (ResNet-101) | 2021-05-05 |
| Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ | 45.28 | | OCR (ResNet-101) | 2019-09-24 |
| Asymmetric Non-local Neural Networks for Semantic Segmentation | ✓ | 45.24 | | Asymmetric ALNN | 2019-08-21 |
| gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window | | 45.07 | 81.79 | gSwin-VT | 2022-08-24 |
| Location-aware Upsampling for Semantic Segmentation | ✓ | 45.02 | | LaU-regression-loss | 2019-11-13 |
| Context Encoding for Semantic Segmentation | ✓ | 44.65 | | EncNet (ResNet-101) | 2018-03-23 |
| Symbolic Graph Reasoning Meets Convolutions | ✓ | 44.32 | | SGR (ResNet-101) | 2018-12-01 |
| Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation | ✓ | 43.98 | 81.72 | Auto-DeepLab-L | 2019-01-10 |
| PSANet: Point-wise Spatial Attention Network for Scene Parsing | ✓ | 43.77 | | PSANet (ResNet-101) | 2018-09-01 |
| Dynamic-structured Semantic Propagation Network | | 43.68 | | DSSPN (ResNet-101) | 2018-03-16 |
| Pyramid Scene Parsing Network | ✓ | 43.51 | | PSPNet (ResNet-152) | 2016-12-04 |
| Pyramid Scene Parsing Network | ✓ | 43.29 | | PSPNet (ResNet-101) | 2016-12-04 |
| High-Resolution Representations for Labeling Pixels and Regions | ✓ | 42.99 | | HRNetV2 (HRNetV2-W48) | 2019-04-09 |
| Unified Perceptual Parsing for Scene Understanding | ✓ | 42.66 | | UperNet (ResNet-101) | 2018-07-26 |
| RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation | ✓ | 40.70 | | RefineNet (ResNet-152) | 2016-11-20 |
| RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation | ✓ | 40.20 | | RefineNet (ResNet-101) | 2016-11-20 |
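
The leaderboard's two metrics, mIoU and Pixel Accuracy, are both commonly derived from a class confusion matrix. The sketch below is a minimal pure-Python illustration of those standard definitions, not the evaluation code behind any entry listed here. The function names, the `ignore_index=255` convention for unlabeled pixels, and the tiny 3-class example are assumptions made for illustration; individual papers may differ in details such as single-scale versus multi-scale testing or how absent classes are handled.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (target, pred) label pairs over flat sequences of class ids."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumed convention: unlabeled pixels are skipped
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of labeled pixels whose predicted class equals the target."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # target c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, target other
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical toy labels: 8 pixels, 3 classes, last pixel unlabeled (255).
pred = [0, 0, 1, 1, 2, 2, 2, 0]
target = [0, 1, 1, 1, 2, 2, 0, 255]
cm = confusion_matrix(pred, target, num_classes=3)
print(f"Pixel Accuracy: {100 * pixel_accuracy(cm):.2f}")
print(f"mIoU: {100 * mean_iou(cm):.2f}")
```

Both values are scaled by 100 when printed so they are on the same percentage scale as the table. In practice the confusion matrix would be accumulated over every image in the validation set before the metrics are computed.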