OpenCodePapers

Semantic Segmentation on Cityscapes val

Semantic Segmentation
Leaderboard
| Paper | Code | mIoU (%) | FPS | Validation mIoU (%) | Model Name | Release Date |
|---|---|---|---|---|---|---|
| The Missing Point in Vision Transformers for Universal Image Segmentation | ✓ | 87.4 | | | ViT-P (InternImage-H) | 2025-05-26 |
| SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks | ✓ | 87.35 | | 87.35 | SERNet-Former | 2024-01-28 |
| Harnessing Diffusion Models for Visual Perception with Meta Prompts | ✓ | 87.1 | | | MetaPrompt-SD | 2023-12-22 |
| InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ | 87 | | | InternImage-H | 2022-11-10 |
| Polarized Self-Attention: Towards High-quality Pixel-wise Regression | ✓ | 86.93 | | | HRNetV2-OCR+PSA | 2021-07-02 |
| InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ | 86.4 | | | InternImage-XL | 2022-11-10 |
| Hierarchical Multi-Scale Attention for Semantic Segmentation | ✓ | 86.3 | | | HRNet-OCR | 2020-05-21 |
| Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | ✓ | 86.2 | | | Depth Anything | 2024-01-19 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 85.8 | | | OneFormer (ConvNeXt-XL, Mapillary, multi-scale) | 2022-11-10 |
| Vision Transformer Adapter for Dense Predictions | ✓ | 85.8 | | | ViT-Adapter-L | 2022-05-17 |
| SeMask: Semantically Masked Transformers for Semantic Segmentation | ✓ | 84.98 | | | SeMask (SeMask Swin-L Mask2Former) | 2021-12-23 |
| Sequential Ensembling for Semantic Segmentation | | 84.8 | | | Sequential Ensemble (MiT-B5 + HRNet) | 2022-10-08 |
| Soft labelling for semantic segmentation: Bringing coherence to label down-sampling | ✓ | 84.8 | | | Soft Labells (HRnet) | 2023-02-27 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 84.6 | | | OneFormer (ConvNeXt-XL, multi-scale) | 2022-11-10 |
| Dilated Neighborhood Attention Transformer | ✓ | 84.5 | | | DiNAT-L (Mask2Former) | 2022-09-29 |
| OneFormer: One Transformer to Rule Universal Image Segmentation | ✓ | 84.4 | | | OneFormer (Swin-L, multi-scale) | 2022-11-10 |
| VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer | | 84.4 | | | VPNeXt | 2025-02-23 |
| VOLO: Vision Outlooker for Visual Recognition | ✓ | 84.3 | | | VOLO-D4 (MS, ImageNet1k pretrain) | 2021-06-24 |
| Masked-attention Mask Transformer for Universal Image Segmentation | ✓ | 84.3 | | | Mask2Former (Swin-L) | 2021-12-02 |
| Your ViT is Secretly an Image Segmentation Model | ✓ | 84.2 | 25 | 84.2 | EoMT (DINOv2-L, single-scale, 1024x1024) | 2025-03-24 |
| SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | ✓ | 84.0 | | | SegFormer (MiT-B5, Mapillary) | 2021-05-31 |
| DDP: Diffusion Model for Dense Visual Prediction | ✓ | 83.9 | | | DDP (ConvNeXt-L, step-3) | 2023-03-30 |
| Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ | 83.6 | | | HRNetV2 + OCR + RMI (PaddleClas pretrained) | 2019-09-24 |
| Vision Transformers with Patch Diversification | ✓ | 83.6 | | | PatchDiverse + Swin-L (multi-scale test, upernet, ImageNet22k pretrain) | 2021-04-26 |
| Pixel-wise Anomaly Detection in Complex Driving Scenes | ✓ | 83.5 | | | SynBoost | 2021-03-09 |
| Conditional Boundary Loss for Semantic Segmentation | ✓ | 83.4 | | | HRNetV2+OCR+CBL(ImageNet pretrained) | 2023-07-05 |
| EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | ✓ | 83.2 | | | EfficientViT-B3 (r1184x2368) | 2022-05-29 |
| Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation | ✓ | 83.16 | | | HRViT-b3 (SegFormer, SS) | 2021-11-01 |
| Dilated SpineNet for Semantic Segmentation | | 83.04 | | | SpineNet-S143+ (single-scale test) | 2021-03-23 |
| Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation | ✓ | 82.81 | | | HRViT-b2 (SegFormer, SS) | 2021-11-01 |
| Fully Attentional Networks with Self-emerging Token Labeling | ✓ | 82.8 | | | FAN-L-Hybrid+STL | 2024-01-08 |
| ResNeSt: Split-Attention Networks | ✓ | 82.7 | | | ResNeSt-200 | 2020-04-19 |
| WaveMix: A Resource-efficient Neural Network for Image Analysis | ✓ | 82.7 | | | WaveMix | 2022-05-28 |
| CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ | 82.6 | | | CMX (B4) | 2022-03-09 |
| WaveMix: A Resource-efficient Neural Network for Image Analysis | ✓ | 82.60 | | | WaveMix-256/16 (Level-4) | 2022-05-28 |
| Understanding The Robustness in Vision Transformers | ✓ | 82.3 | | | FAN-L-Hybrid | 2022-04-26 |
| Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers | ✓ | 82.15 | | | SETR-PUP (80k, MS) | 2020-12-31 |
| DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation | ✓ | 82.0 | | | DSNet-Base(single-scale) | 2024-06-06 |
| Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks | ✓ | 81.7 | | | EANet | 2021-05-05 |
| Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation | ✓ | 81.63 | | | HRViT-b1 (SegFormer, SS) | 2021-11-01 |
| CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ | 81.6 | | | CMX (B2) | 2022-03-09 |
| Trans4Trans: Efficient Transformer for Transparent Object and Semantic Scene Segmentation in Real-World Navigation Assistance | ✓ | 81.54 | | | Trans4Trans | 2021-08-20 |
| Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation | ✓ | 81.5 | | | Panoptic-DeepLab | 2019-11-22 |
| | | 81.5 | | | Soft Labells (Deeplab) | |
| Deep High-Resolution Representation Learning for Visual Recognition | ✓ | 81.1 | | | HRNetV2 (HRNetV2-W48) | 2019-08-20 |
| Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation | ✓ | 81.1 | | | Trans4PASS (Small) | 2022-03-02 |
| Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective | ✓ | 81.0 | | | DEPICT-SA (ViT-L multi-scale) | 2024-11-05 |
| Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ | 80.6 | | | OCR (ResNet-101-FCN) | 2019-09-24 |
| RepVGG: Making VGG-style ConvNets Great Again | ✓ | 80.57 | | | RepVGG-B2 | 2021-01-11 |
| DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation | ✓ | 80.4 | 81.9 | | DSNet(single-scale) | 2024-06-06 |
| SeMask: Semantically Masked Transformers for Semantic Segmentation | ✓ | 80.39 | | | SeMask (SeMask Swin-L FPN) | 2021-12-23 |
| Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation | ✓ | 80.33 | | | Auto-DeepLab-L | 2019-01-10 |
| Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation | ✓ | 80.33 | | | SML | 2021-07-23 |
| Multiscale Deep Equilibrium Models | ✓ | 80.3 | | | Multiscale DEQ (MDEQ-XL) | 2020-06-15 |
| Deep High-Resolution Representation Learning for Visual Recognition | ✓ | 80.2 | | | HRNetV2 (HRNetV2-W40) | 2019-08-20 |
| Pyramid Scene Parsing Network | ✓ | 79.7 | | | PSPNet (Dilated-ResNet-101) | 2016-12-04 |
| Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation | ✓ | 79.6 | | | DeepLabv3+ (Dilated-Xception-71) | 2018-02-07 |
| Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation | ✓ | 79.1 | | | Trans4PASS (Tiny) | 2022-03-02 |
| Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective | ✓ | 78.8 | | | DEPICT-SA (ViT-L single-scale) | 2024-11-05 |
| PointRend: Image Segmentation as Rendering | ✓ | 78.6 | | | SemanticFPN P2-P5 + PointRend | 2019-12-17 |
| Rethinking Atrous Convolution for Semantic Image Segmentation | ✓ | 78.5 | | | DeepLabv3 (Dilated-ResNet-101) | 2017-06-17 |
| Representation Recycling for Streaming Video Analysis | ✓ | 78.2 | 1.1 | | StreamDEQ (8 iterations) | 2022-04-28 |
| Multiscale Deep Equilibrium Models | ✓ | 77.8 | | | Multiscale DEQ (MDEQ-large) | 2020-06-15 |
| Hyperbolic Active Learning for Semantic Segmentation under Domain Shift | ✓ | 77.8 | | | HALO | 2023-06-19 |
| Efficient Visual Pretraining with Contrastive Detection | ✓ | 77.0 | | | DetCon_B | 2021-03-19 |
| EEEA-Net: An Early Exit Evolutionary Neural Architecture Search | ✓ | 76.8 | | | EEEA-Net-C2 (ours) | 2021-08-13 |
| WaveMix-Lite: A Resource-efficient Neural Network for Image Analysis | ✓ | 76.79 | | | WaveMixLite-256/16 | 2022-10-13 |
| SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images | ✓ | 76.41 | | | SwinMTL | 2024-03-15 |
| CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes | ✓ | 76.36 | 72.3 (3090) | | CSFNet-2 | 2024-07-01 |
| RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality | ✓ | 76.27 | | | RepMLPNet-D256 | 2021-12-21 |
| Deep Residual Learning for Image Recognition | ✓ | 75.7 | | | Dilated-ResNet (Dilated-ResNet-101) | 2015-12-10 |
| UNet++: A Nested U-Net Architecture for Medical Image Segmentation | ✓ | 75.5 | | | UNet++ (ResNet-101) | 2018-07-18 |
| SqueezeNAS: Fast neural architecture search for faster semantic segmentation | ✓ | 75.2 | | | SqueezeNAS (LAT XLarge) | 2019-08-05 |
| Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ | 75.2 | | | ReLICv2 | 2022-01-13 |
| CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes | ✓ | 74.73 | 106.1 (3090) | | CSFNet-1 | 2024-07-01 |
| Gated-SCNN: Gated Shape CNNs for Semantic Segmentation | ✓ | 74.7 | | | GSCNN (ResNet-101) | 2019-07-12 |
| Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | ✓ | 74.6 | | | BYOL | 2022-01-13 |
| Waterfall Atrous Spatial Pooling Architecture for Efficient Semantic Segmentation | ✓ | 74 | | | WASPnet (ours) | 2019-12-06 |
| SqueezeNAS: Fast neural architecture search for faster semantic segmentation | ✓ | 73.6 | | | SqueezeNAS (LAT Large) | 2019-08-05 |
| FasterSeg: Searching for Faster Real-time Semantic Segmentation | ✓ | 73.1 | | | FasterSeg | 2019-12-23 |
| Gated-SCNN: Gated Shape CNNs for Semantic Segmentation | ✓ | 73.0 | | | GSCNN (ResNet-50) | 2019-07-12 |
| Aerial-PASS: Panoramic Annular Scene Segmentation in Drone Videos | | 72.8 | | | Aerial-PASS (ResNet-18) | 2021-05-15 |
| Real-time Fusion Network for RGB-D Semantic Segmentation Incorporating Unexpected Obstacle Detection for Road-driving Images | ✓ | 72.5 | | | RFNet (ResNet-18) | 2020-02-24 |
| ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation | ✓ | 72.1 | | | ERFNet (PyTorch) | 2017-10-09 |
| DS-PASS: Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing | ✓ | 72.1 | | | SwaftNet (ResNet-18) | 2019-09-17 |
| Representation Recycling for Streaming Video Analysis | ✓ | 71.5 | 1.9 | | StreamDEQ (4 iterations) | 2022-04-28 |
| Template-Based Automatic Search of Compact Semantic Segmentation Architectures | ✓ | 69.5 | | | Template-Based NAS-arch1 | 2019-04-04 |
| Fast-SCNN: Fast Semantic Segmentation Network | ✓ | 69.19 | | | Fast-SCNN + Coarse + ImageNet | 2019-02-12 |
| Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation | ✓ | 68.48 | | | LDFNet | 2018-09-24 |
| Template-Based Automatic Search of Compact Semantic Segmentation Architectures | ✓ | 68.1 | | | Template-Based NAS-arch0 | 2019-04-04 |
| SqueezeNAS: Fast neural architecture search for faster semantic segmentation | ✓ | 68.0 | | | SqueezeNAS (LAT Small) | 2019-08-05 |
| ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time | ✓ | 65.9 | | | ContextNet | 2018-05-11 |
| DiCENet: Dimension-wise Convolutions for Efficient Networks | ✓ | 63.4 | | | DiCENet | 2019-06-08 |
| Exploring Semantic Segmentation on the DCT Representation | | 61.6 | | | DCT-EDANet | 2019-07-23 |
| Representation Recycling for Streaming Video Analysis | ✓ | 57.9 | 2.9 | | StreamDEQ (2 iterations) | 2022-04-28 |
| Representation Recycling for Streaming Video Analysis | ✓ | 45.5 | 4.3 | | StreamDEQ (1 iterations) | 2022-04-28 |
| MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation | ✓ | 42.4 | | | MRFP+(Ours) Resnet50 | 2023-11-30 |
| MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation | ✓ | 34.66 | | | Resnet50 | 2023-11-30 |
| SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | ✓ | 76.2 | | | SegFormer-B0 | 2021-05-31 |