semantic-segmentation-on-cityscapes-val

Semantic Segmentation

Leaderboard

Paper	Code	mIoU (%)	FPS	Validation mIoU (%)	Model Name	Release Date
The Missing Point in Vision Transformers for Universal Image Segmentation	✓ Link	87.4			ViT-P (InternImage-H)	2025-05-26
SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks	✓ Link	87.35		87.35	SERNet-Former	2024-01-28
Harnessing Diffusion Models for Visual Perception with Meta Prompts	✓ Link	87.1			MetaPrompt-SD	2023-12-22
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions	✓ Link	87			InternImage-H	2022-11-10
Polarized Self-Attention: Towards High-quality Pixel-wise Regression	✓ Link	86.93			HRNetV2-OCR+PSA	2021-07-02
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions	✓ Link	86.4			InternImage-XL	2022-11-10
Hierarchical Multi-Scale Attention for Semantic Segmentation	✓ Link	86.3			HRNet-OCR	2020-05-21
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data	✓ Link	86.2			Depth Anything	2024-01-19
Vision Transformer Adapter for Dense Predictions	✓ Link	85.8			ViT-Adapter-L	2022-05-17
OneFormer: One Transformer to Rule Universal Image Segmentation	✓ Link	85.8			OneFormer (ConvNeXt-XL, Mapillary, multi-scale)	2022-11-10
SeMask: Semantically Masked Transformers for Semantic Segmentation	✓ Link	84.98			SeMask (SeMask Swin-L Mask2Former)	2021-12-23
Sequential Ensembling for Semantic Segmentation		84.8			Sequential Ensemble (MiT-B5 + HRNet)	2022-10-08
Soft labelling for semantic segmentation: Bringing coherence to label down-sampling	✓ Link	84.8			Soft Labells (HRnet)	2023-02-27
OneFormer: One Transformer to Rule Universal Image Segmentation	✓ Link	84.6			OneFormer (ConvNeXt-XL, multi-scale)	2022-11-10
Dilated Neighborhood Attention Transformer	✓ Link	84.5			DiNAT-L (Mask2Former)	2022-09-29
OneFormer: One Transformer to Rule Universal Image Segmentation	✓ Link	84.4			OneFormer (Swin-L, multi-scale)	2022-11-10
VPNeXt -- Rethinking Dense Decoding for Plain Vision Transformer		84.4			VPNeXt	2025-02-23
VOLO: Vision Outlooker for Visual Recognition	✓ Link	84.3			VOLO-D4 (MS, ImageNet1k pretrain)	2021-06-24
Masked-attention Mask Transformer for Universal Image Segmentation	✓ Link	84.3			Mask2Former (Swin-L)	2021-12-02
Your ViT is Secretly an Image Segmentation Model	✓ Link	84.2	25	84.2	EoMT (DINOv2-L, single-scale, 1024x1024)	2025-03-24
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers	✓ Link	84.0			SegFormer (MiT-B5, Mapillary)	2021-05-31
DDP: Diffusion Model for Dense Visual Prediction	✓ Link	83.9			DDP (ConvNeXt-L, step-3)	2023-03-30
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation	✓ Link	83.6			HRNetV2 + OCR + RMI (PaddleClas pretrained)	2019-09-24
Vision Transformers with Patch Diversification	✓ Link	83.6			PatchDiverse + Swin-L (multi-scale test, upernet, ImageNet22k pretrain)	2021-04-26
Pixel-wise Anomaly Detection in Complex Driving Scenes	✓ Link	83.5			SynBoost	2021-03-09
Conditional Boundary Loss for Semantic Segmentation	✓ Link	83.4			HRNetV2+OCR+CBL(ImageNet pretrained)	2023-07-05
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction	✓ Link	83.2			EfficientViT-B3 (r1184x2368)	2022-05-29
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation	✓ Link	83.16			HRViT-b3 (SegFormer, SS)	2021-11-01
Dilated SpineNet for Semantic Segmentation		83.04			SpineNet-S143+ (single-scale test)	2021-03-23
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation	✓ Link	82.81			HRViT-b2 (SegFormer, SS)	2021-11-01
Fully Attentional Networks with Self-emerging Token Labeling	✓ Link	82.8			FAN-L-Hybrid+STL	2024-01-08
ResNeSt: Split-Attention Networks	✓ Link	82.7			ResNeSt-200	2020-04-19
WaveMix: A Resource-efficient Neural Network for Image Analysis	✓ Link	82.7			WaveMix	2022-05-28
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers	✓ Link	82.6			CMX (B4)	2022-03-09
WaveMix: A Resource-efficient Neural Network for Image Analysis	✓ Link	82.60			WaveMix-256/16 (Level-4)	2022-05-28
Understanding The Robustness in Vision Transformers	✓ Link	82.3			FAN-L-Hybrid	2022-04-26
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers	✓ Link	82.15			SETR-PUP (80k, MS)	2020-12-31
DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation	✓ Link	82.0			DSNet-Base(single-scale)	2024-06-06
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks	✓ Link	81.7			EANet	2021-05-05
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation	✓ Link	81.63			HRViT-b1 (SegFormer, SS)	2021-11-01
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers	✓ Link	81.6			CMX (B2)	2022-03-09
Trans4Trans: Efficient Transformer for Transparent Object and Semantic Scene Segmentation in Real-World Navigation Assistance	✓ Link	81.54			Trans4Trans	2021-08-20
Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation	✓ Link	81.5			Panoptic-DeepLab	2019-11-22
		81.5			Soft Labells (Deeplab)
Deep High-Resolution Representation Learning for Visual Recognition	✓ Link	81.1			HRNetV2 (HRNetV2-W48)	2019-08-20
Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation	✓ Link	81.1			Trans4PASS (Small)	2022-03-02
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective	✓ Link	81.0			DEPICT-SA (ViT-L multi-scale)	2024-11-05
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation	✓ Link	80.6			OCR (ResNet-101-FCN)	2019-09-24
RepVGG: Making VGG-style ConvNets Great Again	✓ Link	80.57			RepVGG-B2	2021-01-11
DSNet: A Novel Way to Use Atrous Convolutions in Semantic Segmentation	✓ Link	80.4	81.9		DSNet(single-scale)	2024-06-06
SeMask: Semantically Masked Transformers for Semantic Segmentation	✓ Link	80.39			SeMask (SeMask Swin-L FPN)	2021-12-23
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation	✓ Link	80.33			Auto-DeepLab-L	2019-01-10
Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation	✓ Link	80.33			SML	2021-07-23
Multiscale Deep Equilibrium Models	✓ Link	80.3			Multiscale DEQ (MDEQ-XL)	2020-06-15
Deep High-Resolution Representation Learning for Visual Recognition	✓ Link	80.2			HRNetV2 (HRNetV2-W40)	2019-08-20
Pyramid Scene Parsing Network	✓ Link	79.7			PSPNet (Dilated-ResNet-101)	2016-12-04
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation	✓ Link	79.6			DeepLabv3+ (Dilated-Xception-71)	2018-02-07
Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation	✓ Link	79.1			Trans4PASS (Tiny)	2022-03-02
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective	✓ Link	78.8			DEPICT-SA (ViT-L single-scale)	2024-11-05
PointRend: Image Segmentation as Rendering	✓ Link	78.6			SemanticFPN P2-P5 + PointRend	2019-12-17
Rethinking Atrous Convolution for Semantic Image Segmentation	✓ Link	78.5			DeepLabv3 (Dilated-ResNet-101)	2017-06-17
Representation Recycling for Streaming Video Analysis	✓ Link	78.2	1.1		StreamDEQ (8 iterations)	2022-04-28
Multiscale Deep Equilibrium Models	✓ Link	77.8			Multiscale DEQ (MDEQ-large)	2020-06-15
Hyperbolic Active Learning for Semantic Segmentation under Domain Shift	✓ Link	77.8			HALO	2023-06-19
Efficient Visual Pretraining with Contrastive Detection	✓ Link	77.0			DetCon_B	2021-03-19
EEEA-Net: An Early Exit Evolutionary Neural Architecture Search	✓ Link	76.8			EEEA-Net-C2 (ours)	2021-08-13
WaveMix-Lite: A Resource-efficient Neural Network for Image Analysis	✓ Link	76.79			WaveMixLite-256/16	2022-10-13
SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images	✓ Link	76.41			SwinMTL	2024-03-15
CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes	✓ Link	76.36	72.3 (3090)		CSFNet-2	2024-07-01
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality	✓ Link	76.27			RepMLPNet-D256	2021-12-21
Deep Residual Learning for Image Recognition	✓ Link	75.7			Dilated-ResNet (Dilated-ResNet-101)	2015-12-10
UNet++: A Nested U-Net Architecture for Medical Image Segmentation	✓ Link	75.5			UNet++ (ResNet-101)	2018-07-18
SqueezeNAS: Fast neural architecture search for faster semantic segmentation	✓ Link	75.2			SqueezeNAS (LAT XLarge)	2019-08-05
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?	✓ Link	75.2			ReLICv2	2022-01-13
CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes	✓ Link	74.73	106.1 (3090)		CSFNet-1	2024-07-01
Gated-SCNN: Gated Shape CNNs for Semantic Segmentation	✓ Link	74.7			GSCNN (ResNet-101)	2019-07-12
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?	✓ Link	74.6			BYOL	2022-01-13
Waterfall Atrous Spatial Pooling Architecture for Efficient Semantic Segmentation	✓ Link	74			WASPnet (ours)	2019-12-06
SqueezeNAS: Fast neural architecture search for faster semantic segmentation	✓ Link	73.6			SqueezeNAS (LAT Large)	2019-08-05
FasterSeg: Searching for Faster Real-time Semantic Segmentation	✓ Link	73.1			FasterSeg	2019-12-23
Gated-SCNN: Gated Shape CNNs for Semantic Segmentation	✓ Link	73.0			GSCNN (ResNet-50)	2019-07-12
Aerial-PASS: Panoramic Annular Scene Segmentation in Drone Videos		72.8			Aerial-PASS (ResNet-18)	2021-05-15
Real-time Fusion Network for RGB-D Semantic Segmentation Incorporating Unexpected Obstacle Detection for Road-driving Images	✓ Link	72.5			RFNet (ResNet-18)	2020-02-24
ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation	✓ Link	72.1			ERFNet (PyTorch)	2017-10-09
DS-PASS: Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing	✓ Link	72.1			SwaftNet (ResNet-18)	2019-09-17
Representation Recycling for Streaming Video Analysis	✓ Link	71.5	1.9		StreamDEQ (4 iterations)	2022-04-28
Template-Based Automatic Search of Compact Semantic Segmentation Architectures	✓ Link	69.5			Template-Based NAS-arch1	2019-04-04
Fast-SCNN: Fast Semantic Segmentation Network	✓ Link	69.19			Fast-SCNN + Coarse + ImageNet	2019-02-12
Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation	✓ Link	68.48			LDFNet	2018-09-24
Template-Based Automatic Search of Compact Semantic Segmentation Architectures	✓ Link	68.1			Template-Based NAS-arch0	2019-04-04
SqueezeNAS: Fast neural architecture search for faster semantic segmentation	✓ Link	68.0			SqueezeNAS (LAT Small)	2019-08-05
ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time	✓ Link	65.9			ContextNet	2018-05-11
DiCENet: Dimension-wise Convolutions for Efficient Networks	✓ Link	63.4			DiCENet	2019-06-08
Exploring Semantic Segmentation on the DCT Representation		61.6			DCT-EDANet	2019-07-23
Representation Recycling for Streaming Video Analysis	✓ Link	57.9	2.9		StreamDEQ (2 iterations)	2022-04-28
Representation Recycling for Streaming Video Analysis	✓ Link	45.5	4.3		StreamDEQ (1 iterations)	2022-04-28
MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation	✓ Link	42.4			MRFP+(Ours) Resnet50	2023-11-30
MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation	✓ Link	34.66			Resnet50	2023-11-30
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers	✓ Link			76.2	SegFormer-B0	2021-05-31

OpenCodePapers