Strong but simple: A Baseline for Domain Generalized Dense Perception by CLIP-based Transfer Learning | ✓ Link | 86.4% | | VLTSeg | 2023-12-04 |
Harnessing Diffusion Models for Visual Perception with Meta Prompts | ✓ Link | 86.2% | | MetaPrompt-SD | 2023-12-22 |
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions | ✓ Link | 86.1% | | InternImage-H | 2022-11-10 |
HS3: Learning with Proper Task Complexity in Hierarchically Supervised Semantic Segmentation | | 85.8% | | HS3-Fuse | 2021-11-03 |
InverseForm: A Loss Function for Structured Boundary-Aware Segmentation | ✓ Link | 85.6% | | InverseForm | 2021-04-06 |
Vision Transformer Adapter for Dense Predictions | ✓ Link | 85.2% | | ViT-Adapter-L (Mask2Former, BEiT pretrain) | 2022-05-17 |
SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion Networks | ✓ Link | 84.83% | | SERNet-Former | 2024-01-28 |
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | ✓ Link | 84.8% | | Depth Anything | 2024-01-19 |
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ Link | 84.5% | | HRNetV2 + OCR + | 2019-09-24 |
EfficientPS: Efficient Panoptic Segmentation | ✓ Link | 84.21% | | EfficientPS | 2020-04-05 |
Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation | ✓ Link | 84.2% | | Panoptic-DeepLab | 2019-11-22 |
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ Link | 83.7% | | HRNetV2 + OCR (w/ ASP) | 2019-09-24 |
DCNAS: Densely Connected Neural Architecture Search for Semantic Image Segmentation | | 83.6% | | DCNAS(coarse + Mapillary) | 2020-03-26 |
Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond | ✓ Link | 83.6% | | Euclidean Frank-Wolfe CRFs (backbone: DeepLabv3+)(coarse) | 2021-10-27 |
Global Aggregation then Local Distribution in Fully Convolutional Networks | ✓ Link | 83.3% | | GALDNet(+Mapillary)++ | 2019-09-16 |
ResNeSt: Split-Attention Networks | ✓ Link | 83.3% | | ResNeSt200 (Mapillary) | 2020-04-19 |
Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks | ✓ Link | 83.2% | | HANet (Height-driven Attention Networks by LGE A&B)(coarse) | 2020-03-11 |
kMaX-DeepLab: k-means Mask Transformer | ✓ Link | 83.2% | | kMaX-DeepLab (ConvNeXt-L, fine only) | 2022-07-08 |
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers | ✓ Link | 83.1% | | SegFormer (MiT-B5, Mapillary) | 2021-05-31 |
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ Link | 83.0% | | OCR (HRNetV2-W48, coarse) | 2019-09-24 |
Multi Receptive Field Network for Semantic Segmentation | | 83.0% | | MRFM(coarse) | 2020-11-17 |
Disentangled Non-Local Neural Networks | ✓ Link | 83% | | DNL (coarse) | 2020-06-11 |
Scene Segmentation with Dual Relation-aware Attention Network | ✓ Link | 82.9% | | DRAN (ResNet-101, fine annotations only) | 2020-08-05 |
Gated-SCNN: Gated Shape CNNs for Semantic Segmentation | ✓ Link | 82.8% | | Gated-SCNN | 2019-07-12 |
Searching for Efficient Multi-Scale Architectures for Dense Image Prediction | ✓ Link | 82.7% | | Dense Prediction Cell | 2018-09-11 |
Channelized Axial Attention for Semantic Segmentation -- Considering Channel Relation within Spatial Attention for Semantic Segmentation | ✓ Link | 82.6% | | CAA (ResNet-101) | 2021-01-19 |
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ Link | 82.4% | | OCR (ResNet-101, coarse) | 2019-09-24 |
Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes | ✓ Link | 82.4% | | DDRNet-39 1.5x | 2021-01-15 |
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation | ✓ Link | 82.3% | | SSMA | 2018-08-11 |
GFF: Gated Fully Fusion for Semantic Segmentation | ✓ Link | 82.3% | | Gated Fully Fusion | 2019-04-03 |
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation | ✓ Link | 82.1% | | Auto-DeepLab-L | 2019-01-10 |
Dual Graph Convolutional Network for Semantic Segmentation | ✓ Link | 82% | | DGCNet (ResNet-101) | 2019-09-13 |
Strip Pooling: Rethinking Spatial Pooling for Scene Parsing | ✓ Link | 82.0% | | SPNet (ResNet-101) | 2020-03-30 |
Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation | ✓ Link | 81.8% | | OCR (ResNet-101) | 2019-09-24 |
Joint Semantic Segmentation and Boundary Detection using Iterative Pyramid Contexts | | 81.8% | | RPCNet | 2020-04-16 |
OCNet: Object Context Network for Scene Parsing | ✓ Link | 81.7% | | OCNet | 2018-09-04 |
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers | ✓ Link | 81.64% | | SETR-PUP++ | 2020-12-31 |
High-Resolution Representations for Labeling Pixels and Regions | ✓ Link | 81.6% | | HRNet (HRNetV2-W48) | 2019-04-09 |
Deep High-Resolution Representation Learning for Visual Recognition | ✓ Link | 81.6% | | HRNetV2 (train+val) | 2019-08-20 |
Dual Attention Network for Scene Segmentation | ✓ Link | 81.5% | | DANet (ResNet-101) | 2018-09-09 |
CCNet: Criss-Cross Attention for Semantic Segmentation | ✓ Link | 81.4% | | CCNet | 2018-11-28 |
Boundary-Aware Feature Propagation for Scene Segmentation | ✓ Link | 81.4% | | BFP | 2019-08-31 |
Rethinking Atrous Convolution for Semantic Image Segmentation | ✓ Link | 81.3% | | DeepLabv3 (ResNet-101, coarse) | 2017-06-17 |
Context Prior for Scene Segmentation | ✓ Link | 81.3% | | CPN(ResNet-101) | 2020-04-03 |
Asymmetric Non-local Neural Networks for Semantic Segmentation | ✓ Link | 81.3% | | Asymmetric ALNN | 2019-08-21 |
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation | ✓ Link | 81.24% | | AdapNet++ | 2018-08-11 |
Semantic Correlation Promoted Shape-Variant Context for Segmentation | ✓ Link | 81.0% | | SVCNet (ResNet-101) | 2019-09-05 |
Densely connected multidilated convolutional networks for dense prediction tasks | ✓ Link | 80.8% | | D3Net-L | 2020-11-21 |
DenseASPP for Semantic Segmentation in Street Scenes | ✓ Link | 80.6% | | DenseASPP (DenseNet-161) | 2018-06-01 |
Learning a Discriminative Feature Network for Semantic Segmentation | ✓ Link | 80.3% | | Smooth Network with Channel Attention Block | 2018-04-25 |
Pyramid Scene Parsing Network | ✓ Link | 80.2% | | PSPNet++ | 2016-12-04 |
PSANet: Point-wise Spatial Attention Network for Scene Parsing | ✓ Link | 80.1% | | PSANet (ResNet-101) | 2018-09-01 |
Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis | ✓ Link | 80.09% | | ESANet-R34-NBt1D | 2020-11-13 |
Resolution-Aware Design of Atrous Rates for Semantic Segmentation Networks | | 79.9% | | DeepLabV3 with R-101 | 2023-07-26 |
Learning a Discriminative Feature Network for Semantic Segmentation | ✓ Link | 79.3% | | DFN (ResNet-101) | 2018-04-25 |
Adaptive Affinity Fields for Semantic Segmentation | ✓ Link | 79.1% | | AAF (ResNet-101) | 2018-03-27 |
ShelfNet for Fast Semantic Segmentation | ✓ Link | 79.0% | | ShelfNet-34 | 2018-11-27 |
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation | ✓ Link | 78.9% | | BiSeNet (ResNet-101) | 2018-08-02 |
Wider or Deeper: Revisiting the ResNet Model for Visual Recognition | ✓ Link | 78.4% | | ResNet-38 | 2016-11-30 |
Pyramid Scene Parsing Network | ✓ Link | 78.4% | | PSPNet | 2016-12-04 |
Recurrent Scene Parsing with Perspective Understanding in the Loop | ✓ Link | 78.2% | | DepthSeg (ResNet-101) | 2017-05-20 |
Dynamic-structured Semantic Propagation Network | | 77.8% | | DSSPN (ResNet-101) | 2018-03-16 |
Understanding Convolution for Semantic Segmentation | ✓ Link | 77.6% | | DUC-HDC (ResNet-101) | 2017-02-27 |
Semantic-Aware Generation for Self-Supervised Visual Representation Learning | ✓ Link | 76.9% | | SaGe | 2021-11-25 |
SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images | ✓ Link | 76.41% | | SwinMTL | 2024-03-15 |
In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images | ✓ Link | 75.5% | | SwiftNetRN-18 | 2019-03-20 |
RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation | ✓ Link | 73.6% | | RefineNet (ResNet-101) | 2016-11-20 |
Searching for MobileNetV3 | ✓ Link | 72.6% | | MobileNet V3-Large 1.0 | 2019-05-06 |
SqueezeNAS: Fast neural architecture search for faster semantic segmentation | ✓ Link | 72.5% | | SqueezeNAS (LAT Large) | 2019-08-05 |
Semantic Segmentation With Multi Scale Spatial Attention For Self Driving Cars | | 72.4% | | Multi Scale Spatial Attention | 2020-06-30 |
Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes | ✓ Link | 71.8% | | FRRN | 2016-11-24 |
Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation | ✓ Link | 71.8% | | LRR-4x | 2016-05-08 |
Efficient piecewise training of deep structured models for semantic segmentation | | 71.6% | | Context | 2015-04-04 |
FasterSeg: Searching for Faster Real-time Semantic Segmentation | ✓ Link | 71.5% | | FasterSeg | 2019-12-23 |
DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation | ✓ Link | 71.3% | | DFANet A | 2019-04-03 |
Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation | ✓ Link | 71.3% | | LDFNet | 2018-09-24 |
LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation | ✓ Link | 70.75% | 88.29% | LightSeg-DarkNet19 | 2019-12-13 |
ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation | ✓ Link | 70.7% | | ESNet | 2019-06-24 |
ICNet for Real-Time Semantic Segmentation on High-Resolution Images | ✓ Link | 70.6% | | ICNet | 2017-04-27 |
LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation | ✓ Link | 70.6% | | LEDNet | 2019-05-07 |
Waterfall Atrous Spatial Pooling Architecture for Efficient Semantic Segmentation | ✓ Link | 70.5% | | WASPnet (ours) | 2019-12-06 |
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs | ✓ Link | 70.4% | | DeepLab-CRF (ResNet-101) | 2016-06-02 |
ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation | ✓ Link | 69.8% | | ERFNet (PyTorch) | 2017-10-09 |
Fast-SCNN: Fast Semantic Segmentation Network | ✓ Link | 68% | | Fast-SCNN | 2019-02-12 |
LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation | ✓ Link | 67.81% | 86.79% | LightSeg-MobileNet | 2019-12-13 |
LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation | ✓ Link | 67.81% | | LiteSeg-MobileNet | 2019-12-13 |
Template-Based Automatic Search of Compact Semantic Segmentation Architectures | ✓ Link | 67.8% | | Template-Based NAS-arch1 | 2019-04-04 |
Template-Based Automatic Search of Compact Semantic Segmentation Architectures | ✓ Link | 67.7% | | Template-Based NAS-arch0 | 2019-04-04 |
Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation | ✓ Link | 67.3% | | EDANet | 2018-09-17 |
Multi-Scale Context Aggregation by Dilated Convolutions | ✓ Link | 67.1% | | Dilation10 | 2015-11-23 |
Semantic Image Segmentation via Deep Parsing Network | ✓ Link | 66.8% | | DPN | 2015-09-09 |
SqueezeNAS: Fast neural architecture search for faster semantic segmentation | ✓ Link | 66.8% | | SqueezeNAS (LAT Small) | 2019-08-05 |
SINet: Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder | ✓ Link | 66.5% | | SINet | 2019-11-20 |
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network | ✓ Link | 66.2% | | ESPNetv2 | 2018-11-28 |
Fully Convolutional Networks for Semantic Segmentation | ✓ Link | 65.3% | | FCN | 2016-05-20 |
LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation | ✓ Link | 65.17% | 85.39% | LightSeg-ShuffleNet | 2019-12-13 |
LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation | ✓ Link | 65.17% | | LiteSeg-ShuffleNet | 2019-12-13 |
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs | ✓ Link | 63.1% | | DeepLab | 2014-12-22 |
The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks | ✓ Link | 63.06% | | ENet + Lovász-Softmax | 2017-05-24 |
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation | ✓ Link | 60.3% | | ESPNet | 2018-03-19 |
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation | ✓ Link | 58.3% | | ENet | 2016-06-07 |
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation | ✓ Link | 57.0% | | SegNet | 2015-11-02 |
The Ikshana Hypothesis of Human Scene Understanding | ✓ Link | 54.82% | 82.22% | IkshanaNet-1 | 2021-01-21 |
The Ikshana Hypothesis of Human Scene Understanding | ✓ Link | 45.02% | 76.73% | IkshanaNet-2 | 2021-01-21 |
The Ikshana Hypothesis of Human Scene Understanding | ✓ Link | 42.07% | 75.61% | IkshanaNet-3 | 2021-01-21 |