OpenCodePapers

semantic-segmentation-on-nyu-depth-v2

Semantic Segmentation
Leaderboard
| Paper | Code | Mean IoU (%) | Mean Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|---|
| OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | | 63.6 | | OmniVec2 | 2024-01-01 |
| Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer | | 61.5 | | DiffusionMMS (DAT++-S) | 2024-09-23 |
| DepthMatch: Semi-Supervised RGB-D Scene Parsing through Depth-Guided Regularization | | 61.4 | | DepthMatch (DINOv2-S) | 2025-05-26 |
| GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | ✓ | 60.9 | | GeminiFusion (Swin-Large) | 2024-06-03 |
| OmniVec: Learning robust representations with cross modal sharing | | 60.8 | | OmniVec | 2023-11-07 |
| GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | ✓ | 60.2 | | GeminiFusion (Swin-Large) | 2024-06-03 |
| Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning | ✓ | 59.3 | | DPLNet | 2023-12-01 |
| HDBFormer: Efficient RGB-D Semantic Segmentation with A Heterogeneous Dual-Branch Framework | ✓ | 59.3 | | HDBFormer | 2025-04-18 |
| PanopticNDT: Efficient and Robust Panoptic Mapping | ✓ | 59.02 | | EMSANet (2x ResNet-34 NBt1D, PanopticNDT version, finetuned) | 2023-09-24 |
| DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | ✓ | 58.4 | | DFormerv2-L | 2025-04-07 |
| SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images | ✓ | 58.14 | | SwinMTL | 2024-03-15 |
| PolyMaX: General Dense Prediction with Mask Transformer | ✓ | 58.08 | | PolyMaX(ConvNeXt-L) | 2023-11-09 |
| HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation | ✓ | 57.8 | | HSPFormer(PVT v2-B4) | 2025-01-16 |
| GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | ✓ | 57.7 | | GeminiFusion (MiT-B5) | 2024-06-03 |
| DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | ✓ | 57.7 | | DFormerv2-B | 2025-04-07 |
| DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | ✓ | 57.2 | | DFormer-L | 2023-09-18 |
| CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ | 56.9 | | CMX (B5) | 2022-03-09 |
| Delivering Arbitrary-Modal Semantic Segmentation | ✓ | 56.9 | | CMNeXt (B4) | 2023-03-02 |
| Omnivore: A Single Model for Many Visual Modalities | ✓ | 56.8 | | OMNIVORE (Swin-L, finetuned) | 2022-01-20 |
| GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | ✓ | 56.8 | | GeminiFusion (MiT-B3) | 2024-06-03 |
| CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ | 56.3 | | CMX (B4) | 2022-03-09 |
| MultiMAE: Multi-modal Multi-task Masked Autoencoders | ✓ | 56.0 | | MultiMAE (ViT-B) | 2022-04-04 |
| DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | ✓ | 56.0 | | DFormerv2-S | 2025-04-07 |
| Understanding Dark Scenes by Contrasting Multi-Modal Observations | ✓ | 55.8 | | SMMCL (SegNeXt-B) | 2023-08-23 |
| DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | ✓ | 55.6 | | DFormer-B | 2023-09-18 |
| ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer | ✓ | 55.5 | | ComPtr (Swin-B) | 2023-07-23 |
| AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation | ✓ | 55.3 | | AsymFormer | 2023-09-25 |
| Omnivore: A Single Model for Many Visual Modalities | ✓ | 55.1 | | OMNIVORE (Swin-B, finetuned) | 2022-01-20 |
| HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion | ✓ | 55.0 | 68.8 | HAPNet | 2024-04-04 |
| CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ | 54.4 | | CMX (B2) | 2022-03-09 |
| Multimodal Token Fusion for Vision Transformers | ✓ | 54.2 | | TokenFusion (S) | 2022-04-19 |
| Understanding Dark Scenes by Contrasting Multi-Modal Observations | ✓ | 53.7 | | SMMCL (SegFormer-B2) | 2023-08-23 |
| DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | ✓ | 53.6 | | DFormer-S | 2023-09-18 |
| InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding | ✓ | 53.56 | | InvPT | 2022-03-15 |
| HS3: Learning with Proper Task Complexity in Hierarchically Supervised Semantic Segmentation | | 53.5 | | HS3-Fuse (ResNet-101) | 2021-11-03 |
| Pixel Difference Convolutional Network for RGB-D Semantic Segmentation | | 53.5 | | PDCNet (ResNet-101) | 2023-02-23 |
| Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments | ✓ | 53.34 | | EMSANet (2x ResNet-34 NBt1D, finetuned) | 2022-07-10 |
| DCANet: Differential Convolution Attention Network for RGB-D Semantic Segmentation | | 53.3 | | DCANet (ResNet-101) | 2022-10-13 |
| Multimodal Token Fusion for Vision Transformers | ✓ | 53.3 | | TokenFusion (Ti) | 2022-04-19 |
| InverseForm: A Loss Function for Structured Boundary-Aware Segmentation | ✓ | 53.1 | | InverseForm (ResNet-101) | 2021-04-06 |
| Context-Aware Interaction Network for RGB-T Semantic Segmentation | ✓ | 52.6 | | CAINet (MobileNet-V2) | 2024-01-03 |
| Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction | ✓ | 52.5 | | CEN-PSPNet (ResNet-152) | 2021-12-04 |
| Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation | | 52.5 | | AMF (ResNet-50) | 2022-01-05 |
| Understanding Dark Scenes by Contrasting Multi-Modal Observations | ✓ | 52.5 | | SMMCL (ResNet-101) | 2023-08-23 |
| Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation | ✓ | 52.4 | | SA-Gate | 2020-07-17 |
| Warp-Refine Propagation: Semi-Supervised Auto-labeling via Cycle-consistency | | 52.2 | | Warp-Refine | 2021-09-28 |
| Deep feature selection-and-fusion for RGB-D semantic segmentation | | 52.0 | | FSFNet | 2021-05-10 |
| Variational Context-Deformable ConvNets for Indoor Scene Parsing | | 51.9 | | VCD+ACNet (ResNet-50) | 2020-06-01 |
| Optimizing rgb-d semantic segmentation through multi-modal interaction and pooling attention | | 51.9 | | MIPANet (ResNet50) | 2023-11-19 |
| DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | ✓ | 51.8 | | DFormer-T | 2023-09-18 |
| ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation | ✓ | 51.3 | | ShapeConv (ResNext-101) | 2021-08-24 |
| Efficient Multi-Task Scene Analysis with RGB-D Transformers | ✓ | 51.26 | | EMSAFormer (SwinV2-T-128-Multi-Aug) | 2023-06-08 |
| Depth-Adapted CNNs for RGB-D Semantic Segmentation | | 51.24 | | Z-ACN (ResNet-101) | 2022-06-08 |
| Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion | ✓ | 51.2 | | AsymFusion (ResNet-152) | 2021-08-11 |
| Dynamic Multimodal Fusion | ✓ | 51.0 | | DynMM (ResNet-50) | 2022-03-31 |
| Spatial Information Guided Convolution for Real-Time RGBD Semantic Segmentation | ✓ | 51.0 | | SGNet (ResNet-101) | 2020-04-09 |
| Pattern-Structure Diffusion for Multi-Task Learning | | 51.0 | | PSD-ResNet50 | 2020-06-01 |
| Malleable 2.5D Convolution: Learning Receptive Fields along the Depth-axis for RGB-D Scene Parsing | ✓ | 50.9 | | Malleable 2.5D (ResNet-101) | 2020-07-18 |
| Scene Parsing via Integrated Classification Model and Variance-Based Regularization | ✓ | 50.70 | | ICM | 2019-06-01 |
| Multi-layer Feature Aggregation for Deep Scene Parsing Models | | 50.7 | | SANet | 2020-11-04 |
| Variational Context-Deformable ConvNets for Indoor Scene Parsing | | 50.7 | | VCD+RedNet (ResNet-50) | 2020-06-01 |
| HaarNet: Large-scale Linear-Morphological Hybrid Network for RGB-D Semantic Segmentation | | 50.7 | | HaarNet | 2023-10-11 |
| Pattern-Affinitive Propagation across Depth, Surface Normal and Semantic Segmentation | | 50.4 | | PAP (ResNet-50) | 2019-06-08 |
| Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing | ✓ | 50.4 | | Cerberus | 2021-11-24 |
| Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis | ✓ | 50.30 | | ESANet (R34-NBt1D) | 2020-11-13 |
| Depth-Adapted CNNs for RGB-D Semantic Segmentation | | 50.05 | | Z-ACN (ResNet-50) | 2022-06-08 |
| Malleable 2.5D Convolution: Learning Receptive Fields along the Depth-axis for RGB-D Scene Parsing | ✓ | 49.7 | | Malleable 2.5D (ResNet-50) | 2020-07-18 |
| MMANet: Margin-aware Distillation and Modality-aware Regularization for Incomplete Multimodal Learning | ✓ | 49.62 | | MMANet | 2023-04-17 |
| Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation | ✓ | 49.4 | | SGACNet (R34-NBt1D) | 2023-08-11 |
| ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer | ✓ | 49.2 | | ComPtr (Swin-T) | 2023-07-23 |
| Depth-Adapted CNNs for RGB-D Semantic Segmentation | | 49.15 | | Z-ACN (ResNet-34) | 2022-06-08 |
| Improving Multi-Modal Learning with Uni-Modal Teachers | | 49.14 | | UMT | 2021-06-21 |
| MTI-Net: Multi-Scale Task Interaction Networks for Multi-Task Learning | ✓ | 49.0 | | MTI-Net (HRNet-48) | 2020-01-19 |
| ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation | ✓ | 49.0 | | ShapeConv (ResNet-101) | 2021-08-24 |
| Multimodal Knowledge Expansion | ✓ | 48.88 | | MKE | 2021-03-26 |
| ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation | ✓ | 48.8 | | ShapeConv (ResNet-50) | 2021-08-24 |
| mmFormer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation | ✓ | 48.45 | | mmFormer | 2022-06-06 |
| ACNet: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation | ✓ | 48.3 | | ACNet | 2019-05-24 |
| Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation | ✓ | 48.2 | | SGACNet (R18-NBt1D) | 2023-08-11 |
| Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis | ✓ | 48.17 | | ESANet (R18-NBt1D) | 2020-11-13 |
| RFNet: Region-Aware Fusion Network for Incomplete Multi-Modal Brain Tumor Segmentation | ✓ | 48.13 | | RFNet | 2021-01-01 |
| Contrastive Multimodal Fusion with TupleInfoNCE | ✓ | 48.1 | | TupleInfoNCE | 2021-07-06 |
| Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation | | 48.1 | | DDSC (ResNet-101) | 2018-06-01 |
| Cascaded Feature Network for Semantic Segmentation of RGB-D Images | | 47.7 | | CFN | 2017-10-01 |
| Learning Fully Dense Neural Networks for Image Semantic Segmentation | | 47.4 | | FDNet (DenseNet264) | 2019-05-22 |
| RedNet: Residual Encoder-Decoder Network for indoor RGB-D Semantic Segmentation | ✓ | 47.2 | | RedNet | 2018-06-04 |
| Depth-Adapted CNNs for RGB-D Semantic Segmentation | | 47.02 | | Z-ACN (ResNet-18) | 2022-06-08 |
| Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation | | 46.8 | | TRL (ResNet-101) | 2018-09-01 |
| RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation | ✓ | 46.5 | | RefineNet (ResNet-101) | 2016-11-20 |
| Prompt Guided Transformer for Multi-Task Dense Prediction | ✓ | 46.43 | | PGT (Swin-S) | 2023-07-28 |
| Exploring Relational Context for Multi-Task Dense Prediction | ✓ | 46.33 | | ATRC | 2021-04-28 |
| Locality-Sensitive Deconvolution Networks With Gated Fusion for RGB-D Indoor Semantic Segmentation | | 45.9 | | LS-DeconvNet | 2017-07-01 |
| Variational Context-Deformable ConvNets for Indoor Scene Parsing | | 45.3 | | VCD+DeepLab (VGG16) | 2020-06-01 |
| SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images | | 45.0 | | SOSD-Net | 2021-01-19 |
| Multi-Modal Attention-based Fusion Model for Semantic Segmentation of RGB-Depth Images | | 44.8 | | MMAF-Net-152 | 2019-12-25 |
| Recurrent Scene Parsing with Perspective Understanding in the Loop | ✓ | 44.5 | | RecurrentSceneParsing | 2017-05-20 |
| Light-Weight RefineNet for Real-Time Semantic Segmentation | ✓ | 44.4 | | Light-Weight-RefineNet-152 | 2018-10-08 |
| Depth-aware CNN for RGB-D Segmentation | ✓ | 43.9 | | Depth-aware CNN | 2018-03-19 |
| Light-Weight RefineNet for Real-Time Semantic Segmentation | ✓ | 43.6 | | Light-Weight-RefineNet-101 | 2018-10-08 |
| Temporally Distributed Networks for Fast Video Semantic Segmentation | ✓ | 43.5 | | TD2-PSP50 | 2020-04-03 |
| NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction | ✓ | 43.3 | | NDDR-CNN | 2018-01-25 |
| 3D Graph Neural Networks for RGBD Semantic Segmentation | ✓ | 43.1 | | 3DGNN | 2017-10-01 |
| CI-Net: Contextual Information for Joint Semantic Segmentation and Depth Estimation | | 42.6 | | CI-Net | 2021-07-29 |
| Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations | ✓ | 42.0 | | Multi-Task Light-Weight-RefineNet | 2018-09-13 |
| Light-Weight RefineNet for Real-Time Semantic Segmentation | ✓ | 41.7 | | Light-Weight-RefineNet-50 | 2018-10-08 |
| Prompt Guided Transformer for Multi-Task Dense Prediction | ✓ | 41.61 | | PGT (Swin-T) | 2023-07-28 |
| Multi-Task Meta Learning: learn how to adapt to unseen tasks | ✓ | 41.51 | | MTML | 2022-10-13 |
| Semantic Segmentation with Reverse Attention | | 41.2 | | RAN | 2017-07-20 |
| DenseMTL: Cross-task Attention Mechanism for Dense Multi-task Learning | ✓ | 40.84 | | DenseMTL | 2022-06-17 |
| STD2P: RGBD Semantic Segmentation Using Spatio-Temporal Data-Driven Pooling | ✓ | 40.1 | | STD2P | 2016-04-08 |
| Masked Supervised Learning for Semantic Segmentation | ✓ | 39.31 | | MaskSup | 2022-10-03 |
| HeMIS: Hetero-Modal Image Segmentation | ✓ | 37.77 | | HeMIS | 2016-07-18 |
| Temporally Distributed Networks for Fast Video Semantic Segmentation | ✓ | 37.4 | | TD4-PSP18 | 2020-04-03 |
| What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? | ✓ | 37.3 | | Bayesian DenseNet | 2017-03-15 |
| RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training | | 33.49 | | HN-network | 2020-02-06 |
| Composite Learning for Robust and Effective Dense Predictions | | 33.48 | | CompL | 2022-10-13 |
| Efficient Yet Deep Convolutional Neural Networks for Semantic Segmentation | ✓ | 32.3 | | Dilated FCN-2s RGB | 2017-07-26 |
| AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning | ✓ | 29.6 | | AdaShare | 2019-11-27 |
| Toward Edge-Efficient Dense Predictions with Synergistic Multi-Task Neural Architecture Search | | 22.1 | | EDNAS+JAReD | 2022-10-04 |
| Cross-stitch Networks for Multi-task Learning | ✓ | 19.3 | | Cross-stitch | 2016-04-12 |
| Fully Convolutional Networks for Semantic Segmentation | ✓ | | 44 | FCN-32s RGB-HHA | 2016-05-20 |