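Paper | Code | Mean IoU | Other metric | Model | Release date |
--- | --- | --- | --- | --- | --- |
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | ✓ Link | 54.6% | | GeminiFusion (Swin-Large) | 2024-06-03 |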
Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer | | 54.0% | | DiffusionMMS | 2024-09-23 |
HDBFormer: Efficient RGB-D Semantic Segmentation with A Heterogeneous Dual-Branch Framework | ✓ Link | 53.9% | | HDBFormer | 2025-04-18 |
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | ✓ Link | 53.3% | | GeminiFusion (MiT-B5) | 2024-06-03 |
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | ✓ Link | 53.3% | | DFormerv2-L | 2025-04-07 |
Multimodal Token Fusion for Vision Transformers | ✓ Link | 53.0% | | TokenFusion (S) | 2022-04-19 |
Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning | ✓ Link | 52.8% | | DPLNet | 2023-12-01 |
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | ✓ Link | 52.8% | | DFormerv2-B | 2025-04-07 |
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | ✓ Link | 52.7% | | GeminiFusion (MiT-B3) | 2024-06-03 |
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | ✓ Link | 52.5% | | DFormer-L | 2023-09-18 |
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ Link | 52.4% | | CMX (B5) | 2022-03-09 |
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ Link | 52.1% | | CMX (B4) | 2022-03-09 |
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation | ✓ Link | 51.5% | | DFormerv2-S | 2025-04-07 |
Multimodal Token Fusion for Vision Transformers | ✓ Link | 51.4% | | TokenFusion (Ti) | 2022-04-19 |
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | ✓ Link | 51.2% | | DFormer-B | 2023-09-18 |
PanopticNDT: Efficient and Robust Panoptic Mapping | ✓ Link | 50.86% | | EMSANet (2x ResNet-34 NBt1D, PanopticNDT version, finetuned) | 2023-09-24 |
Deep feature selection-and-fusion for RGB-D semantic segmentation | | 50.6% | | FSFNet | 2021-05-10 |
Pattern-Structure Diffusion for Multi-Task Learning | | 50.6% | | PSD-ResNet50 | 2020-06-01 |
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | ✓ Link | 50.0% | | TokenFusion (S) | 2023-09-18 |
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ Link | 49.7% | | DPLNet | 2022-03-09 |
Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation | | 49.6% | | DFormer-L | 2022-01-05 |
DCANet: Differential Convolution Attention Network for RGB-D Semantic Segmentation | | 49.6% | | CMX (B5) | 2022-10-13 |
Pixel Difference Convolutional Network for RGB-D Semantic Segmentation | | 49.6% | | CMX (B4) | 2023-02-23 |
Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation | ✓ Link | 49.4% | | TokenFusion (Ti) | 2020-07-17 |
AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation | ✓ Link | 49.1% | | DFormer-B | 2023-09-25 |
Efficient Multi-Task Scene Analysis with RGB-D Transformers | ✓ Link | 48.82% | | EMSANet (2x ResNet-34 NBt1D, PanopticNDT version, finetuned) | 2023-06-08 |
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation | ✓ Link | 48.8% | | FSFNet | 2023-09-18 |
ShapeConv: Shape-aware Convolutional Layer for Indoor RGB-D Semantic Segmentation | ✓ Link | 48.6% | | PSD-ResNet50 | 2021-08-24 |
Spatial Information Guided Convolution for Real-Time RGBD Semantic Segmentation | ✓ Link | 48.6% | | TokenFusion (S) | 2020-04-09 |
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments | ✓ Link | 48.47% | | DPLNet | 2022-07-10 |
Attention-guided Chained Context Aggregation for Semantic Segmentation | ✓ Link | 48.3% | | DFormer-L | 2020-02-27 |
Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis | ✓ Link | 48.17% | | CMX (B5) | 2020-11-13 |
ACNet: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation | ✓ Link | 48.1% | | CMX (B4) | 2019-05-24 |
RedNet: Residual Encoder-Decoder Network for indoor RGB-D Semantic Segmentation | ✓ Link | 47.8% | | TokenFusion (Ti) | 2018-06-04 |
RDFNet: RGB-D Multi-Level Residual Feature Fusion for Indoor Semantic Segmentation | | 47.7% | | DFormer-B | 2017-10-01 |
Context Contrasted Feature and Gated Multi-Scale Aggregation for Scene Segmentation | ✓ Link | 47.1% | | EMSANet (2x ResNet-34 NBt1D, PanopticNDT version, finetuned) | 2018-06-01 |
Multi-Modal Attention-based Fusion Model for Semantic Segmentation of RGB-Depth Images | | 47.0% | | FSFNet | 2019-12-25 |
3D Graph Neural Networks for RGBD Semantic Segmentation | ✓ Link | 45.9% | | PSD-ResNet50 | 2017-10-01 |
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation | ✓ Link | 45.73% | | TokenFusion (S) | 2018-08-11 |
Recurrent Scene Parsing with Perspective Understanding in the Loop | ✓ Link | 45.1% | | DPLNet | 2017-05-20 |
CI-Net: Contextual Information for Joint Semantic Segmentation and Depth Estimation | | 44.3% | | DFormer-L | 2021-07-29 |
Depth-aware CNN for RGB-D Segmentation | ✓ Link | 42.0% | | TokenFusion (S) | 2018-03-19 |
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation | ✓ Link | 38.4% | | DPLNet | 2018-08-11 |
Missing Modality Robustness in Semi-Supervised Multi-Modal Semantic Segmentation | ✓ Link | | 48.17% | DFormer-L | 2023-04-21 |