CAFuser: Condition-Aware Multimodal Fusion for Robust Semantic Perception of Driving Scenes | ✓ Link | 68.6 | CAFuser-CAA | 2024-10-14 |
StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation | ✓ Link | 68.18 | StitchFusion (RGB-D-E-LiDAR) | 2024-08-02 |
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer | ✓ Link | 66.9 | GeminiFusion | 2024-06-03 |
StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation | ✓ Link | 66.65 | StitchFusion (RGB-D-LiDAR) | 2024-08-02 |
Delivering Arbitrary-Modal Semantic Segmentation | ✓ Link | 66.30 | CMNeXt (RGB-D-E-LiDAR) | 2023-03-02 |
StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation | ✓ Link | 66.03 | StitchFusion (RGB-D-Event) | 2024-08-02 |
StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation | ✓ Link | 65.75 | StitchFusion (RGB-Depth) | 2024-08-02 |
MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation | ✓ Link | 65.38 | MemorySAM-B+(R-D-E-L) | 2025-03-09 |
MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation | ✓ Link | 63.48 | MemorySAM-B+(R-D) | 2025-03-09 |
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ Link | 62.67 | CMX (RGB-Depth) | 2022-03-09 |
MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation | ✓ Link | 62.42 | MemorySAM-B+(R-D-E) | 2025-03-09 |
Multimodal Token Fusion for Vision Transformers | ✓ Link | 60.25 | TokenFusion (RGB-Depth) | 2022-04-19 |
StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation | ✓ Link | 58.03 | StitchFusion (RGB-LiDAR) | 2024-08-02 |
StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation | ✓ Link | 57.44 | StitchFusion (RGB-Event) | 2024-08-02 |
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ Link | 56.52 | CMX (RGB-Event) | 2022-03-09 |
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers | ✓ Link | 56.37 | CMX (RGB-LiDAR) | 2022-03-09 |
MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation | ✓ Link | 53.22 | MemorySAM-B+(RGB) | 2025-03-09 |
Multimodal Token Fusion for Vision Transformers | ✓ Link | 53.01 | TokenFusion (RGB-LiDAR) | 2022-04-19 |
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection | ✓ Link | 52.97 | HRFuser (RGB-D-E-Li) | 2022-06-30 |
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection | ✓ Link | 52.72 | HRFuser (RGB-D-LiDAR) | 2022-06-30 |
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection | ✓ Link | 51.88 | HRFuser (RGB-Depth) | 2022-06-30 |
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection | ✓ Link | 51.83 | HRFuser (RGB-D-Event) | 2022-06-30 |
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection | ✓ Link | 47.95 | HRFuser (RGB) | 2022-06-30 |
Multimodal Token Fusion for Vision Transformers | ✓ Link | 45.63 | TokenFusion (RGB-Event) | 2022-04-19 |
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection | ✓ Link | 43.13 | HRFuser (RGB-LiDAR) | 2022-06-30 |
HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection | ✓ Link | 42.22 | HRFuser (RGB-Event) | 2022-06-30 |