OpenCodePapers

open-vocabulary-semantic-segmentation-on-2

Open Vocabulary Semantic Segmentation
Leaderboard
| Paper | Code | mIoU | Model Name | Release Date |
| --- | --- | --- | --- | --- |
| Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation | ✓ | 38.2 | Mask-Adapter | 2024-12-05 |
| MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation | ✓ | 38.2 | MaskCLIP++ | 2024-12-16 |
| UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | ✓ | 38.2 | UMG-CLIP-E/14 | 2024-01-12 |
| CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | ✓ | 37.9 | CAT-Seg | 2023-03-21 |
| SILC: Improving Vision Language Pretraining with Self-Distillation | | 37.7 | SILC | 2023-10-20 |
| Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | ✓ | 36.1 | MAFT+ | 2024-08-01 |
| UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | ✓ | 36.1 | UMG-CLIP-L/14 | 2024-01-12 |
| OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation | | 35.8 | OVSeg + OpenDAS | 2024-05-30 |
| SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | ✓ | 35.2 | SED | 2023-11-27 |
| CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | ✓ | 34.5 | CLIPSelf | 2023-10-02 |
| Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | ✓ | 34.1 | FC-CLIP | 2023-08-04 |
| Open-Vocabulary Segmentation with Semantic-Assisted Calibration | ✓ | 33.5 | SCAN | 2023-12-07 |
| Open-Vocabulary Semantic Segmentation with Image Embedding Balancing | ✓ | 32.8 | EBSeg-L | 2024-06-14 |
| Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | ✓ | 32.0 | MAFT-ViTL | 2023-09-30 |
| Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | ✓ | 31.4 | PACL | 2022-12-09 |
| Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | ✓ | 29.9 | ODISE | 2023-03-08 |
| Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | ✓ | 29.6 | OVSeg Swin-B | 2022-10-09 |
| Open-Vocabulary Universal Image Segmentation with MaskCLIP | ✓ | 23.7 | MaskCLIP | 2022-08-18 |
| | | 20.7 | POMP | |
| A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | ✓ | 20.5 | SimSeg | 2021-12-29 |
| TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | ✓ | 17.0 | TTD (TCL) | 2024-03-30 |
| In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | ✓ | 15.8 | LaVG | 2024-08-09 |
| TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | ✓ | 12.7 | TTD (MaskCLIP) | 2024-03-30 |