open-vocabulary-semantic-segmentation-on-2

Open Vocabulary Semantic Segmentation

Leaderboard

Paper	Code	mIoU	Model Name	Release Date
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	✓ Link	38.2	UMG-CLIP-E/14	2024-01-12
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation	✓ Link	38.2	Mask-Adapter	2024-12-05
MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation	✓ Link	38.2	MaskCLIP++	2024-12-16
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation	✓ Link	37.9	CAT-Seg	2023-03-21
SILC: Improving Vision Language Pretraining with Self-Distillation		37.7	SILC	2023-10-20
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	✓ Link	36.1	UMG-CLIP-L/14	2024-01-12
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation	✓ Link	36.1	MAFT+	2024-08-01
OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation		35.8	OVSeg + OpenDAS	2024-05-30
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation	✓ Link	35.2	SED	2023-11-27
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	✓ Link	34.5	CLIPSelf	2023-10-02
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP	✓ Link	34.1	FC-CLIP	2023-08-04
Open-Vocabulary Segmentation with Semantic-Assisted Calibration	✓ Link	33.5	SCAN	2023-12-07
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing	✓ Link	32.8	EBSeg-L	2024-06-14
Learning Mask-aware CLIP Representations for Zero-Shot Segmentation	✓ Link	32.0	MAFT-ViTL	2023-09-30
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning	✓ Link	31.4	PACL	2022-12-09
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models	✓ Link	29.9	ODISE	2023-03-08
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP	✓ Link	29.6	OVSeg Swin-B	2022-10-09
Open-Vocabulary Universal Image Segmentation with MaskCLIP	✓ Link	23.7	MaskCLIP	2022-08-18
		20.7	POMP
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model	✓ Link	20.5	SimSeg	2021-12-29
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	✓ Link	17.0	TTD (TCL)	2024-03-30
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation	✓ Link	15.8	LaVG	2024-08-09
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	✓ Link	12.7	TTD (MaskCLIP)	2024-03-30

