OpenCodePapers

open-vocabulary-semantic-segmentation-on-3

Open Vocabulary Semantic Segmentation

Leaderboard

Paper	Code	mIoU (%)	Model	Release date
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	✓ Link	17.3	UMG-CLIP-E/14	2024-01-12
MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation	✓ Link	16.8	MaskCLIP++	2024-12-16
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation	✓ Link	16.2	Mask-Adapter	2024-12-05
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation	✓ Link	16.0	CAT-Seg	2023-03-21
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	✓ Link	15.4	UMG-CLIP-L/14	2024-01-12
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation	✓ Link	15.1	MAFT+	2024-08-01
SILC: Improving Vision Language Pretraining with Self-Distillation		15.0	SILC	2023-10-20
PosSAM: Panoptic Open-vocabulary Segment Anything	✓ Link	14.9	PosSAM	2024-03-14
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP	✓ Link	14.8	FC-CLIP	2023-08-04
Open-Vocabulary Segmentation with Semantic-Assisted Calibration	✓ Link	14.0	SCAN	2023-12-07
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation	✓ Link	13.9	SED	2023-11-27
Side Adapter Network for Open-Vocabulary Semantic Segmentation	✓ Link	13.7	SAN	2023-02-23
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing	✓ Link	13.7	EBSeg-L	2024-06-14
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	✓ Link	12.4	CLIPSelf	2023-10-02
Learning Mask-aware CLIP Representations for Zero-Shot Segmentation	✓ Link	12.1	MAFT-ViTL	2023-09-30
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models	✓ Link	11.1	ODISE	2023-03-08
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP	✓ Link	9.0	OVSeg Swin-B	2022-10-09
Open-Vocabulary Universal Image Segmentation with MaskCLIP	✓ Link	8.2	MaskCLIP	2022-08-18
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model	✓ Link	7.0	SimSeg	2021-12-29