open-vocabulary-semantic-segmentation-on-1

Open Vocabulary Semantic Segmentation

Leaderboard

Paper	Code	mIoU (%)	Model	Release Date
HyperSeg: Towards Universal Visual Segmentation with Large Language Model	✓	64.6	HyperSeg	2024-11-26
SILC: Improving Vision Language Pretraining with Self-Distillation		63.5	SILC	2023-10-20
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation	✓	63.3	CAT-Seg	2023-03-21
MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation	✓	62.5	MaskCLIP++	2024-12-16
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction	✓	62.3	CLIPSelf	2023-10-02
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding	✓	61.0	UMG-CLIP-L/14	2024-01-12
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation	✓	60.6	SED	2023-11-27
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation	✓	60.4	Mask-Adapter	2024-12-05
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing	✓	60.2	EBSeg-L	2024-06-14
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation	✓	59.4	MAFT+	2024-08-01
Open-Vocabulary Segmentation with Semantic-Assisted Calibration	✓	59.3	SCAN	2023-12-07
Learning Mask-aware CLIP Representations for Zero-Shot Segmentation	✓	58.5	MAFT-ViTL	2023-09-30
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP	✓	58.4	FC-CLIP	2023-08-04
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models	✓	57.3	ODISE	2023-03-08
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP	✓	55.7	OVSeg Swin-B	2022-10-09
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning	✓	50.1	PACL	2022-12-09
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model	✓	47.7	SimSeg	2021-12-29
Open-Vocabulary Universal Image Segmentation with MaskCLIP	✓	45.9	MaskCLIP	2022-08-18
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification	✓	37.6	TagAlign (trained with image-text pairs)	2023-12-21
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	✓	37.4	TTD (TCL)	2024-03-30
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation	✓	34.7	LaVG	2024-08-09
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs	✓	33.9	TCL	2022-12-01
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias	✓	31.0	TTD (MaskCLIP)	2024-03-30
A Closer Look at the Explainability of Contrastive Language-Image Pre-training	✓	29.3	CLIP Surgery (original CLIP without any fine-tuning)	2023-04-12
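The mIoU column uses the usual semantic-segmentation metric: for each class, IoU = TP / (TP + FP + FN), and mIoU is the mean of those per-class IoUs, reported as a percentage. This page does not state the evaluation dataset or protocol, so the following is only a minimal NumPy sketch of the standard definition, not the benchmark's official evaluation code. It assumes integer label maps and a hypothetical `ignore_index` of 255 (a common convention, not something this leaderboard specifies), and it accumulates a single confusion matrix over all pixels before averaging over classes.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU (%) from integer label maps via a confusion matrix.

    Sketch of the standard definition; ignore_index=255 is an assumed
    convention, not taken from this leaderboard.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels whose ground-truth label is the ignore value.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as class c but labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled class c but predicted otherwise
    union = tp + fp + fn

    # Classes absent from both prediction and ground truth are skipped.
    present = union > 0
    iou = tp[present] / union[present]
    return float(iou.mean() * 100)
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so `mean_iou([0, 1, 1, 1], [0, 0, 1, 1], num_classes=2)` gives about 58.33. Accumulating one confusion matrix over the whole dataset, rather than averaging per-image scores, is the more common convention for leaderboards like this one, though individual papers may differ.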

OpenCodePapers
