OpenCodePapers

Open Vocabulary Semantic Segmentation
Leaderboard
| Paper | Code | mIoU | Model Name | Release Date |
|---|---|---|---|---|
| HyperSeg: Towards Universal Visual Segmentation with Large Language Model | ✓ | 64.6 | HyperSeg | 2024-11-26 |
| SILC: Improving Vision Language Pretraining with Self-Distillation | | 63.5 | SILC | 2023-10-20 |
| CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | ✓ | 63.3 | CAT-Seg | 2023-03-21 |
| MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation | ✓ | 62.5 | MaskCLIP++ | 2024-12-16 |
| CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | ✓ | 62.3 | CLIPSelf | 2023-10-02 |
| UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding | ✓ | 61.0 | UMG-CLIP-L/14 | 2024-01-12 |
| SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | ✓ | 60.6 | SED | 2023-11-27 |
| Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation | ✓ | 60.4 | Mask-Adapter | 2024-12-05 |
| Open-Vocabulary Semantic Segmentation with Image Embedding Balancing | ✓ | 60.2 | EBSeg-L | 2024-06-14 |
| Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | ✓ | 59.4 | MAFT+ | 2024-08-01 |
| Open-Vocabulary Segmentation with Semantic-Assisted Calibration | ✓ | 59.3 | SCAN | 2023-12-07 |
| Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | ✓ | 58.5 | MAFT-ViTL | 2023-09-30 |
| Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | ✓ | 58.4 | FC-CLIP | 2023-08-04 |
| Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | ✓ | 57.3 | ODISE | 2023-03-08 |
| Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | ✓ | 55.7 | OVSeg Swin-B | 2022-10-09 |
| Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | ✓ | 50.1 | PACL | 2022-12-09 |
| A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | ✓ | 47.7 | SimSeg | 2021-12-29 |
| Open-Vocabulary Universal Image Segmentation with MaskCLIP | ✓ | 45.9 | MaskCLIP | 2022-08-18 |
| TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | ✓ | 37.6 | TagAlign (trained with image-text pairs) | 2023-12-21 |
| TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | ✓ | 37.4 | TTD (TCL) | 2024-03-30 |
| In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | ✓ | 34.7 | LaVG | 2024-08-09 |
| Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs | ✓ | 33.9 | TCL | 2022-12-01 |
| TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | ✓ | 31.0 | TTD (MaskCLIP) | 2024-03-30 |
| A Closer Look at the Explainability of Contrastive Language-Image Pre-training | ✓ | 29.3 | CLIP Surgery (original CLIP without any fine-tuning) | 2023-04-12 |