Paper | Code | HIoU | mIoU | ModelName | ReleaseDate |
---|---|---|---|---|---|
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition | ✓ Link | 39.1 | | POMP | 2023-04-10 |
A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | ✓ Link | 37.8 | | ZSSeg | 2021-12-29 |
Decoupling Zero-Shot Semantic Segmentation | ✓ Link | 34.8 | | ZegFormer | 2021-12-15 |
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | ✓ Link | | 23.7 | TTD (TCL) | 2024-03-30 |
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | ✓ Link | | 23.2 | LaVG | 2024-08-09 |
A Closer Look at the Explainability of Contrastive Language-Image Pre-training | ✓ Link | | 21.9 | CLIP Surgery (original CLIP without any fine-tuning) | 2023-04-12 |
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | ✓ Link | | 19.4 | TTD (MaskCLIP) | 2024-03-30 |
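The table mixes two metrics. As a minimal sketch, assuming the usual zero-shot segmentation protocol in which HIoU is the harmonic mean of the mIoU over seen classes and the mIoU over unseen classes, both numbers can be computed from a confusion matrix as below. The function names, the 3-class confusion matrix, and the seen/unseen split are hypothetical toy values chosen for illustration, not data from any listed paper.

```python
import numpy as np


def per_class_iou(conf: np.ndarray) -> np.ndarray:
    """Per-class IoU from a KxK confusion matrix (rows = ground truth, cols = prediction)."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as class k but labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled class k but predicted otherwise
    denom = tp + fp + fn
    # Classes absent from both ground truth and prediction get NaN so nanmean skips them.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, tp / denom, np.nan)


def miou(iou: np.ndarray, classes=None) -> float:
    """Mean IoU in percent, optionally restricted to a subset of class indices."""
    sel = iou if classes is None else iou[list(classes)]
    return float(np.nanmean(sel)) * 100


def hiou(miou_seen: float, miou_unseen: float) -> float:
    """Harmonic mean of seen and unseen mIoU (assumed standard HIoU definition)."""
    if miou_seen + miou_unseen == 0:
        return 0.0
    return 2 * miou_seen * miou_unseen / (miou_seen + miou_unseen)


# Hypothetical toy example: classes 0 and 1 are "seen", class 2 is "unseen".
conf = np.array([[8, 2, 0],
                 [1, 6, 3],
                 [0, 0, 5]])
iou = per_class_iou(conf)
seen, unseen = miou(iou, [0, 1]), miou(iou, [2])
print(f"seen={seen:.2f} unseen={unseen:.2f} hIoU={hiou(seen, unseen):.2f}")
```

The harmonic mean is pulled toward the smaller of the two scores, so a method cannot reach a high HIoU by doing well only on seen classes. Methods evaluated without a seen/unseen split, such as training-free CLIP variants, have no HIoU and report plain mIoU instead.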