Paper | Code | mIoU (%) | Model | Release Date |
---|---|---|---|---|
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training | ✓ | 17.7 | COSMOS ViT-B/16 | 2024-12-02 |
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers | ✓ | 17.1 | GEM (MetaCLIP) | 2023-12-01 |
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers | ✓ | 15.7 | GEM (CLIP) | 2023-12-01 |
A Closer Look at the Explainability of Contrastive Language-Image Pre-training | ✓ | 12.9 | CLIPSurgery | 2023-04-12 |
Extract Free Dense Labels from CLIP | ✓ | 10.2 | MaskCLIP | 2021-12-02 |