OpenCodePapers

Image Retrieval on CREPE (Vision-Language)
Leaderboard
| Paper | Code | Recall@1 (HN-Atom + HN-Comp, SC) | Recall@1 (HN-Atom + HN-Comp, UC) | Recall@1 (HN-Atom, UC) | Recall@1 (HN-Comp, UC) | Model Name | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | ✓ | 39.44 | 33.81 | 47.86 | 60.78 | ViT-L-14 (LAION400M) | 2022-12-13 |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | ✓ | 37.32 | 32.26 | 46.53 | 60.19 | ViT-B-16+240 (LAION400M) | 2022-12-13 |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | ✓ | 37.01 | 30.81 | 44.93 | 59.00 | ViT-B-16 (LAION400M) | 2022-12-13 |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | ✓ | 34.28 | 28.00 | 42.75 | 54.80 | ViT-B-32 (LAION400M) | 2022-12-13 |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | ✓ | 23.38 | 20.08 | 39.85 | 39.83 | RN50 (YFCC15M) | 2022-12-13 |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | ✓ | 23.26 | 19.96 | 34.88 | 45.27 | RN50 (CC12M) | 2022-12-13 |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | ✓ | 22.74 | 20.50 | 39.50 | 39.56 | RN101 (YFCC15M) | 2022-12-13 |
| CREPE: Can Vision-Language Foundation Models Reason Compositionally? | ✓ | 9.09 | 9.09 | 20.00 | 14.29 | Random | 2022-12-13 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 44.5 | 92.1 | Swin-T (MosaiCLIP, CC-12M) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 44.4 | 92.6 | RN-50 (MosaiCLIP, CC-12M) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 41.5 | 48.8 | MosaiCLIP (YFCC-FT) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 41.4 | 82.0 | RN-50 (NegCLIP, CC-12M) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 40.9 | 72.4 | MosaiCLIP (CC-FT) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 39.6 | 80.3 | Swin-T (NegCLIP, CC-12M) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 39.5 | 39.8 | CLIP (YFCC-FT) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 39.0 | 38.8 | NegCLIP (YFCC-FT) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 38.3 | 36.4 | CLIP-FT (YFCC-FT) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 37.5 | 53.1 | NegCLIP (CC-FT) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 37.3 | 44.1 | Swin-T (CLIP, CC-12M) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 36.7 | 42.9 | RN-50 (CLIP, CC-12M) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 35.6 | 45.8 | CLIP-FT (CC-FT) | 2023-05-23 |
| Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality | | | | 35.0 | 45.1 | CLIP (CC-FT) | 2023-05-23 |
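
For readers unfamiliar with the metric, the following is a minimal sketch of how a Recall@1 percentage like those in the leaderboard could be computed, and of why a random scorer lands on figures like those in the Random row. The function names, the convention that the ground-truth candidate sits at index 0, and the candidate-set sizes (5, 7, and 11) are assumptions made for illustration. The sizes are inferred from the Random row values (20.00, 14.29, 9.09) and are not stated on this page. This is not the benchmark's official evaluation code.

```python
from typing import Sequence


def recall_at_1(score_rows: Sequence[Sequence[float]], target_index: int = 0) -> float:
    """Percentage of queries whose top-scoring candidate is the ground truth.

    Each row holds the similarity scores of one query against its candidate
    set (ground truth plus hard negatives). Placing the ground truth at
    `target_index` is a hypothetical convention chosen for this sketch.
    """
    if not score_rows:
        raise ValueError("need at least one query")
    hits = 0
    for scores in score_rows:
        # Index of the highest-scoring candidate (first one wins on ties).
        best = max(range(len(scores)), key=lambda i: scores[i])
        hits += best == target_index
    return 100.0 * hits / len(score_rows)


def random_baseline(num_candidates: int) -> float:
    """Expected Recall@1 (in percent) when one candidate is picked uniformly."""
    return 100.0 / num_candidates


if __name__ == "__main__":
    # Two toy queries: the first ranks the ground truth on top, the second does not.
    toy_scores = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.1]]
    print(recall_at_1(toy_scores))
    # Assumed candidate-set sizes, inferred from the Random row.
    for size in (5, 7, 11):
        print(size, round(random_baseline(size), 2))
```

Under these assumptions, a uniform random pick among 5, 7, and 11 candidates gives 20.00, 14.29, and 9.09 percent respectively, which matches the Random row.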