OpenCodePapers

zero-shot-composed-image-retrieval-zs-cir-on

Composed Image Retrieval (CoIR) / Zero-Shot Composed Image Retrieval (ZS-CIR)
Leaderboard
| Paper | Code | mAP@10 | mAP@5 | mAP@50 | mAP@25 | Model Name | Release Date |
|---|---|---|---|---|---|---|---|
| MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | ✓ | 43.4 | | | | MMRet-MLLM | 2024-12-19 |
| MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | ✓ | 40.2 | | | | MMRet-Large (CLIP L/14) | 2024-12-19 |
| SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | | 37.88 | | | | SCOT (WACV 2025) | 2025-01-12 |
| Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | ✓ | 37.23 | | | | SEIZE (CLIP G/14 & GPT-4o) | 2024-10-28 |
| MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | ✓ | 35.4 | | | | MagicLens (CoCa L) | 2024-03-28 |
| MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | ✓ | 35.0 | | | | MMRet-Base (CLIP B/16) | 2024-12-19 |
| Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy | | 34.26 | | | | IP-CIR + LDRE (CLIP G/14) | 2024-11-24 |
| Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | ✓ | 33.77 | | | | SEIZE (CLIP G/14) | 2024-10-28 |
| LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | ✓ | 32.24 | | | | LDRE (CLIP G/14) | 2024-07-11 |
| MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | ✓ | 32.0 | | | | MagicLens (CoCa B) | 2024-03-28 |
| Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | ✓ | 31.14 | | | | OSrCIR (CLIP G/14) | 2024-12-15 |
| MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | ✓ | 30.8 | | | | MagicLens (CLIP L) | 2024-03-28 |
| CoVR-2: Automatic Data Construction for Composed Video Retrieval | ✓ | 29.55 | | | | CoVR-BLIP-2 | 2023-08-28 |
| ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning | ✓ | 29.23 | 28.36 | 31.88 | 30.81 | ImageScope (CLIP-ViT-L/14) | 2025-03-13 |
| Vision-by-Language for Training-Free Compositional Image Retrieval | ✓ | 27.59 | | | | CIReVL (CLIP G/14) | 2023-10-13 |
| Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy | | 27.41 | | | | IP-CIR + LDRE (CLIP L/14) | 2024-11-24 |
| Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | ✓ | 25.82 | | | | SEIZE (CLIP L/14) | 2024-10-28 |
| Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | ✓ | 25.33 | | | | OSrCIR (CLIP L/14) | 2024-12-15 |
| LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | ✓ | 24.03 | | | | LDRE (CLIP L/14) | 2024-07-11 |
| MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | ✓ | 23.8 | | | | MagicLens (CLIP B) | 2024-03-28 |
| An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | ✓ | 22.29 | | | | RTD + LinCIR (CLIP G/14) | 2024-06-13 |
| Language-only Efficient Training of Zero-shot Composed Image Retrieval | ✓ | 21.01 | | | | LinCIR (CLIP G/14) | 2023-12-04 |
| CoLLM: A Large Language Model for Composed Image Retrieval | ✓ | 20.8 | 20.3 | 23.4 | | CoLLM (Pretrained - CLIP-L/14) | 2025-03-25 |
| CoLLM: A Large Language Model for Composed Image Retrieval | ✓ | 20.4 | 19.7 | 23.1 | | CoLLM (Pretrained - BLIP-L/16) | 2025-03-25 |
| Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | ✓ | 19.17 | | | | OSrCIR (CLIP B/32) | 2024-12-15 |
| Vision-by-Language for Training-Free Compositional Image Retrieval | ✓ | 19.01 | | | | CIReVL (CLIP L/14) | 2023-10-13 |
| LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | ✓ | 18.32 | | | | LDRE (CLIP B/32) | 2024-07-11 |
| An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | ✓ | 18.11 | | | | RTD + LinCIR (CLIP L/14) | 2024-06-13 |
| CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion | ✓ | 17.71 | | | | CompoDiff (CLIP G/14) | 2023-03-21 |
| Vision-by-Language for Training-Free Compositional Image Retrieval | ✓ | 15.42 | | | | CIReVL (CLIP B/32) | 2023-10-13 |
| Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval | ✓ | 14.62 | | | | Context-I2W | 2023-09-28 |
| iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | ✓ | 13.61 | | | | iSEARLE-XL (CLIP L/14) | 2024-05-05 |
| Language-only Efficient Training of Zero-shot Composed Image Retrieval | ✓ | 13.58 | | | | LinCIR (CLIP L/14) | 2023-12-04 |
| CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion | ✓ | 13.51 | | | | CompoDiff (CLIP L/14) | 2023-03-21 |
| Zero-Shot Composed Image Retrieval with Textual Inversion | ✓ | 12.73 | | | | SEARLE-XL (CLIP L/14) | 2023-03-27 |
| iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | ✓ | 12.67 | | | | iSEARLE-XL-OTI (CLIP L/14) | 2024-05-05 |
| Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval | ✓ | 11.63 | | | | MTCIR (CLIP L/14) | 2023-11-13 |
| iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | ✓ | 11.24 | | | | iSEARLE (CLIP B/32) | 2024-05-05 |
| iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | ✓ | 10.94 | | | | iSEARLE-OTI (CLIP B/32) | 2024-05-05 |
| Zero-Shot Composed Image Retrieval with Textual Inversion | ✓ | 9.94 | | | | SEARLE (CLIP B/32) | 2023-03-27 |
| Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval | ✓ | 9.51 | | | | Pic2Word | 2023-02-06 |
| Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval | ✓ | 8.03 | | | | MTCIR (BLIP B/16) | 2023-11-13 |
| "This is my unicorn, Fluffy": Personalizing frozen vision-language representations | ✓ | 5.32 | | | | PALAVRA | 2022-04-04 |