MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | ✓ Link | 43.4 | | | | MMRet-MLLM | 2024-12-19 |
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | ✓ Link | 40.2 | | | | MMRet-Large (CLIP L/14) | 2024-12-19 |
SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | | 37.88 | | | | SCOT (WACV 2025) | 2025-01-12 |
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | ✓ Link | 37.23 | | | | SEIZE (CLIP G/14 & GPT-4o) | 2024-10-28 |
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | ✓ Link | 35.4 | | | | MagicLens (CoCa L) | 2024-03-28 |
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | ✓ Link | 35.0 | | | | MMRet-Base (CLIP B/16) | 2024-12-19 |
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy | | 34.26 | | | | IP-CIR + LDRE (CLIP G/14) | 2024-11-24 |
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | ✓ Link | 33.77 | | | | SEIZE (CLIP G/14) | 2024-10-28 |
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | ✓ Link | 32.24 | | | | LDRE (CLIP G/14) | 2024-07-11 |
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | ✓ Link | 32.0 | | | | MagicLens (CoCa B) | 2024-03-28 |
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | ✓ Link | 31.14 | | | | OSrCIR (CLIP G/14) | 2024-12-15 |
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | ✓ Link | 30.8 | | | | MagicLens (CLIP L) | 2024-03-28 |
CoVR-2: Automatic Data Construction for Composed Video Retrieval | ✓ Link | 29.55 | | | | CoVR-BLIP-2 | 2023-08-28 |
ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning | ✓ Link | 29.23 | 28.36 | 31.88 | 30.81 | ImageScope (CLIP-ViT-L/14) | 2025-03-13 |
Vision-by-Language for Training-Free Compositional Image Retrieval | ✓ Link | 27.59 | | | | CIReVL (CLIP G/14) | 2023-10-13 |
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy | | 27.41 | | | | IP-CIR + LDRE (CLIP L/14) | 2024-11-24 |
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | ✓ Link | 25.82 | | | | SEIZE (CLIP L/14) | 2024-10-28 |
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | ✓ Link | 25.33 | | | | OSrCIR (CLIP L/14) | 2024-12-15 |
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | ✓ Link | 24.03 | | | | LDRE (CLIP L/14) | 2024-07-11 |
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | ✓ Link | 23.8 | | | | MagicLens (CLIP B) | 2024-03-28 |
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | ✓ Link | 22.29 | | | | RTD + LinCIR (CLIP G/14) | 2024-06-13 |
Language-only Efficient Training of Zero-shot Composed Image Retrieval | ✓ Link | 21.01 | | | | LinCIR (CLIP G/14) | 2023-12-04 |
CoLLM: A Large Language Model for Composed Image Retrieval | ✓ Link | 20.8 | 20.3 | 23.4 | | CoLLM (Pretrained - CLIP-L/14) | 2025-03-25 |
CoLLM: A Large Language Model for Composed Image Retrieval | ✓ Link | 20.4 | 19.7 | 23.1 | | CoLLM (Pretrained - BLIP-L/16) | 2025-03-25 |
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | ✓ Link | 19.17 | | | | OSrCIR (CLIP B/32) | 2024-12-15 |
Vision-by-Language for Training-Free Compositional Image Retrieval | ✓ Link | 19.01 | | | | CIReVL (CLIP L/14) | 2023-10-13 |
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | ✓ Link | 18.32 | | | | LDRE (CLIP B/32) | 2024-07-11 |
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | ✓ Link | 18.11 | | | | RTD + LinCIR (CLIP L/14) | 2024-06-13 |
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion | ✓ Link | 17.71 | | | | CompoDiff (CLIP G/14) | 2023-03-21 |
Vision-by-Language for Training-Free Compositional Image Retrieval | ✓ Link | 15.42 | | | | CIReVL (CLIP B/32) | 2023-10-13 |
Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval | ✓ Link | 14.62 | | | | Context-I2W | 2023-09-28 |
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | ✓ Link | 13.61 | | | | iSEARLE-XL (CLIP L/14) | 2024-05-05 |
Language-only Efficient Training of Zero-shot Composed Image Retrieval | ✓ Link | 13.58 | | | | LinCIR (CLIP L/14) | 2023-12-04 |
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion | ✓ Link | 13.51 | | | | CompoDiff (CLIP L/14) | 2023-03-21 |
Zero-Shot Composed Image Retrieval with Textual Inversion | ✓ Link | 12.73 | | | | SEARLE-XL (CLIP L/14) | 2023-03-27 |
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | ✓ Link | 12.67 | | | | iSEARLE-XL-OTI (CLIP L/14) | 2024-05-05 |
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval | ✓ Link | 11.63 | | | | MTCIR (CLIP L/14) | 2023-11-13 |
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | ✓ Link | 11.24 | | | | iSEARLE (CLIP B/32) | 2024-05-05 |
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | ✓ Link | 10.94 | | | | iSEARLE-OTI (CLIP B/32) | 2024-05-05 |
Zero-Shot Composed Image Retrieval with Textual Inversion | ✓ Link | 9.94 | | | | SEARLE (CLIP B/32) | 2023-03-27 |
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval | ✓ Link | 9.51 | | | | Pic2Word | 2023-02-06 |
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval | ✓ Link | 8.03 | | | | MTCIR (BLIP B/16) | 2023-11-13 |
"This is my unicorn, Fluffy": Personalizing frozen vision-language representations | ✓ Link | 5.32 | | | | PALAVRA | 2022-04-04 |