Paper | Code | Recall@1 | Recall@5 | Recall@10 | QPS | Model Name | Release Date |
---|---|---|---|---|---|---|---|
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 68.3 | 87.7 | 92.6 | | BLIP-2 ViT-G (fine-tuned) | 2023-01-30 |
VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words | ✓ Link | 68.2 | 91.8 | 96.3 | 451.4 | VisualSparta | 2021-01-01 |
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 66.3 | 86.5 | 91.8 | | BLIP-2 ViT-L (fine-tuned) | 2023-01-30 |
FLAVA: A Foundational Language And Vision Alignment Model | ✓ Link | 38.38 | 67.47 | | | FLAVA (zero-shot) | 2021-12-08 |
FLAVA: A Foundational Language And Vision Alignment Model | ✓ Link | 33.29 | 62.47 | | | CLIP (zero-shot) | 2021-12-08 |
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | ✓ Link | | | 98.3 | | Oscar | 2020-04-13 |