Paper | Code | Accuracy (%) | Model | Release Date |
---|---|---|---|---|
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 82.19 | BLIP-2 ViT-G OPT 6.7B (fine-tuned) | 2023-01-30 |
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 81.59 | BLIP-2 ViT-G OPT 2.7B (fine-tuned) | 2023-01-30 |
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ | 81.55 | BLIP-2 ViT-G FlanT5 XL (fine-tuned) | 2023-01-30 |
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | ✓ | 55.9 | LocVLM-L | 2024-04-11 |