Paper | Code | Accuracy | ModelName | ReleaseDate |
---|---|---|---|---|
PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers | ✓ Link | 30.65 | RA-VQAv2 w/ PreFLMR | 2024-02-13 |
PaLI-X: On Scaling up a Multilingual Vision and Language Model | ✓ Link | 24 | PaLI-X | 2023-05-29 |
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | ✓ Link | 20.9 | CLIP + FiD | 2023-02-23 |
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | ✓ Link | 20.4 | CLIP + PaLM (540B) | 2023-02-23 |
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? | ✓ Link | 19.7 | PaLI | 2023-02-23 |
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | ✓ Link | 14.6 | BLIP2 | 2023-01-30 |
| | | 14.5 | InstructBLIP | |