Paper | Code | WUPS (%) | Model Name | Release Date |
---|---|---|---|---|
PaLI-X: On Scaling up a Multilingual Vision and Language Model | ✓ | 38.3 | PaLI-X | 2023-05-29 |
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | ✓ | 37.7 | PaLI-3 | 2023-10-13 |
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models | | 34.7 | R2A | 2023-06-15 |
Flamingo: a Visual Language Model for Few-Shot Learning | ✓ | 33.5 | Flamingo (32-shot) | 2022-04-29 |
Gemini: A Family of Highly Capable Multimodal Models | ✓ | 29.9 | Gemini Ultra (zero-shot) | 2023-12-19 |
Gemini: A Family of Highly Capable Multimodal Models | ✓ | 28.0 | Gemini Pro (zero-shot) | 2023-12-19 |
Flamingo: a Visual Language Model for Few-Shot Learning | ✓ | 26.7 | Flamingo (0-shot) | 2022-04-29 |
Emu: Generative Pretraining in Multimodality | ✓ | 23.4 | Emu (0-shot) | 2023-07-11 |