Paper | Code | Generation | Model Name | Release Date |
---|---|---|---|---|
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | ✓ | 70.88 | SoFar | 2025-02-18 |
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ | 49.11 | Qwen-VL-Max | 2023-08-24 |
GPT-4 Technical Report | ✓ | 36.07 | GPT-4V | 2023-03-15 |
Visual Instruction Tuning | ✓ | 35.19 | LLaVA-1.6 | 2023-04-17 |
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | ✓ | 23.54 | MiniGPT4 | 2023-04-20 |