OpenCodePapers

Visual Question Answering on BenchLMM

Visual Question Answering
Leaderboard
| Paper | Code | GPT-3.5 score | Model Name | Release Date |
|---|---|---|---|---|
| GPT-4 Technical Report | ✓ | 58.37 | GPT-4V | 2023-03-15 |
| SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | ✓ | 57.43 | Sphinx-V2-1K | 2023-11-13 |
| Improved Baselines with Visual Instruction Tuning | ✓ | 55.53 | LLaVA-1.5-13B | 2023-10-05 |
| Visual Instruction Tuning | ✓ | 46.83 | LLaVA-1.5-7B | 2023-04-17 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | 45.03 | InstructBLIP-13B | 2023-05-11 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | ✓ | 44.63 | InstructBLIP-7B | 2023-05-11 |
| Visual Instruction Tuning | ✓ | 43.50 | LLaVA-1-13B | 2023-04-17 |
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | ✓ | 39.13 | Otter-7B | 2023-05-05 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | ✓ | 34.93 | MiniGPT4-13B | 2023-04-20 |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | ✓ | 30.1 | MiniGPTv2-7B | 2023-10-14 |