OpenCodePapers

fs-mevqa-on-sme

Explanatory Visual Question Answering / FS-MEVQA
Leaderboard
| Paper | Code | BLEU-4 | METEOR | ROUGE-L | CIDEr | SPICE | Detection | ACC | #Learning Samples (N) | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Few-Shot Multimodal Explanation for Visual Question Answering | ✓ | 67.91 | 50.55 | 79.41 | 510.44 | 64.09 | 29.09 | 51.45 | 16 | MEAgent | 2024-10-28 |
| GPT-4 Technical Report | ✓ | 45.51 | 35.17 | 52.67 | 269.68 | 37.67 | 7.00 | 42.30 | 16 | GPT-4-1106-Vision-Preview | 2023-03-15 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | ✓ | 41.87 | 34.61 | 55.90 | 276.14 | 40.58 | 1.40 | 40.88 | 16 | Gemini-1.5 Pro | 2024-03-08 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ | 24.30 | 23.40 | 34.52 | 201.47 | 26.13 | 1.05 | 40.33 | 16 | Qwen-VL-Max | 2023-08-24 |
| CogVLM: Visual Expert for Pretrained Language Models | ✓ | 14.45 | 17.53 | 24.28 | 127.37 | 17.70 | 0.89 | 34.23 | 16 | GLM-4V | 2023-11-06 |
| Variational Causal Inference Network for Explanatory Visual Question Answering | ✓ | 9.17 | 19.82 | 33.34 | 4.28 | 13.39 | 0.28 | 17.77 | 16 | VCIN | 2023-01-01 |
| REX: Reasoning-aware and Grounded Explanation | ✓ | 0.00 | 4.37 | 23.23 | 0.89 | 0.00 | 0.00 | 17.77 | 16 | REX | 2022-03-11 |