OpenCodePapers

Chart Question Answering on ChartQA
Leaderboard
Paper | Code | 1:1 Accuracy | Model Name | Release Date
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs | | 81.3 | ChartPaLI-5B + PaLM 2-S | 2024-03-19
Gemini: A Family of Highly Capable Multimodal Models | ✓ | 80.8 | Gemini Ultra | 2023-12-19
DePlot: One-shot visual language reasoning by plot-to-table translation | ✓ | 79.3 | DePlot+FlanPaLM+Codex (PoT Self-Consistency) | 2022-12-20
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs | | 77.3 | ChartPaLI-5B | 2024-03-19
DePlot: One-shot visual language reasoning by plot-to-table translation | ✓ | 76.7 | DePlot+Codex (PoT Self-Consistency) | 2022-12-20
ScreenAI: A Vision-Language Model for UI and Infographics Understanding | ✓ | 76.7 | ScreenAI 5B (4.62 B params, w/ OCR) | 2024-02-07
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | | 74.6 | SMoLA-PaLI-X Specialist Model | 2023-12-01
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | | 73.8 | SMoLA-PaLI-X Generalist Model | 2023-12-01
Synthesize Step-by-Step: Tools Templates and LLMs as Data Generators for Reasoning-Based Chart VQA | | 72.64 | MatCha4096 + LaMenDa | 2024-01-01
PaLI-X: On Scaling up a Multilingual Vision and Language Model | ✓ | 72.3 | PaLI-X (Single-task FT w/ OCR) | 2023-05-29
PaLI-X: On Scaling up a Multilingual Vision and Language Model | ✓ | 70.9 | PaLI-X (Single-task FT) | 2023-05-29
PaLI-X: On Scaling up a Multilingual Vision and Language Model | ✓ | 70.6 | PaLI-X (Multi-task FT) | 2023-05-29
DePlot: One-shot visual language reasoning by plot-to-table translation | ✓ | 70.5 | DePlot+FlanPaLM (Self-Consistency) | 2022-12-20
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | ✓ | 70 | PaLI-3 | 2023-10-13
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | ✓ | 69.5 | PaLI-3 (w/ OCR) | 2023-10-13
DePlot: One-shot visual language reasoning by plot-to-table translation | ✓ | 67.3 | DePlot+FlanPaLM (CoT) | 2022-12-20
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ | 66.3 | Qwen-VL-Chat | 2023-08-24
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning | ✓ | 66.24 | UniChart | 2023-05-24
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ | 65.7 | Qwen-VL | 2023-08-24
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding | ✓ | 65.3 | StructChart+GPT3.5 (STR ChartQA+SimChart9K) | 2023-09-20
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | ✓ | 64.2 | MatCha | 2022-12-19
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding | ✓ | 60.7 | StructChart+GPT3.5 (STR) | 2023-09-20
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | ✓ | 58.6 | Pix2Struct-large | 2022-10-07
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | ✓ | 56.0 | Pix2Struct-base | 2022-10-07
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning | ✓ | 45.5 | VisionTapas-OCR | 2022-03-19
DePlot: One-shot visual language reasoning by plot-to-table translation | ✓ | 42.3 | DePlot+GPT3 (Self-Consistency) | 2022-12-20
DePlot: One-shot visual language reasoning by plot-to-table translation | ✓ | 36.9 | DePlot+GPT3 (CoT) | 2022-12-20