Paper | Code | ANLS | Accuracy | Model | Date |
--- | --- | --- | --- | --- | --- |
DocVQA: A Dataset for VQA on Document Images | ✓ Link | 0.9436 | | Human | 2020-07-01 |
Multi-label Cluster Discrimination for Visual Representation Learning | ✓ Link | 0.916 | | MLCD-Embodied-7B | 2024-07-24 |
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | | 0.908 | | SMoLA-PaLI-X Specialist | 2023-12-01 |
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | | 0.906 | | SMoLA-PaLI-X Generalist | 2023-12-01 |
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ Link | 0.9024 | | Qwen-VL-Plus | 2023-08-24 |
ScreenAI: A Vision-Language Model for UI and Infographics Understanding | ✓ Link | 0.8988 | | ScreenAI 5B (4.62B params, w/ OCR) | 2024-02-07 |
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | ✓ Link | 0.886 | | PaLI-3 (w/ OCR) | 2023-10-13 |
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | ✓ Link | 0.8841 | | ERNIE-Layout large (ensemble) | 2022-10-12 |
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering | ✓ Link | 0.884 | | GPT-4 | 2023-06-01 |
DocFormerv2: Local Features for Document Understanding | ✓ Link | 0.8784 | | DocFormerv2-large | 2023-06-02 |
Unifying Vision, Text, and Layout for Universal Document Processing | ✓ Link | 0.878 | | UDOP (aux) | 2022-12-05 |
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | ✓ Link | 0.876 | | PaLI-3 | 2023-10-13 |
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | ✓ Link | 0.8705 | | TILT-Large | 2021-02-18 |
PaLI-X: On Scaling up a Multilingual Vision and Language Model | ✓ Link | 0.868 | | PaLI-X (Single-task FT w/ OCR) | 2023-05-29 |
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | ✓ Link | 0.8672 | | LayoutLMv2-Large | 2020-12-29 |
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding | ✓ Link | 0.8486 | | ERNIE-Layout large | 2022-10-12 |
Unifying Vision, Text, and Layout for Universal Document Processing | ✓ Link | 0.847 | | UDOP | 2022-12-05 |
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | ✓ Link | 0.8392 | | TILT-Base | 2021-02-18 |
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering | ✓ Link | 0.8336 | | Claude + LATIN-Prompt | 2023-06-01 |
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering | ✓ Link | 0.8255 | | GPT-3.5 + LATIN-Prompt | 2023-06-01 |
PaLI-X: On Scaling up a Multilingual Vision and Language Model | ✓ Link | 0.809 | | PaLI-X (Multi-task FT) | 2023-05-29 |
DUBLIN -- Document Understanding By Language-Image Network | | 0.803 | | DUBLIN (variable resolution) | 2023-05-23 |
PaLI-X: On Scaling up a Multilingual Vision and Language Model | ✓ Link | 0.80 | | PaLI-X (Single-task FT) | 2023-05-29 |
DUBLIN -- Document Understanding By Language-Image Network | | 0.782 | | DUBLIN | 2023-05-23 |
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | ✓ Link | 0.7808 | | LayoutLMv2-Base | 2020-12-29 |
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | ✓ Link | 0.766 | | Pix2Struct-large | 2022-10-07 |
MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering | ✓ Link | 0.742 | | MatCha | 2022-12-19 |
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding | ✓ Link | 0.721 | | Pix2Struct-base | 2022-10-07 |
OCR-free Document Understanding Transformer | ✓ Link | 0.675 | | Donut | 2021-11-30 |
DocVQA: A Dataset for VQA on Document Images | ✓ Link | 0.665 | 55.77 | BERT_LARGE_SQUAD_DOCVQA_FINETUNED_Baseline | 2020-07-01 |
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ Link | 0.651 | | Qwen-VL | 2023-08-24 |
End-to-end Document Recognition and Understanding with Dessurt | ✓ Link | 0.632 | | Dessurt | 2022-03-30 |
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | ✓ Link | 0.626 | | Qwen-VL-Chat | 2023-08-24 |
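The scores above are ANLS (Average Normalized Levenshtein Similarity), the standard DocVQA metric: each prediction is compared against every accepted ground-truth answer via normalized edit distance, similarities below a threshold (0.5 in the DocVQA challenge) are zeroed, and the best per-question score is averaged. A minimal sketch, with hypothetical function names:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def anls(predictions, gold_answers, tau=0.5):
    """Average Normalized Levenshtein Similarity over all questions.

    predictions  : list of predicted answer strings, one per question
    gold_answers : list of lists of accepted ground-truth strings
    tau          : similarity threshold below which a score is zeroed
    """
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        best = 0.0
        for gold in golds:
            p, g = pred.strip().lower(), gold.strip().lower()
            denom = max(len(p), len(g))
            nls = 1.0 if denom == 0 else 1.0 - levenshtein(p, g) / denom
            best = max(best, nls)  # keep the closest ground-truth match
        total += best if best >= tau else 0.0
    return total / len(predictions)
```

An exact match scores 1.0, a one-character slip in a six-character answer scores just under 0.86, and anything less than half-similar to every accepted answer scores 0, so ANLS rewards near-miss OCR readings while still penalizing wrong answers outright.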