Question Answering on OpenBookQA

Question Answering

Leaderboard

Paper	Code	Accuracy (%)	Model Name	Release Date
-		95.9	GPT-4 + knowledge base
-		95.2	MVP-Tuning (ensemble)
Large Language Models Can Self-Improve		94.4	PaLM 540B (Self Improvement, Self Consistency)	2022-10-20
-		94.2	X-Reasoner
Large Language Models Can Self-Improve		93	PaLM 540B (Self Improvement, CoT Prompting)	2022-10-20
Large Language Models Can Self-Improve		92	PaLM 540B (Self Improvement, Standard-Prompting)	2022-10-20
-		91.3	DeBERTa-xxlarge 1.5B + MVP-Tuning
Large Language Models Can Self-Improve		90	PaLM 540B (Self Consistency)	2022-10-20
GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering		90	GrapeQA: PEGA+CANP	2023-03-22
Clues Before Answers: Generation-Enhanced Multiple-Choice QA	✓ Link	89.8	GenMC 11B	2022-04-30
-		87.6	AristoRoBERTa + MVP-Tuning
GNN is a Counter? Revisiting GNN for Question Answering		87.4	AristoRoBERTa + Graph Soft Counter	2021-10-07
UnifiedQA: Crossing Format Boundaries With a Single QA System	✓ Link	87.2	UnifiedQA 11B	2020-05-02
Mixture-of-Subspaces in Low-Rank Adaptation	✓ Link	86.8	LLaMA-3 8B+MoSLoRA	2024-06-16
Large Language Models Can Self-Improve		86.4	PaLM 540B (CoT Prompting)	2022-10-20
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts	✓ Link	84.8	LLaMA-3 8B + MixLoRA	2024-04-22
Large Language Models Can Self-Improve		84.4	PaLM 540B (Standard-Prompting)	2022-10-20
Fusing Context Into Knowledge Graph for Commonsense Question Answering	✓ Link	83.2	TTTTT 3B	2020-12-09
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts	✓ Link	83	LLaMA-2 13B + MixLoRA	2024-04-22
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering	✓ Link	82.8	AristoRoBERTa + QA-GNN	2021-04-13
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering	✓ Link	82.8	QA-GNN	2021-04-13
Fusing Context Into Knowledge Graph for Commonsense Question Answering	✓ Link	82.4	DEKCOR	2020-12-09
GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering		82	GrapeQA: PEGA	2023-03-22
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts	✓ Link	81.6	LLaMA-2 7B + MixLoRA	2024-04-22
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering	✓ Link	77.8	AristoRoBERTa	2021-04-13
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering	✓ Link	76.9	BiLSTM max-out question-match (science fact + common knowledge fact)	2018-09-08
Careful Selection of Knowledge to solve Open Book Question Answering		72	Careful Selection	2019-07-24
GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering		66.2	GrapeQA: CANP	2023-03-22
Language Models are Few-Shot Learners	✓ Link	65.4	GPT-3 175B (few-shot, k=32)	2020-05-28
PaLM 2 Technical Report	✓ Link	58.5	PaLM 2-L (1-shot)	2023-05-17
BloombergGPT: A Large Language Model for Finance	✓ Link	58.0	OPT 66B (one-shot)	2023-03-30
PaLM 2 Technical Report	✓ Link	57.4	PaLM 2-S (1-shot)	2023-05-17
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering	✓ Link	56.3	BiLSTM max-out question-match (WordNet + science fact)	2018-09-08
PaLM 2 Technical Report	✓ Link	56.2	PaLM 2-M (1-shot)	2023-05-17
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering	✓ Link	55.8	BiLSTM max-out question-match (with a science fact)	2018-09-08
BloombergGPT: A Large Language Model for Finance	✓ Link	51.6	BloombergGPT 50B (1-shot)	2023-03-30
BloombergGPT: A Large Language Model for Finance	✓ Link	47.2	BLOOM 176B (2-shot)	2023-03-30
BloombergGPT: A Large Language Model for Finance	✓ Link	44.2	GPT-NeoX 50B (2-shot)	2023-03-30
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions	✓ Link	39.8	LaMini-GPT 1.5B	2023-04-27
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions	✓ Link	36	LaMini-T5 738M	2023-04-27
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions	✓ Link	34	LaMini-F-T5 783M	2023-04-27
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions	✓ Link	32.8	T5-Large 738M	2023-04-27
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions	✓ Link	32	GPT-2-XL 1.5B	2023-04-27
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions	✓ Link	31.2	FLAN-T5-Large 783M	2023-04-27
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering	✓ Link	25	Random chance baseline	2018-09-08

OpenCodePapers
