OpenCodePapers

Question Answering on OpenBookQA

Question Answering
[Figure: Results over time]
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|
|  |  | 95.9 | GPT-4 + knowledge base |  |
|  |  | 95.2 | MVP-Tuning (ensemble) |  |
| Large Language Models Can Self-Improve |  | 94.4 | PaLM 540B (Self Improvement, Self Consistency) | 2022-10-20 |
|  |  | 94.2 | X-Reasoner |  |
| Large Language Models Can Self-Improve |  | 93 | PaLM 540B (Self Improvement, CoT Prompting) | 2022-10-20 |
| Large Language Models Can Self-Improve |  | 92 | PaLM 540B (Self Improvement, Standard-Prompting) | 2022-10-20 |
|  |  | 91.3 | DeBERTa-xxlarge 1.5B + MVP-Tuning |  |
| Large Language Models Can Self-Improve |  | 90 | PaLM 540B (Self Consistency) | 2022-10-20 |
| GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering |  | 90 | GrapeQA: PEGA+CANP | 2023-03-22 |
| Clues Before Answers: Generation-Enhanced Multiple-Choice QA | ✓ | 89.8 | GenMC 11B | 2022-04-30 |
|  |  | 87.6 | AristoRoBERTa + MVP-Tuning |  |
| GNN is a Counter? Revisiting GNN for Question Answering |  | 87.4 | AristoRoBERTa + Graph Soft Counter | 2021-10-07 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ | 87.2 | UnifiedQA 11B | 2020-05-02 |
| Mixture-of-Subspaces in Low-Rank Adaptation | ✓ | 86.8 | LLaMA-3 8B+MoSLoRA | 2024-06-16 |
| Large Language Models Can Self-Improve |  | 86.4 | PaLM 540B (CoT Prompting) | 2022-10-20 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 84.8 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
| Large Language Models Can Self-Improve |  | 84.4 | PaLM 540B (Standard-Prompting) | 2022-10-20 |
| Fusing Context Into Knowledge Graph for Commonsense Question Answering | ✓ | 83.2 | TTTTT 3B | 2020-12-09 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 83 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
| QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | ✓ | 82.8 | AristoRoBERTa + QA-GNN | 2021-04-13 |
| QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | ✓ | 82.8 | QA-GNN | 2021-04-13 |
| Fusing Context Into Knowledge Graph for Commonsense Question Answering | ✓ | 82.4 | DEKCOR | 2020-12-09 |
| GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering |  | 82 | GrapeQA: PEGA | 2023-03-22 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ | 81.6 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
| QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | ✓ | 77.8 | AristoRoBERTa | 2021-04-13 |
| Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | ✓ | 76.9 | BiLSTM max-out question-match (science fact + common knowledge fact) | 2018-09-08 |
| Careful Selection of Knowledge to solve Open Book Question Answering |  | 72 | Careful Selection | 2019-07-24 |
| GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering |  | 66.2 | GrapeQA: CANP | 2023-03-22 |
| Language Models are Few-Shot Learners | ✓ | 65.4 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
| PaLM 2 Technical Report | ✓ | 58.5 | PaLM 2-L (1-shot) | 2023-05-17 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 58.0 | OPT 66B (one-shot) | 2023-03-30 |
| PaLM 2 Technical Report | ✓ | 57.4 | PaLM 2-S (1-shot) | 2023-05-17 |
| Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | ✓ | 56.3 | BiLSTM max-out question-match (WordNet + science fact) | 2018-09-08 |
| PaLM 2 Technical Report | ✓ | 56.2 | PaLM 2-M (1-shot) | 2023-05-17 |
| Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | ✓ | 55.8 | BiLSTM max-out question-match (with a science fact) | 2018-09-08 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 51.6 | Bloomberg GPT 50B (1-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 47.2 | BLOOM 176B (2-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 44.2 | GPT-NeoX 50B (2-shot) | 2023-03-30 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 39.8 | LaMini-GPT 1.5B | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 36 | LaMini-T5 738M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 34 | LaMini-F-T5 783M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 32.8 | T5-Large 738M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 32 | GPT-2-XL 1.5B | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 31.2 | FLAN-T5-Large 783M | 2023-04-27 |
| Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | ✓ | 25 | Random chance baseline | 2018-09-08 |
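
The accuracy figures in the leaderboard are percentages of questions answered correctly. As a rough sketch of how such a score and the random chance baseline of 25 could be computed, the snippet below assumes four answer choices per question (consistent with that baseline). The function names and the toy labels are hypothetical and do not come from any of the listed papers.

```python
def accuracy(predictions, gold):
    """Return the percentage of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def chance_accuracy(num_choices=4):
    """Expected accuracy (%) of uniform random guessing.

    Assumption: every question has the same number of answer choices.
    """
    return 100.0 / num_choices


# Hypothetical toy labels, for illustration only.
gold = ["A", "C", "B", "D"]
predictions = ["A", "C", "D", "D"]

print(accuracy(predictions, gold))  # 3 of 4 correct
print(chance_accuracy())            # uniform guessing over 4 choices
```

Under these assumptions the first call reports three of four toy questions correct, and the second gives the expected score of uniform guessing over four options.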