[]() | | 95.9 | GPT-4 + knowledge base | |
[]() | | 95.2 | MVP-Tuning (ensemble) | |
Large Language Models Can Self-Improve | | 94.4 | PaLM 540B (Self Improvement, Self Consistency) | 2022-10-20 |
[]() | | 94.2 | X-Reasoner | |
Large Language Models Can Self-Improve | | 93 | PaLM 540B (Self Improvement, CoT Prompting) | 2022-10-20 |
Large Language Models Can Self-Improve | | 92 | PaLM 540B (Self Improvement, Standard-Prompting) | 2022-10-20 |
[]() | | 91.3 | DeBERTa-xxlarge 1.5B + MVP-Tuning | |
Large Language Models Can Self-Improve | | 90 | PaLM 540B (Self Consistency) | 2022-10-20 |
GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | | 90 | GrapeQA: PEGA+CANP | 2023-03-22 |
Clues Before Answers: Generation-Enhanced Multiple-Choice QA | ✓ Link | 89.8 | GenMC 11B | 2022-04-30 |
[]() | | 87.6 | AristoRoBERTa + MVP-Tuning | |
GNN is a Counter? Revisiting GNN for Question Answering | | 87.4 | AristoRoBERTa + Graph Soft Counter | 2021-10-07 |
UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ Link | 87.2 | UnifiedQA 11B | 2020-05-02 |
Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 86.8 | LLaMA-3 8B+MoSLoRA | 2024-06-16 |
Large Language Models Can Self-Improve | | 86.4 | PaLM 540B (CoT Prompting) | 2022-10-20 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 84.8 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
Large Language Models Can Self-Improve | | 84.4 | PaLM 540B (Standard-Prompting) | 2022-10-20 |
Fusing Context Into Knowledge Graph for Commonsense Question Answering | ✓ Link | 83.2 | TTTTT 3B | 2020-12-09 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 83 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | ✓ Link | 82.8 | AristoRoBERTa + QA-GNN | 2021-04-13 |
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | ✓ Link | 82.8 | QA-GNN | 2021-04-13 |
Fusing Context Into Knowledge Graph for Commonsense Question Answering | ✓ Link | 82.4 | DEKCOR | 2020-12-09 |
GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | | 82 | GrapeQA: PEGA | 2023-03-22 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 81.6 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | ✓ Link | 77.8 | AristoRoBERTa | 2021-04-13 |
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | ✓ Link | 76.9 | BiLSTM max-out question-match (science fact + common knowledge fact) | 2018-09-08 |
Careful Selection of Knowledge to solve Open Book Question Answering | | 72 | Careful Selection | 2019-07-24 |
GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | | 66.2 | GrapeQA: CANP | 2023-03-22 |
Language Models are Few-Shot Learners | ✓ Link | 65.4 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
PaLM 2 Technical Report | ✓ Link | 58.5 | PaLM 2-L (1-shot) | 2023-05-17 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 58.0 | OPT 66B (1-shot) | 2023-03-30 |
PaLM 2 Technical Report | ✓ Link | 57.4 | PaLM 2-S (1-shot) | 2023-05-17 |
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | ✓ Link | 56.3 | BiLSTM max-out question-match (WordNet + science fact) | 2018-09-08 |
PaLM 2 Technical Report | ✓ Link | 56.2 | PaLM 2-M (1-shot) | 2023-05-17 |
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | ✓ Link | 55.8 | BiLSTM max-out question-match (with a science fact) | 2018-09-08 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 51.6 | BloombergGPT 50B (1-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 47.2 | BLOOM 176B (2-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 44.2 | GPT-NeoX 20B (2-shot) | 2023-03-30 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 39.8 | LaMini-GPT 1.5B | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 36 | LaMini-T5 738M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 34 | LaMini-F-T5 783M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 32.8 | T5-Large 738M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 32 | GPT-2-XL 1.5B | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 31.2 | FLAN-T5-Large 783M | 2023-04-27 |
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | ✓ Link | 25 | Random chance baseline | 2018-09-08 |