OpenCodePapers

Question Answering on BoolQ

Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | ✓ Link | 99.87 | Mistral-Nemo 12B (HPT) | 2024-06-18 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 92.4 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
| PaLM: Scaling Language Modeling with Pathways | ✓ Link | 92.2 | PaLM 540B (fine-tuned) | 2022-04-05 |
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 92 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 91.2 | T5-XXL 11B (fine-tuned) | 2019-10-23 |
| PaLM 2 Technical Report | ✓ Link | 90.9 | PaLM 2-L (1-shot) | 2023-05-17 |
| UL2: Unifying Language Learning Paradigms | ✓ Link | 90.8 | UL2 20B (fine-tuned) | 2022-05-10 |
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 90.5 | Vega v2 6B (fine-tuned) | 2022-12-04 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 90.4 | DeBERTa-1.5B | 2020-06-05 |
| PaLM 2 Technical Report | ✓ Link | 88.6 | PaLM 2-M (1-shot) | 2023-05-17 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 88.6 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
| PaLM 2 Technical Report | ✓ Link | 88.1 | PaLM 2-S (1-shot) | 2023-05-17 |
| Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ Link | 87.5 | MUPPET RoBERTa Large | 2021-01-26 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 86.3 | FLAN 137B (prompt-tuned) | 2021-09-03 |
| Entailment as Few-Shot Learner | ✓ Link | 86.0 | RoBERTa-large 355M + Entailment as Few-shot Learner | 2021-04-29 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 85.4 | T5-Large 770M (fine-tuned) | 2019-10-23 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 85.3 | LLaMA 65B (0-shot) | 2023-02-27 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 85 | LLaMA 2 70B (0-shot) | 2023-07-18 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 84.6 | FLAN 137B (4-shot) | 2021-09-03 |
| Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ Link | 83.8 | MUPPET RoBERTa Base | 2021-01-26 |
| Training Compute-Optimal Large Language Models | ✓ Link | 83.7 | Chinchilla 70B (0-shot) | 2022-03-29 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 83.7 | LLaMA 2 34B (0-shot) | 2023-07-18 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 83.1 | LLaMA 33B (0-shot) | 2023-02-27 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 82.9 | FLAN 137B (0-shot) | 2021-09-03 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 81.7 | LLaMA 2 13B (0-shot) | 2023-07-18 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 81.4 | T5-Base 220M (fine-tuned) | 2019-10-23 |
| BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 80.4 | BERT-MultiNLI 340M (fine-tuned) | 2019-05-24 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 79.3 | Gopher (0-shot) | 2021-12-08 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 78.1 | LLaMA 13B (0-shot) | 2023-02-27 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 77.4 | LLaMA 2 7B (0-shot) | 2023-07-18 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 77.1 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 76.5 | LLaMA 7B (0-shot) | 2023-02-27 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 76.4 | T5-Small 60M (fine-tuned) | 2019-10-23 |
| Language Models are Few-Shot Learners | ✓ Link | 76.4 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
| BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 75.57 | BiDAF-MultiNLI (fine-tuned) | 2019-05-24 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 75 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 74.6 | BloombergGPT 50B (1-shot) | 2023-03-30 |
| Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 74.6 | LLaMA3+MoSLoRA | 2024-06-16 |
| BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 72.87 | GPT-1 117M (fine-tuned) | 2019-05-24 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 72.7 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
| BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 71.41 | BiDAF + ELMo (fine-tuned) | 2019-05-24 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 71.4 | OPT-IML 175B | 2022-12-22 |
| AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ Link | 69.4 | AlexaTM 20B | 2022-08-02 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 67.2 | Neo-6B (QA + WS) | 2022-10-05 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 66.9 | OPT-IML 30B | 2022-12-22 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 66.5 | Neo-6B (few-shot) | 2022-10-05 |
| N-Grammer: Augmenting Transformers with latent n-grams | ✓ Link | 65 | N-Grammer 343M | 2022-07-13 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 64.9 | Neo-6B (QA) | 2022-10-05 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 64 | OPT 30B (0-shot) | 2022-12-22 |
| UL2: Unifying Language Learning Paradigms | ✓ Link | 63.1 | UL2 20B (0-shot) | 2022-05-10 |
| BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 62.17 | Majority baseline | 2019-05-24 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 61.7 | Hybrid H3 1.3B (0-shot, logit scoring) | 2022-12-28 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 61.5 | OPT-IML 1.3B (0-shot) | 2022-12-22 |
| SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | | 61.1 | Shakti-LLM (2.5B) | 2024-10-15 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 60.6 | Hybrid H3 2.7B (3-shot, logit scoring) | 2022-12-28 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 60.5 | OPT 1.3B (0-shot) | 2022-12-22 |
| Language Models are Few-Shot Learners | ✓ Link | 60.5 | GPT-3 175B (0-shot) | 2020-05-28 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 60.1 | OPT 175B | 2022-12-22 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 59.6 | Hybrid H3 125M (0-shot, logit scoring) | 2022-12-28 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 57.5 | OPT 66B (1-shot) | 2023-03-30 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 56.1 | Hybrid H3 125M (3-shot, logit scoring) | 2022-12-28 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 56.1 | Hybrid H3 125M (3-shot, rank classification) | 2022-12-28 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 52.9 | BLOOM 176B (1-shot) | 2023-03-30 |
| Hyena Hierarchy: Towards Larger Convolutional Language Models | ✓ Link | 51.8 | Hyena | 2023-02-21 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 46.4 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
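A note on the "Majority baseline" row: BoolQ is a binary yes/no task, so the accuracy floor is the frequency of the more common answer in the evaluation split (about 62% "yes" in the dev set, matching the 62.17 figure above). A minimal sketch of how such a baseline is scored, using hypothetical toy labels in place of the real dataset:

```python
from collections import Counter

# Hypothetical yes/no gold labels standing in for BoolQ dev answers;
# the real dev split is roughly 62% "yes", hence the 62.17 baseline.
labels = [True, True, True, False, True, False, True, False, True, True]

# Majority baseline: always predict the single most common label.
majority_label, majority_count = Counter(labels).most_common(1)[0]
accuracy = majority_count / len(labels)
print(f"Majority baseline predicts {majority_label}: accuracy {accuracy:.2%}")
```

Any model scoring near this floor (e.g. the bottom rows of the table) is effectively not using the passage or question at all.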