Paper | Code | Accuracy (%) | Model | Date |
--- | --- | --- | --- | --- |
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | ✓ Link | 99.87 | Mistral-Nemo 12B (HPT) | 2024-06-18 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 92.4 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 92.2 | PaLM 540B (fine-tuned) | 2022-04-05 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 92 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 91.2 | T5-XXL 11B (fine-tuned) | 2019-10-23 |
PaLM 2 Technical Report | ✓ Link | 90.9 | PaLM 2-L (1-shot) | 2023-05-17 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 90.8 | UL2 20B (fine-tuned) | 2022-05-10 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 90.5 | Vega v2 6B (fine-tuned) | 2022-12-04 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 90.4 | DeBERTa-1.5B | 2020-06-05 |
PaLM 2 Technical Report | ✓ Link | 88.6 | PaLM 2-M (1-shot) | 2023-05-17 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 88.6 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
PaLM 2 Technical Report | ✓ Link | 88.1 | PaLM 2-S (1-shot) | 2023-05-17 |
Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ Link | 87.5 | MUPPET Roberta Large | 2021-01-26 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 86.3 | FLAN 137B (prompt-tuned) | 2021-09-03 |
Entailment as Few-Shot Learner | ✓ Link | 86.0 | RoBERTa-large 355M + Entailment as Few-shot Learner | 2021-04-29 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 85.4 | T5-Large 770M (fine-tuned) | 2019-10-23 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 85.3 | LLaMA 65B (0-shot) | 2023-02-27 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 85 | LLaMA 2 70B (0-shot) | 2023-07-18 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 84.6 | FLAN 137B (4-shot) | 2021-09-03 |
Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ Link | 83.8 | MUPPET Roberta Base | 2021-01-26 |
Training Compute-Optimal Large Language Models | ✓ Link | 83.7 | Chinchilla 70B (0-shot) | 2022-03-29 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 83.7 | LLaMA 2 34B (0-shot) | 2023-07-18 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 83.1 | LLaMA 33B (0-shot) | 2023-02-27 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 82.9 | FLAN 137B (0-shot) | 2021-09-03 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 81.7 | LLaMA 2 13B (0-shot) | 2023-07-18 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 81.4 | T5-Base 220M (fine-tuned) | 2019-10-23 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 80.4 | BERT-MultiNLI 340M (fine-tuned) | 2019-05-24 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 79.3 | Gopher 280B (0-shot) | 2021-12-08 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 78.1 | LLaMA 13B (0-shot) | 2023-02-27 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 77.4 | LLaMA 2 7B (0-shot) | 2023-07-18 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 77.1 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 76.5 | LLaMA 7B (0-shot) | 2023-02-27 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 76.4 | T5-Small 60M (fine-tuned) | 2019-10-23 |
Language Models are Few-Shot Learners | ✓ Link | 76.4 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 75.57 | BiDAF-MultiNLI (fine-tuned) | 2019-05-24 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 75 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 74.6 | BloombergGPT 50B (1-shot) | 2023-03-30 |
Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 74.6 | LLaMA-3 + MoSLoRA | 2024-06-16 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 72.87 | GPT-1 117M (fine-tuned) | 2019-05-24 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 72.7 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 71.41 | BiDAF + ELMo (fine-tuned) | 2019-05-24 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 71.4 | OPT-IML 175B | 2022-12-22 |
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ Link | 69.4 | AlexaTM 20B | 2022-08-02 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 67.2 | Neo-6B (QA + WS) | 2022-10-05 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 66.9 | OPT-IML 30B | 2022-12-22 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 66.5 | Neo-6B (few-shot) | 2022-10-05 |
N-Grammer: Augmenting Transformers with latent n-grams | ✓ Link | 65 | N-Grammer 343M | 2022-07-13 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 64.9 | Neo-6B (QA) | 2022-10-05 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 64 | OPT 30B (0-shot) | 2022-12-22 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 63.1 | UL2 20B (0-shot) | 2022-05-10 |
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions | ✓ Link | 62.17 | Majority baseline | 2019-05-24 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 61.7 | Hybrid H3 1.3B (0-shot, logit scoring) | 2022-12-28 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 61.5 | OPT-IML 1.3B (0-shot) | 2022-12-22 |
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | | 61.1 | Shakti-LLM (2.5B) | 2024-10-15 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 60.6 | Hybrid H3 2.7B (3-shot, logit scoring) | 2022-12-28 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 60.5 | OPT 1.3B (0-shot) | 2022-12-22 |
Language Models are Few-Shot Learners | ✓ Link | 60.5 | GPT-3 175B (0-shot) | 2020-05-28 |
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ Link | 60.1 | OPT 175B | 2022-12-22 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 59.6 | Hybrid H3 125M (0-shot, logit scoring) | 2022-12-28 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 57.5 | OPT 66B (1-shot) | 2023-03-30 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 56.1 | Hybrid H3 125M (3-shot, logit scoring) | 2022-12-28 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 56.1 | Hybrid H3 125M (3-shot, rank classification) | 2022-12-28 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 52.9 | BLOOM 176B (1-shot) | 2023-03-30 |
Hyena Hierarchy: Towards Larger Convolutional Language Models | ✓ Link | 51.8 | Hyena | 2023-02-21 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 46.4 | GPT-NeoX 20B (1-shot) | 2023-03-30 |