Paper | Code | Accuracy (%) | Model | Date
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | ✓ Link | 90.1 | Unicorn 11B (fine-tuned) | 2021-03-24 |
Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 89.7 | LLaMA-3 8B + MoSLoRA | 2024-06-16 |
Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 88.3 | CompassMTL 567M with Tailor | 2022-10-12 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 87.6 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ Link | 87.4 | DeBERTa-Large 304M | 2022-10-29 |
Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 87.3 | CompassMTL 567M | 2022-10-12 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 86.8 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | | 86.2 | Shakti-LLM (2.5B) | 2024-10-15 |
Two is Better than Many? Binary Classification as an Effective Approach to Multi-Choice Question Answering | ✓ Link | 85.9 | DeBERTa-Large 304M (classification-based) | 2022-10-29 |
Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 85.5 | ExDeBERTa 567M | 2022-10-12 |
UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ Link | 85.3 | UnifiedQA 3B | 2020-05-02 |
PaLM 2 Technical Report | ✓ Link | 85.0 | PaLM 2-L (1-shot) | 2023-05-17 |
Mixtral of Experts | ✓ Link | 83.6 | Mixtral 8x7B (0-shot) | 2024-01-08 |
PaLM 2 Technical Report | ✓ Link | 83.2 | PaLM 2-M (1-shot) | 2023-05-17 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 83.2 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
Mistral 7B | ✓ Link | 83.0 | Mistral 7B (0-shot) | 2023-10-10 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 82.8 | LLaMA 65B (0-shot) | 2023-02-27 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 82.8 | LLaMA 2 70B (0-shot) | 2023-07-18 |
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ Link | 82.7 | Camelidae-8×34B | 2024-01-05 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 82.3 | LLaMA 33B (0-shot) | 2023-02-27 |
PaLM 2 Technical Report | ✓ Link | 82.2 | PaLM 2-S (1-shot) | 2023-05-17 |
Mixtral of Experts | ✓ Link | 82.2 | Mistral 7B (0-shot) | 2024-01-08 |
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | ✓ Link | 82.0 | MT-NLG 530B (0-shot) | 2019-09-17 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 81.9 | LLaMA 2 34B (0-shot) | 2023-07-18 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 81.8 | Gopher 280B (0-shot) | 2021-12-08 |
Training Compute-Optimal Large Language Models | ✓ Link | 81.8 | Chinchilla 70B (0-shot) | 2022-03-29 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 81.7 | FLAN 137B (few-shot, k=10) | 2021-09-03 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 81.07 | OPT-175B | 2023-01-02 |
Language Models are Few-Shot Learners | ✓ Link | 81.0 | GPT-3 175B (0-shot) | 2020-05-28 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 80.63 | SparseGPT 175B (50% Sparsity) | 2023-01-02 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 80.5 | FLAN 137B (0-shot) | 2021-09-03 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 80.5 | LLaMA 2 13B (0-shot) | 2023-07-18 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 80.1 | LLaMA 13B (0-shot) | 2023-02-27 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 79.8 | LLaMA 7B (0-shot) | 2023-02-27 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 79.54 | SparseGPT 175B (4:8 Sparsity) | 2023-01-02 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 79.54 | SparseGPT 175B (2:4 Sparsity) | 2023-01-02 |
RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 79.4 | RoBERTa-Large 355M | 2019-07-26 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 78.8 | LLaMA 2 7B (0-shot) | 2023-07-18 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 77.9 | BloombergGPT 50B (1-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 77.6 | OPT 66B (1-shot) | 2023-03-30 |
PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 77.1 | RoBERTa-large 355M (fine-tuned) | 2019-11-26 |
Textbooks Are All You Need II: phi-1.5 technical report | ✓ Link | 77.0 | phi-1.5-web (1.3B) | 2023-09-11 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 77.0 | BLOOM 176B (1-shot) | 2023-03-30 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 76.7 | Pythia 12B (5-shot) | 2023-04-03 |
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ Link | 76.2 | Open-LLaMA-3B-v2 | 2023-10-10 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 76.0 | Pythia 12B (0-shot) | 2023-04-03 |
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ Link | 75.8 | Sheared-LLaMA-2.7B | 2023-10-10 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 75.8 | GPT-NeoX 20B (1-shot) | 2023-03-30 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 75.2 | Pythia 6.9B (0-shot) | 2023-04-03 |
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning | ✓ Link | 73.4 | Sheared-LLaMA-1.3B | 2023-10-10 |
Efficient Language Modeling with Sparse all-MLP | | 73.0 | sMLP - deterministic 9.4B (0-shot) | 2022-03-14 |
Language Models are Few-Shot Learners | ✓ Link | 72.9 | GPT-3 Large 760M (0-shot) | 2020-05-28 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 72.2 | FLAN-T5-Large 783M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 71.3 | LaMini-GPT 1.5B | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 70.6 | LaMini-F-T5 783M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 70.5 | GPT-2-XL 1.5B | 2023-04-27 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 70.4 | Pythia 1B (5-shot) | 2023-04-03 |
PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 69.2 | GPT-2-small 124M (fine-tuned) | 2019-11-26 |
Efficient Language Modeling with Sparse all-MLP | | 68.1 | GShard 9B | 2022-03-14 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 67.2 | LaMini-T5 738M | 2023-04-27 |
PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 66.8 | BERT-large 340M (fine-tuned) | 2019-11-26 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 66.7 | BERT-Large 340M | 2018-10-11 |
Efficient Language Modeling with Sparse all-MLP | | 63.8 | Base Layers 10B (0-shot) | 2022-03-14 |
Efficient Language Modeling with Sparse all-MLP | | 63.8 | HASH Layers 10B (0-shot) | 2022-03-14 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 55.9 | T5-Large 738M | 2023-04-27 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 54.73 | OPT-175B (50% Sparsity) | 2023-01-02 |
PIQA: Reasoning about Physical Commonsense in Natural Language | ✓ Link | 50.0 | Random chance baseline | 2019-11-26 |
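All figures above are accuracy on PIQA's two-way multiple-choice format, which is why random chance sits at 50%. The zero-shot and few-shot entries are typically produced by ranking the two candidate solutions by language-model likelihood; the sketch below illustrates that setup. It is a minimal sketch assuming a HuggingFace causal LM: the model name, prompt template, and lack of length normalization are illustrative choices rather than any listed paper's exact harness, so exact numbers will differ from those reported.

```python
# Minimal sketch of zero-shot PIQA scoring: rank the two candidate solutions
# by summed token log-likelihood under a causal LM and pick the higher one.
# (Illustrative only; not the exact evaluation harness of any paper above.)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any HuggingFace causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()


def solution_logprob(goal: str, solution: str) -> float:
    """Sum of log-probabilities of the solution tokens, conditioned on the goal."""
    prompt = f"Goal: {goal}\nAnswer:"
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(f"{prompt} {solution}", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i predicts token i+1; keep only the solution's tokens.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    per_token = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Assumes the prompt tokenization is a prefix of the full tokenization,
    # which holds for typical BPE tokenizers with this template.
    return per_token[:, prompt_len - 1:].sum().item()


def predict(goal: str, sol1: str, sol2: str) -> int:
    """Return 0 or 1, matching PIQA's label format; accuracy over the
    validation set is what the table above reports."""
    return int(solution_logprob(goal, sol2) > solution_logprob(goal, sol1))


# Example item paraphrased from the PIQA paper.
print(predict(
    "To separate egg whites from the yolk using a water bottle, you should",
    "Squeeze the water bottle and press it against the yolk. Release, which creates suction and lifts the yolk.",
    "Place the water bottle and press it against the yolk. Keep pushing, which creates suction and lifts the yolk.",
))
```

The fine-tuned entries (e.g., the RoBERTa, DeBERTa, and UnifiedQA rows) instead train on PIQA's training split, typically with a multiple-choice or, in the binary-classification variant, a per-(goal, solution) classifier head rather than raw LM likelihood.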