Paper | Code | Accuracy | Model | Date |
--- | --- | --- | --- | --- |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 96.1 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | ✓ Link | 91.3 | Unicorn 11B (fine-tuned) | 2021-03-24 |
Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 90.5 | CompassMTL 567M with Tailor | 2022-10-12 |
Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 89.6 | CompassMTL 567M | 2022-10-12 |
UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ Link | 89.4 | UnifiedQA 11B (fine-tuned) | 2020-05-02 |
The Claude 3 Model Family: Opus, Sonnet, Haiku | | 88.5 | Claude 3 Opus (5-shot) | 2024-03-04 |
GPT-4 Technical Report | ✓ Link | 87.5 | GPT-4 (5-shot) | 2023-03-15 |
Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 87.0 | ExDeBERTa 567M | 2022-10-12 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 86.3 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 85.8 | LLaMA3 8B+MoSLoRA | 2024-06-16 |
PaLM 2 Technical Report | ✓ Link | 83.0 | PaLM 2-L (1-shot) | 2023-05-17 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 82.1 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 81.7 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
GPT-4 Technical Report | ✓ Link | 81.6 | GPT-3.5 (5-shot) | 2023-03-15 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 81.1 | PaLM 540B (0-shot) | 2022-04-05 |
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ Link | 80.9 | Camelidae-8×34B | 2024-01-05 |
PaLM 2 Technical Report | ✓ Link | 79.2 | PaLM 2-M (1-shot) | 2023-05-17 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 79.1 | RoBERTa-Winogrande 355M (fine-tuned) | 2019-07-24 |
PaLM 2 Technical Report | ✓ Link | 77.9 | PaLM 2-S (1-shot) | 2023-05-17 |
Mixtral of Experts | ✓ Link | 77.2 | Mixtral 8x7B (0-shot) | 2024-01-08 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 77.0 | PaLM 62B (0-shot) | 2022-04-05 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 77.0 | PaLM-cont 62B (0-shot) | 2022-04-05 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 77.0 | LLaMA 65B (0-shot) | 2023-02-27 |
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 76.8 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 76.0 | LLaMA 33B (0-shot) | 2023-02-27 |
Mistral 7B | ✓ Link | 75.3 | Mistral 7B (0-shot) | 2023-10-10 |
The Claude 3 Model Family: Opus, Sonnet, Haiku | | 75.1 | Claude 3 Sonnet (5-shot) | 2024-03-04 |
Training Compute-Optimal Large Language Models | ✓ Link | 74.9 | Chinchilla 70B (0-shot) | 2022-03-29 |
The Claude 3 Model Family: Opus, Sonnet, Haiku | | 74.2 | Claude 3 Haiku (5-shot) | 2024-03-04 |
Mixtral of Experts | ✓ Link | 74.2 | Mistral 7B (0-shot) | 2024-01-08 |
Textbooks Are All You Need II: phi-1.5 technical report | ✓ Link | 74.0 | phi-1.5-web 1.3B (0-shot) | 2023-09-11 |
UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ Link | 73.3 | UnifiedQA 406M (fine-tuned) | 2020-05-02 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 73.0 | LLaMA 13B (0-shot) | 2023-02-27 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 72.8 | FLAN 137B (few-shot, k=16) | 2021-09-03 |
Generative Data Augmentation for Commonsense Reasoning | ✓ Link | 71.4 | G-DAUG-Combo + RoBERTa-Large | 2020-04-24 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 71.2 | FLAN 137B (0-shot) | 2021-09-03 |
N/A | | 70.8 | RWKV v5 Eagle 7B | |
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | ✓ Link | 70.6 | Branch-Train-MiX 4x7B (sampling top-1 expert) | 2024-03-12 |
Language Models are Few-Shot Learners | ✓ Link | 70.2 | GPT-3 175B (0-shot) | 2020-05-28 |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 70.1 | Gopher 280B (0-shot) | 2021-12-08 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 70.1 | LLaMA 7B (0-shot) | 2023-02-27 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 67.0 | BLOOM 176B (1-shot) | 2023-03-30 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 66.6 | Pythia 12B (5-shot) | 2023-04-03 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 66.1 | OPT 66B (1-shot) | 2023-03-30 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 64.9 | BERT-Winogrande 345M (fine-tuned) | 2019-07-24 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 64.1 | BloombergGPT (1-shot) | 2023-03-30 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 63.9 | Pythia 12B (0-shot) | 2023-04-03 |
Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ Link | 61.6 | RoE-3B | 2023-02-07 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 60.9 | Pythia 6.9B (0-shot) | 2023-04-03 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 60.6 | GPT-NeoX (1-shot) | 2023-03-30 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 59.9 | FLAN-T5-Large 783M | 2023-04-27 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 59.4 | Pythia 2.8B (0-shot) | 2023-04-03 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 58.9 | RoBERTa-DPR 355M (0-shot) | 2019-07-24 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 58.7 | ALBERT-xxlarge 235M | 2021-04-16 |
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ Link | 58.56 | Flipped-3B | 2022-10-06 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 58.3 | GPT-2-XL 1.5B | 2023-04-27 |
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ Link | 57.5 | T0-3B (CoT fine-tuned) | 2023-05-23 |
Language Models are Few-Shot Learners | ✓ Link | 57.4 | GPT-3 Large 760M (0-shot) | 2020-05-28 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 56.3 | RoBERTa-base 125M | 2021-04-16 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 56.0 | LaMini-F-T5 783M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 56.0 | LaMini-GPT 1.5B | 2023-04-27 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 55.6 | BERT-large 345M | 2021-04-16 |
Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 55.3 | KiC-770M | 2022-10-28 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 55.2 | T5-Large 738M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 54.9 | LaMini-T5 738M | 2023-04-27 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 54.9 | RoBERTa-large 355M | 2021-04-16 |
Efficient Language Modeling with Sparse all-MLP | | 54.3 | sMLP – deterministic 9.4B (0-shot) | 2022-03-14 |
Efficient Language Modeling with Sparse all-MLP | | 53.4 | Switch Transformer 9B (0-shot) | 2022-03-14 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 53.1 | BERT-base 110M | 2021-04-16 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 52.8 | ALBERT-base 11M | 2021-04-16 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 51.9 | BERT-large 345M (0-shot) | 2019-07-24 |
Efficient Language Modeling with Sparse all-MLP | | 51.7 | HASH Layers 10B (0-shot) | 2022-03-14 |
Efficient Language Modeling with Sparse all-MLP | | 51.1 | Gshard 9B (0-shot) | 2022-03-14 |
Efficient Language Modeling with Sparse all-MLP | | 51.0 | Base Layers 10B (0-shot) | 2022-03-14 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 51.0 | BERT-DPR 345M (0-shot) | 2019-07-24 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 50.0 | Random baseline | 2021-04-16 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 50.0 | RoBERTa-large 355M (0-shot) | 2019-07-24 |
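
For reference, the (0-shot) entries above are typically produced by likelihood scoring: fill the WinoGrande blank with each candidate option and pick the option that makes the sentence more probable under the model. The sketch below illustrates that protocol only. It assumes the Hugging Face `datasets` and `transformers` libraries, the `winogrande`/`winogrande_xl` dataset id, and GPT-2 as a stand-in causal LM; it is not the evaluation harness used by any paper in the table, and each row follows its own prompt format, shot count, and fine-tuning setup.

```python
# Minimal sketch of zero-shot WinoGrande option scoring.
# Assumptions (not from the table above): Hugging Face `datasets`/`transformers`,
# the "winogrande" dataset id with the "winogrande_xl" config, GPT-2 as the model.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_logprob(text: str) -> float:
    """Sum of token log-probabilities of `text` under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits            # (1, T, vocab)
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]                      # next-token targets
    return logprobs.gather(2, targets.unsqueeze(-1)).sum().item()

ds = load_dataset("winogrande", "winogrande_xl", split="validation")
subset = ds.select(range(100))                # small subset for illustration
correct = 0
for ex in subset:
    # Fill the blank "_" with each candidate and keep the more likely sentence.
    scores = [
        sequence_logprob(ex["sentence"].replace("_", opt))
        for opt in (ex["option1"], ex["option2"])
    ]
    pred = "1" if scores[0] > scores[1] else "2"
    correct += int(pred == ex["answer"])
print(f"accuracy on subset: {correct / len(subset):.3f}")
```

Harnesses differ in details such as whether they score the full sentence or only the suffix after the blank, and in few-shot prompt construction, so numbers from a sketch like this are not directly comparable to the table entries.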