OpenCodePapers

Common Sense Reasoning on WinoGrande

Common Sense Reasoning
Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 96.1 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
| UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | ✓ Link | 91.3 | Unicorn 11B (fine-tuned) | 2021-03-24 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 90.5 | CompassMTL 567M with Tailor | 2022-10-12 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 89.6 | CompassMTL 567M | 2022-10-12 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ Link | 89.4 | UnifiedQA 11B (fine-tuned) | 2020-05-02 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 88.5 | Claude 3 Opus (5-shot) | 2024-03-04 |
| GPT-4 Technical Report | ✓ Link | 87.5 | GPT-4 (5-shot) | 2023-03-15 |
| Task Compass: Scaling Multi-task Pre-training with Task Prefix | ✓ Link | 87 | ExDeBERTa 567M | 2022-10-12 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 86.3 | LLaMA-2 13B + MixLoRA | 2024-04-22 |
| Mixture-of-Subspaces in Low-Rank Adaptation | ✓ Link | 85.8 | LLaMA3 8B + MoSLoRA | 2024-06-16 |
| PaLM 2 Technical Report | ✓ Link | 83.0 | PaLM 2-L (1-shot) | 2023-05-17 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 82.1 | LLaMA-3 8B + MixLoRA | 2024-04-22 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 81.7 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
| GPT-4 Technical Report | ✓ Link | 81.6 | GPT-3.5 (5-shot) | 2023-03-15 |
| PaLM: Scaling Language Modeling with Pathways | ✓ Link | 81.1 | PaLM 540B (0-shot) | 2022-04-05 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | ✓ Link | 80.9 | Camelidae-8×34B | 2024-01-05 |
| PaLM 2 Technical Report | ✓ Link | 79.2 | PaLM 2-M (1-shot) | 2023-05-17 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 79.1 | RoBERTa-Winogrande 355M (fine-tuned) | 2019-07-24 |
| PaLM 2 Technical Report | ✓ Link | 77.9 | PaLM 2-S (1-shot) | 2023-05-17 |
| Mixtral of Experts | ✓ Link | 77.2 | Mixtral 8x7B (0-shot) | 2024-01-08 |
| PaLM: Scaling Language Modeling with Pathways | ✓ Link | 77.0 | PaLM 62B (0-shot) | 2022-04-05 |
| PaLM: Scaling Language Modeling with Pathways | ✓ Link | 77.0 | PaLM-cont 62B (0-shot) | 2022-04-05 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 77.0 | LLaMA 65B (0-shot) | 2023-02-27 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | ✓ Link | 76.8 | LLaMA-2 7B + MixLoRA | 2024-04-22 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 76.0 | LLaMA 33B (0-shot) | 2023-02-27 |
| Mistral 7B | ✓ Link | 75.3 | Mistral 7B (0-shot) | 2023-10-10 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 75.1 | Claude 3 Sonnet (5-shot) | 2024-03-04 |
| Training Compute-Optimal Large Language Models | ✓ Link | 74.9 | Chinchilla 70B (0-shot) | 2022-03-29 |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 74.2 | Claude 3 Haiku (5-shot) | 2024-03-04 |
| Mixtral of Experts | ✓ Link | 74.2 | Mistral 7B (0-shot) | 2024-01-08 |
| Textbooks Are All You Need II: phi-1.5 technical report | ✓ Link | 74.0 | phi-1.5-web 1.3B (zero-shot) | 2023-09-11 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | ✓ Link | 73.3 | UnifiedQA 406M (fine-tuned) | 2020-05-02 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 73.0 | LLaMA 13B (0-shot) | 2023-02-27 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 72.8 | FLAN 137B (few-shot, k=16) | 2021-09-03 |
| Generative Data Augmentation for Commonsense Reasoning | ✓ Link | 71.4 | G-DAUG-Combo + RoBERTa-Large | 2020-04-24 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 71.2 | FLAN 137B (0-shot) | 2021-09-03 |
| | | 70.8 | RWKV v5 Eagle 7B | |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | ✓ Link | 70.6 | Branch-Train-MiX 4x7B (sampling top-1 expert) | 2024-03-12 |
| Language Models are Few-Shot Learners | ✓ Link | 70.2 | GPT-3 175B (0-shot) | 2020-05-28 |
| Scaling Language Models: Methods, Analysis & Insights from Training Gopher | ✓ Link | 70.1 | Gopher 280B (0-shot) | 2021-12-08 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 70.1 | LLaMA 7B (0-shot) | 2023-02-27 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 67 | BLOOM 176B (1-shot) | 2023-03-30 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 66.6 | Pythia 12B (5-shot) | 2023-04-03 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 66.1 | OPT 66B (1-shot) | 2023-03-30 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 64.9 | BERT-Winogrande 345M (fine-tuned) | 2019-07-24 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 64.1 | BloombergGPT (one-shot) | 2023-03-30 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 63.9 | Pythia 12B (0-shot) | 2023-04-03 |
| Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ Link | 61.60 | RoE-3B | 2023-02-07 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 60.9 | Pythia 6.9B (0-shot) | 2023-04-03 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 60.6 | GPT-NeoX (one-shot) | 2023-03-30 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 59.9 | FLAN-T5-Large 783M | 2023-04-27 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 59.4 | Pythia 2.8B (0-shot) | 2023-04-03 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 58.9 | RoBERTa-DPR 355M (0-shot) | 2019-07-24 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 58.7 | ALBERT-xxlarge 235M | 2021-04-16 |
| Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ Link | 58.56 | Flipped-3B | 2022-10-06 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 58.3 | GPT-2-XL 1.5B | 2023-04-27 |
| The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ Link | 57.5 | T0-3B (CoT fine-tuned) | 2023-05-23 |
| Language Models are Few-Shot Learners | ✓ Link | 57.4 | GPT-3 Large 760M (0-shot) | 2020-05-28 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 56.3 | RoBERTa-base 125M | 2021-04-16 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 56 | LaMini-F-T5 783M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 56 | LaMini-GPT 1.5B | 2023-04-27 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 55.6 | BERT-large 345M | 2021-04-16 |
| Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 55.30 | KiC-770M | 2022-10-28 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 55.2 | T5-Large 738M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 54.9 | LaMini-T5 738M | 2023-04-27 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 54.9 | RoBERTa-large 355M | 2021-04-16 |
| Efficient Language Modeling with Sparse all-MLP | | 54.3 | sMLP – deterministic 9.4B (0-shot) | 2022-03-14 |
| Efficient Language Modeling with Sparse all-MLP | | 53.4 | Switch Transformer 9B (0-shot) | 2022-03-14 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 53.1 | BERT-base 110M | 2021-04-16 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 52.8 | ALBERT-base 11M | 2021-04-16 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 51.9 | BERT-large 345M (0-shot) | 2019-07-24 |
| Efficient Language Modeling with Sparse all-MLP | | 51.7 | HASH Layers 10B (0-shot) | 2022-03-14 |
| Efficient Language Modeling with Sparse all-MLP | | 51.1 | Gshard 9B (0-shot) | 2022-03-14 |
| Efficient Language Modeling with Sparse all-MLP | | 51 | Base Layers 10B (0-shot) | 2022-03-14 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 51 | BERT-DPR 345M (0-shot) | 2019-07-24 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 50 | Random baseline | 2021-04-16 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 50 | RoBERTa-large 355M (0-shot) | 2019-07-24 |
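
A common way to obtain a zero-shot WinoGrande accuracy like the numbers above is likelihood ranking: fill the sentence's blank with each of the two candidate options and pick the completion the language model scores higher. The sketch below is a minimal, illustrative version of that recipe, assuming the Hugging Face `winogrande` dataset (`winogrande_xl` config, validation split) and GPT-2 as a stand-in checkpoint; the fine-tuned and few-shot entries in the table use their own prompts, splits, and scoring conventions, so this is not the exact protocol behind any specific row.

```python
# Minimal zero-shot WinoGrande evaluation by likelihood ranking (a sketch).
# Assumptions, not taken from the leaderboard: Hugging Face "winogrande"
# dataset with the "winogrande_xl" config, GPT-2 as a stand-in model, and
# whole-sentence log-likelihood as the score. Papers above may differ.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def sentence_logprob(text: str) -> float:
    """Sum of next-token log-probabilities of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)  # predictions for tokens 1..T-1
    return logprobs.gather(2, ids[:, 1:, None]).sum().item()

data = load_dataset("winogrande", "winogrande_xl", split="validation")
correct = 0
for ex in data:
    # Each example is a sentence with a "_" blank and two candidate fillers;
    # "answer" is the string "1" or "2" on the validation split.
    scores = [sentence_logprob(ex["sentence"].replace("_", opt))
              for opt in (ex["option1"], ex["option2"])]
    pred = 1 if scores[0] >= scores[1] else 2
    correct += int(pred == int(ex["answer"]))

print(f"accuracy = {correct / len(data):.3f}")
```

Swapping in another publicly released checkpoint from the table only changes the two `from_pretrained` calls. Length-normalized scores, or GPT-3-style "partial" scoring that conditions only on the context before the blank, are common variants of this recipe and can shift accuracy by a point or two.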