| Paper | Code | EM | F1 | Model | Released |
| --- | --- | --- | --- | --- | --- |
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 95.9 | 96.4 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 95.1 | | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 94.1 | 94.5 | DeBERTa-1.5B | 2020-06-05 |
| PaLM: Scaling Language Modeling with Pathways | ✓ Link | 94.0 | 94.6 | PaLM 540B (fine-tuned) | 2022-04-05 |
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 93.9 | 94.4 | Vega v2 6B (fine-tuned) | 2022-12-04 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 93.4 | | T5-XXL 11B (fine-tuned) | 2019-10-23 |
| Integrating a Heterogeneous Graph with Entity-aware Self-attention using Relative Position Labels for Reading Comprehension Model | | 91.7 | 92.2 | GESA 500M | 2023-07-19 |
| LUKE-Graph: A Transformer-based Approach with Gated Relational Graph Attention for Cloze-style Reading Comprehension | | 91.2 | 91.5 | LUKE-Graph | 2023-03-12 |
| | | 90.640 | 91.209 | LUKE (single model) | |
| LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention | ✓ Link | 90.6 | 91.2 | LUKE 483M | 2020-10-02 |
| KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs | ✓ Link | 89.1 | 89.6 | KELM (fine-tuned RoBERTa-large based single model) | 2021-09-09 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 88.9 | | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 85.1 | | FLAN 137B (prompt-tuned) | 2021-09-03 |
| | | 83.090 | 83.737 | XLNet + MTL + Verifier (ensemble) | |
| Language Models are Few-Shot Learners | ✓ Link | 82.1 | | GPT-3 Large 760M (zero-shot) | 2020-05-28 |
| | | 81.780 | 82.584 | CSRLM (single model) | |
| Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks | | 81.5 | 82.7 | XLNet + Verifier | 2019-11-01 |
| | | 81.460 | 82.664 | XLNet + MTL + Verifier (single model) | |
| Efficient Language Modeling with Sparse all-MLP | | 79.9 | | Switch Transformer 9B | 2022-03-14 |
| | | 79.480 | 80.038 | SKG-NET (single model) | |
| KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs | ✓ Link | 76.2 | 76.7 | KELM (fine-tuned BERT-large based single model) | 2021-09-09 |
| Efficient Language Modeling with Sparse all-MLP | | 73.4 | | sMLP – deterministic 9.4B (zero-shot) | 2022-03-14 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 72.5 | | FLAN 137B (zero-shot) | 2021-09-03 |
| Efficient Language Modeling with Sparse all-MLP | | 72.4 | | GShard 9B | 2022-03-14 |
| | | 72.240 | 72.778 | SKG-BERT (single model) | |
| | | 71.600 | 73.620 | KT-NET (single model) | |
| | | 69.490 | 71.138 | DCReader+BERT (single model) | |
| Efficient Language Modeling with Sparse all-MLP | | 67.2 | | HASH Layers 10B (zero-shot) | 2022-03-14 |
| | | 60.800 | 62.986 | GraphBert (single) | |
| Efficient Language Modeling with Sparse all-MLP | | 60.7 | | Base Layers 10B (zero-shot) | 2022-03-14 |
| | | 59.860 | 61.885 | GraphBert-WordNet (single) | |
| | | 59.410 | 61.515 | GraphBert-NELL (single) | |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 54.040 | 56.065 | BERT-Base (single model) | 2018-10-11 |
| ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension | | 45.4 | 46.7 | DocQA + ELMo | 2018-10-30 |
| N-Grammer: Augmenting Transformers with latent n-grams | ✓ Link | 28.9 | 29.9 | N-Grammer 343M | 2022-07-13 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | | 94.1 | T5-11B | 2019-10-23 |
| PaLM 2 Technical Report | ✓ Link | | 93.8 | PaLM 2-L (one-shot) | 2023-05-17 |
| PaLM 2 Technical Report | ✓ Link | | 92.4 | PaLM 2-M (one-shot) | 2023-05-17 |
| PaLM 2 Technical Report | ✓ Link | | 92.1 | PaLM 2-S (one-shot) | 2023-05-17 |
| Large Language Models are Zero-Shot Reasoners | ✓ Link | | 90.2 | GPT-3 175B (one-shot) | 2022-05-24 |
| AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ Link | | 88.4 | AlexaTM 20B | 2022-08-02 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | | 82.8 | BloombergGPT 50B (one-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | | 82.5 | OPT 66B (one-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | | 78 | BLOOM 176B (one-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | | 67.9 | GPT-NeoX 20B (one-shot) | 2023-03-30 |
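
For reference, the EM and F1 columns above follow the ReCoRD/SQuAD-style scoring convention: exact match after answer normalization, and token-level overlap F1. The sketch below is a minimal, illustrative implementation of that scoring under those assumptions; the function names are ours rather than the official evaluation script's, and the official script additionally takes the maximum score over all gold answers for a query before averaging over the test set.

```python
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))

def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Per-example scores are averaged over the test set (and, in the official
# scripts, maximized over multiple gold answers) to produce leaderboard numbers.
print(exact_match("the U.S.", "US"))                       # 1.0
print(f1_score("President Barack Obama", "Barack Obama"))  # 0.8
```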