Paper | Link | Accuracy | Model | Date
--- | --- | --- | --- | ---
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 100.0 | PaLM 540B (fine-tuned) | 2022-04-05
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 98.6 | Vega v2 6B (KD-based prompt transfer) | 2022-12-04 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 98.1 | UL2 20B (fine-tuned) | 2022-05-10 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 97.3 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 96.6 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 95.9 | DeBERTa-1.5B | 2020-06-05 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 93.8 | T5-XXL 11B (fine-tuned) | 2019-10-23 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 93.3 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 90.1 | RoBERTa-WinoGrande 355M | 2019-07-24 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 89.82 | Flan-T5 XXL (zero-shot) | 2022-10-20 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 89.5 | PaLM 540B (5-shot) | 2022-04-05 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 89.1 | PaLM 540B (0-shot) | 2022-04-05 |
PaLM 2 Technical Report | ✓ Link | 88.1 | PaLM 2-M (1-shot) | 2023-05-17 |
PaLM 2 Technical Report | ✓ Link | 86.9 | PaLM 2-L (1-shot) | 2023-05-17 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 86.5 | FLAN 137B (prompt-tuned) | 2021-09-03 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 86.3 | PaLM 540B (1-shot) | 2022-04-05 |
TTTTTackling WinoGrande Schemas | | 84.6 | TTTTT 3B (fine-tuned) | 2020-03-18 |
PaLM 2 Technical Report | ✓ Link | 84.6 | PaLM 2-S (1-shot) | 2023-05-17 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 83.1 | RoBERTa-DPR 355M | 2019-07-24 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 80.8 | FLAN 137B (zero-shot) | 2021-09-03 |
Language Models are Few-Shot Learners | ✓ Link | 80.1 | GPT-3 175B (few-shot) | 2020-05-28 |
Generative Data Augmentation for Commonsense Reasoning | ✓ Link | 80.0 | RoBERTa-large + G-DAug-Inf | 2020-04-24 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 79.9 | UL2 20B (0-shot) | 2022-05-10 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 78.8 | ALBERT-xxlarge 235M | 2021-04-16 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 77.9 | Neo-6B (QA + WS) | 2022-10-05 |
A Hybrid Neural Network Model for Commonsense Reasoning | ✓ Link | 75.1 | HNN | 2019-07-27 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 74.7 | Neo-6B (QA) | 2022-10-05 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 73.9 | RoBERTa-large 354M | 2021-04-16 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 73.3 | GPT-2-XL 1.5B | 2023-04-27 |
A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ Link | 72.5 | BERTwiki 340M (fine-tuned on WSCR) | 2019-05-15 |
SocialIQA: Commonsense Reasoning about Social Interactions | ✓ Link | 72.5 | BERT-SocialIQA 340M | 2019-04-22 |
A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ Link | 71.4 | BERT-large 340M (fine-tuned on WSCR) | 2019-05-15 |
Language Models are Unsupervised Multitask Learners | ✓ Link | 70.7 | GPT-2-XL 1.5B | 2019-02-14 |
A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ Link | 70.3 | BERTwiki 340M (fine-tuned on half of WSCR) | 2019-05-15 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 69.6 | LaMini-GPT 1.5B | 2023-04-27 |
How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | ✓ Link | 69.2 | GPT-2 Medium 774M (partial scoring) | 2018-11-05 |
N-Grammer: Augmenting Transformers with latent n-grams | ✓ Link | 68.3 | N-Grammer 343M | 2022-07-13 |
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ Link | 68.3 | AlexaTM 20B | 2022-08-02 |
SocialIQA: Commonsense Reasoning about Social Interactions | ✓ Link | 67.0 | BERT-large 340M | 2019-04-22 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 66.7 | T5-Large 738M | 2023-04-27 |
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ Link | 66.0 | T0-3B (CoT fine-tuned) | 2023-05-23 |
Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 65.40 | KiC-770M | 2022-10-28 |
How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | ✓ Link | 64.5 | GPT-2 Medium 774M (full scoring) | 2018-11-05 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 64.1 | LaMini-F-T5 783M | 2023-04-27 |
A Simple Method for Commonsense Reasoning | ✓ Link | 63.7 | Ensemble of 14 LMs | 2018-06-07 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 63.5 | H3 125M (3-shot, rank classification) | 2022-12-28 |
Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 63.0 | DSSM | 2019-04-03 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 63.0 | RoBERTa-base 125M | 2021-04-16 |
A Simple Method for Commonsense Reasoning | ✓ Link | 62.6 | Word-level CNN+LSTM (partial scoring) | 2018-06-07 |
Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 62.4 | UDSSM-II (ensemble) | 2019-04-03 |
A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ Link | 62.3 | BERT-base 110M (fine-tuned on WSCR) | 2019-05-15 |
Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ Link | 62.21 | RoE-3B | 2023-02-07 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 62.0 | BERT-large 340M | 2018-10-11 |
How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | ✓ Link | 61.5 | GPT-2 Small 117M (partial scoring) | 2018-11-05 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 61.5 | H3 125M (0-shot, rank classification) | 2022-12-28 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 61.4 | BERT-large 340M | 2021-04-16 |
Attention Is (not) All You Need for Commonsense Reasoning | ✓ Link | 60.3 | BERT-base 110M + MAS | 2019-05-31 |
On Generalization in Coreference Resolution | ✓ Link | 60.1 | longdoc S (OntoNotes + PreCo + LitBank) | 2021-09-20 |
On Generalization in Coreference Resolution | ✓ Link | 59.4 | longdoc S (OntoNotes + PreCo + LitBank + 30k pseudo-singletons) | 2021-09-20 |
Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 59.2 | UDSSM-II | 2019-04-03 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 59.0 | LaMini-T5 738M | 2023-04-27 |
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ Link | 58.37 | Flipped-3B | 2022-10-06 |
Commonsense Knowledge Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema Challenge | | 58.3 | KEE+NKAM (WSC2016 winner) | 2016-11-13 |
A Simple Method for Commonsense Reasoning | ✓ Link | 57.9 | Char-level CNN+LSTM (partial scoring) | 2018-06-07 |
Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 57.1 | UDSSM-I (ensemble) | 2019-04-03 |
A Knowledge Hunting Framework for Common Sense Reasoning | | 57.1 | Knowledge Hunter | 2018-10-02 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 57.1 | WKH | 2019-07-24 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 56.5 | BERT-base 110M | 2021-04-16 |
How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | ✓ Link | 55.7 | GPT-2 Small 117M (full scoring) | 2018-11-05 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 55.4 | ALBERT-base 11M | 2021-04-16 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 54.8 | Pythia 12B (0-shot) | 2023-04-03 |
Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 54.5 | UDSSM-I | 2019-04-03 |
Attention Is All You Need | ✓ Link | 54.1 | Subword-level Transformer LM | 2017-06-12 |
Attention Is (not) All You Need for Commonsense Reasoning | ✓ Link | 52.8 | USSM + Supervised DeepNet + KB | 2019-05-31 |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ Link | 52.8 | KEE+NKAM on WinoGrande | 2019-07-24 |
Attention Is (not) All You Need for Commonsense Reasoning | ✓ Link | 52.0 | USSM + KB | 2019-05-31 |
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 50.0 | Random chance baseline | 2021-04-16 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 43.3 | Hybrid H3 125M (3-shot, logit scoring) | 2022-12-28 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 38.5 | Pythia 2.8B (0-shot) | 2023-04-03 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 36.5 | Neo-6B (few-shot) | 2022-10-05 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 36.5 | Pythia 6.9B (0-shot) | 2023-04-03 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 36.5 | Pythia 12B (5-shot) | 2023-04-03 |
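Several rows above qualify results with a scoring method: "full scoring" and "partial scoring" come from A Simple Method for Commonsense Reasoning (Trinh & Le, 2018), where the pronoun is replaced by each candidate antecedent and a language model scores the resulting sentences. Below is a minimal sketch of both heuristics using an off-the-shelf GPT-2 from Hugging Face transformers; the model size and the example schema are illustrative only, not the exact configurations used by any entry in this table.

```python
# Sketch of full vs. partial LM scoring for a Winograd schema
# (after Trinh & Le, 2018). Assumes the `transformers` library.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(text: str, score_from: int = 0) -> float:
    """Sum of token log-probabilities, optionally skipping the first
    `score_from` predicted tokens (used for partial scoring)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # logits at position i predict token i+1, so shift targets by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    per_token = logprobs[torch.arange(targets.numel()), targets]
    return per_token[score_from:].sum().item()

# Illustrative schema: the pronoun slot is filled by each candidate.
prefix = "The trophy doesn't fit into the suitcase because"
suffix = " is too large."
candidates = [" the trophy", " the suitcase"]

for method in ("full", "partial"):
    scores = []
    for cand in candidates:
        context_len = len(tokenizer(prefix + cand).input_ids)
        # Full scoring scores the whole sentence; partial scoring
        # conditions on prefix + candidate and scores only the suffix.
        start = 0 if method == "full" else context_len - 1
        scores.append(sentence_logprob(prefix + cand + suffix, start))
    print(method, "scoring picks:", candidates[scores.index(max(scores))].strip())
```

The "rank classification" and "logit scoring" annotations on the H3 rows refer to the same general idea: each answer option is scored by the model and the highest-scoring option is taken as the prediction.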