OpenCodePapers

Coreference Resolution on Winograd Schema
Dataset Link
Results over time
Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| PaLM: Scaling Language Modeling with Pathways | ✓ | 100 | PaLM 540B (fine-tuned) | 2022-04-05 |
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 98.6 | Vega v2 6B (KD-based prompt transfer) | 2022-12-04 |
| UL2: Unifying Language Learning Paradigms | ✓ | 98.1 | UL2 20B (fine-tuned) | 2022-05-10 |
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 97.3 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 96.6 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ | 95.9 | DeBERTa-1.5B | 2020-06-05 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 93.8 | T5-XXL 11B (fine-tuned) | 2019-10-23 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 93.3 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ | 90.1 | RoBERTa-WinoGrande 355M | 2019-07-24 |
| Scaling Instruction-Finetuned Language Models | ✓ | 89.82 | Flan-T5 XXL (zero-shot) | 2022-10-20 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 89.5 | PaLM 540B (5-shot) | 2022-04-05 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 89.1 | PaLM 540B (0-shot) | 2022-04-05 |
| PaLM 2 Technical Report | ✓ | 88.1 | PaLM 2-M (1-shot) | 2023-05-17 |
| PaLM 2 Technical Report | ✓ | 86.9 | PaLM 2-L (1-shot) | 2023-05-17 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 86.5 | FLAN 137B (prompt-tuned) | 2021-09-03 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 86.3 | PaLM 540B (1-shot) | 2022-04-05 |
| TTTTTackling WinoGrande Schemas | | 84.6 | TTTTT 3B (fine-tuned) | 2020-03-18 |
| PaLM 2 Technical Report | ✓ | 84.6 | PaLM 2-S (1-shot) | 2023-05-17 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ | 83.1 | RoBERTa-DPR 355M | 2019-07-24 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 80.8 | FLAN 137B (zero-shot) | 2021-09-03 |
| Language Models are Few-Shot Learners | ✓ | 80.1 | GPT-3 175B (few-shot) | 2020-05-28 |
| Generative Data Augmentation for Commonsense Reasoning | ✓ | 80 | RoBERTa-large + G-DAug-Inf | 2020-04-24 |
| UL2: Unifying Language Learning Paradigms | ✓ | 79.9 | UL2 20B (0-shot) | 2022-05-10 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 78.8 | ALBERT-xxlarge 235M | 2021-04-16 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 77.9 | Neo-6B (QA + WS) | 2022-10-05 |
| A Hybrid Neural Network Model for Commonsense Reasoning | ✓ | 75.1 | HNN | 2019-07-27 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 74.7 | Neo-6B (QA) | 2022-10-05 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 73.9 | RoBERTa-large 354M | 2021-04-16 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 73.3 | GPT-2-XL 1.5B | 2023-04-27 |
| A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ | 72.5 | BERTwiki 340M (fine-tuned on WSCR) | 2019-05-15 |
| SocialIQA: Commonsense Reasoning about Social Interactions | ✓ | 72.5 | BERT-SocialIQA 340M | 2019-04-22 |
| A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ | 71.4 | BERT-large 340M (fine-tuned on WSCR) | 2019-05-15 |
| Language Models are Unsupervised Multitask Learners | ✓ | 70.7 | GPT-2-XL 1.5B | 2019-02-14 |
| A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ | 70.3 | BERTwiki 340M (fine-tuned on half of WSCR) | 2019-05-15 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 69.6 | LaMini-GPT 1.5B | 2023-04-27 |
| How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | ✓ | 69.2 | GPT-2 Medium 774M (partial scoring) | 2018-11-05 |
| N-Grammer: Augmenting Transformers with latent n-grams | ✓ | 68.3 | N-Grammer 343M | 2022-07-13 |
| AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ | 68.3 | AlexaTM 20B | 2022-08-02 |
| SocialIQA: Commonsense Reasoning about Social Interactions | ✓ | 67 | BERT-large 340M | 2019-04-22 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 66.7 | T5-Large 738M | 2023-04-27 |
| The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ | 66 | T0-3B (CoT fine-tuned) | 2023-05-23 |
| Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 65.4 | KiC-770M | 2022-10-28 |
| How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | ✓ | 64.5 | GPT-2 Medium 774M (full scoring) | 2018-11-05 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 64.1 | LaMini-F-T5 783M | 2023-04-27 |
| A Simple Method for Commonsense Reasoning | ✓ | 63.7 | Ensemble of 14 LMs | 2018-06-07 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 63.5 | H3 125M (3-shot, rank classification) | 2022-12-28 |
| Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 63.0 | DSSM | 2019-04-03 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 63 | RoBERTa-base 125M | 2021-04-16 |
| A Simple Method for Commonsense Reasoning | ✓ | 62.6 | Word-level CNN+LSTM (partial scoring) | 2018-06-07 |
| Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 62.4 | UDSSM-II (ensemble) | 2019-04-03 |
| A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ | 62.3 | BERT-base 110M (fine-tuned on WSCR) | 2019-05-15 |
| Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ | 62.21 | RoE-3B | 2023-02-07 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ | 62.0 | BERT-large 340M | 2018-10-11 |
| How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | ✓ | 61.5 | GPT-2 Small 117M (partial scoring) | 2018-11-05 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 61.5 | H3 125M (0-shot, rank classification) | 2022-12-28 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 61.4 | BERT-large 340M | 2021-04-16 |
| Attention Is (not) All You Need for Commonsense Reasoning | ✓ | 60.3 | BERT-base 110M + MAS | 2019-05-31 |
| On Generalization in Coreference Resolution | ✓ | 60.1 | longdoc S (OntoNotes + PreCo + LitBank) | 2021-09-20 |
| On Generalization in Coreference Resolution | ✓ | 59.4 | longdoc S (ON + PreCo + LitBank + 30k pseudo-singletons) | 2021-09-20 |
| Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 59.2 | UDSSM-II | 2019-04-03 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 59 | LaMini-T5 738M | 2023-04-27 |
| Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ | 58.37 | Flipped-3B | 2022-10-06 |
| Commonsense Knowledge Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema Challenge | | 58.3 | KEE+NKAM (winner of WSC 2016) | 2016-11-13 |
| A Simple Method for Commonsense Reasoning | ✓ | 57.9 | Char-level CNN+LSTM (partial scoring) | 2018-06-07 |
| Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 57.1 | UDSSM-I (ensemble) | 2019-04-03 |
| A Knowledge Hunting Framework for Common Sense Reasoning | | 57.1 | Knowledge Hunter | 2018-10-02 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ | 57.1 | WKH | 2019-07-24 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 56.5 | BERT-base 110M | 2021-04-16 |
| How Reasonable are Common-Sense Reasoning Tasks: A Case-Study on the Winograd Schema Challenge and SWAG | ✓ | 55.7 | GPT-2 Small 117M (full scoring) | 2018-11-05 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 55.4 | ALBERT-base 11M | 2021-04-16 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 54.8 | Pythia 12B (0-shot) | 2023-04-03 |
| Unsupervised Deep Structured Semantic Models for Commonsense Reasoning | | 54.5 | UDSSM-I | 2019-04-03 |
| Attention Is All You Need | ✓ | 54.1 | Subword-level Transformer LM | 2017-06-12 |
| Attention Is (not) All You Need for Commonsense Reasoning | ✓ | 52.8 | USSM + Supervised DeepNet + KB | 2019-05-31 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ | 52.8 | KEE+NKAM on WinoGrande | 2019-07-24 |
| Attention Is (not) All You Need for Commonsense Reasoning | ✓ | 52 | USSM + KB | 2019-05-31 |
| Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema | | 50 | Random chance baseline | 2021-04-16 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 43.3 | Hybrid H3 125M (3-shot, logit scoring) | 2022-12-28 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 38.5 | Pythia 2.8B (0-shot) | 2023-04-03 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 36.5 | Neo-6B (few-shot) | 2022-10-05 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 36.5 | Pythia 6.9B (0-shot) | 2023-04-03 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 36.5 | Pythia 12B (5-shot) | 2023-04-03 |
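Several language-model entries above note a scoring variant such as "partial scoring" or "full scoring". These refer to how a pretrained LM is used to pick between the two candidate referents: each candidate is substituted for the ambiguous pronoun, and the sentence the model finds more probable wins; full scoring sums log-probabilities over the whole sentence, while partial scoring sums only over the tokens after the substitution, avoiding a length bias between candidates. A minimal sketch of the substitution-and-ranking step, using a hypothetical `toy_logprob` stand-in for a real LM scorer:

```python
def substitute(sentence: str, candidate: str) -> str:
    """Replace the marked pronoun slot with a candidate referent."""
    return sentence.replace("[pronoun]", candidate)

def rank_candidates(sentence, candidates, logprob):
    """Return candidates best-first by the score of the substituted
    sentence; `logprob` is any callable mapping a sentence to a score."""
    return sorted(candidates,
                  key=lambda c: logprob(substitute(sentence, c)),
                  reverse=True)

def toy_logprob(sentence: str) -> float:
    # Hand-coded plausibility check standing in for the sum of an
    # LM's token log-probabilities over the sentence.
    return 0.0 if "the suitcase is too small" in sentence else -1.0

schema = ("The trophy does not fit in the suitcase "
          "because [pronoun] is too small.")
best = rank_candidates(schema, ["the trophy", "the suitcase"],
                       toy_logprob)[0]  # "the suitcase"
```

With a real model, `logprob` would sum token log-probabilities from the LM; restricting that sum to the tokens following the substituted candidate gives the partial-scoring variant.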