OpenCodePapers

Question Answering on COPA
Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|
| PaLM: Scaling Language Modeling with Pathways | ✓ | 100 | PaLM 540B (finetuned) | 2022-04-05 |
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 99.4 | Vega v2 6B (KD-based prompt transfer) | 2022-12-04 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 99.2 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
| UL2: Unifying Language Learning Paradigms | ✓ | 99 | UL2 20B (fine-tuned) | 2022-05-10 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ | 98.4 | DeBERTa-Ensemble | 2020-06-05 |
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 98.2 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ | 96.8 | DeBERTa-1.5B | 2020-06-05 |
| PaLM 2 Technical Report | ✓ | 96.0 | PaLM 2-L (1-shot) | 2023-05-17 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 94.8 | T5-XXL 11B (fine-tuned) | 2019-10-23 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 94 | FLAN 137B (prompt-tuned) | 2021-09-03 |
| Language Models are Few-Shot Learners | ✓ | 92 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 92 | T5-XL 3B (fine-tuned) | 2019-10-23 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 91 | FLAN 137B (zero-shot) | 2021-09-03 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 91 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
| Language Models are Few-Shot Learners | ✓ | 91 | GPT-3 175B (0-shot) | 2020-05-28 |
| The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ | 90.9 | T0-3B (CoT fine-tuned) | 2023-05-23 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ | 90.6 | RoBERTa-Winogrande-ft 355M (fine-tuned) | 2019-07-24 |
| PaLM 2 Technical Report | ✓ | 90.0 | PaLM 2-M (1-shot) | 2023-05-17 |
| Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ | 89.88 | Flipped-3B | 2022-10-06 |
| PaLM 2 Technical Report | ✓ | 89.0 | PaLM 2-S (1-shot) | 2023-05-17 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 88 | GPT-NeoX (one-shot) | 2023-03-30 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 87 | FLAN 137B (few-shot, k=16) | 2021-09-03 |
| Language Models are Few-Shot Learners | ✓ | 87 | GPT-3 175B (1-shot) | 2020-05-28 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ | 86.4 | RoBERTa-ft 355M (fine-tuned) | 2019-07-24 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 86 | Bloomberg GPT (one-shot) | 2023-03-30 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 86 | OPT 66B (one-shot) | 2023-03-30 |
| Language Models are Few-Shot Learners | ✓ | 86 | GPT-3 13B (few-shot, k=32) | 2020-05-28 |
| Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 85.30 | KiC-770M | 2022-10-28 |
| UL2: Unifying Language Learning Paradigms | ✓ | 85 | UL2 20B (0-shot) | 2022-05-10 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ | 84.4 | RoBERTa-Winogrande 355M (fine-tuned) | 2019-07-24 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 84.0 | Neo-6B (QA + WS) | 2022-10-05 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 84 | BLOOM 176B (one-shot) | 2023-03-30 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 83.4 | T5-Large 770M (fine-tuned) | 2019-10-23 |
| SocialIQA: Commonsense Reasoning about Social Interactions | ✓ | 83.4 | BERT-SocialIQA 340M | 2019-04-22 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 81 | Hybrid H3 2.7B (0-shot, logit scoring) | 2022-12-28 |
| SocialIQA: Commonsense Reasoning about Social Interactions | ✓ | 80.8 | BERT-large 340M | 2019-04-22 |
| Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ | 79.25 | RoE-3B | 2023-02-07 |
| Efficient Language Modeling with Sparse all-MLP | | 79 | sMLP – deterministic 9.4B (0-shot) | 2022-03-14 |
| KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs | ✓ | 78.0 | KELM (finetuning BERT-large based single model) | 2021-09-09 |
| AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ | 78.0 | AlexaTM 20B | 2022-08-02 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 77.0 | Neo-6B (few-shot) | 2022-10-05 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 77 | Hybrid H3 2.7B (3-shot, logit scoring) | 2022-12-28 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ | 76.4 | Causal Strength w/multi-word predicates (presumably on WinoGrande?) | 2019-07-24 |
| Efficient Language Modeling with Sparse all-MLP | | 76 | Gshard 9B | 2022-03-14 |
| Efficient Language Modeling with Sparse all-MLP | | 75 | Switch Transformer 9B | 2022-03-14 |
| Language Models are Few-Shot Learners | ✓ | 73.0 | GPT-3 Large 760M (0-shot) | 2020-05-28 |
| Handling Multiword Expressions in Causality Estimation | | 71.2 | Causal Strength Computation w/multi-word predicates (on ClueWeb12) | 2017-01-01 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 71.2 | T5-Base 220M (fine-tuned) | 2019-10-23 |
| Handling Multiword Expressions in Causality Estimation | | 70.2 | Causal Strength Computation (on Causal Net) | 2017-01-01 |
| Handling Multiword Expressions in Causality Estimation | | 69.9 | Causal Strength Computation (on ClueWeb12) | 2017-01-01 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 67 | Hybrid H3 125M (0-shot, logit scoring) | 2022-12-28 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 67 | Hybrid H3 125M (0-shot, rank classification) | 2022-12-28 |
| WinoGrande: An Adversarial Winograd Schema Challenge at Scale | ✓ | 65.4 | Pointwise Mutual Information (on 10M stories) | 2019-07-24 |
| Efficient Language Modeling with Sparse all-MLP | | 64 | HASH Layers 10B (0-shot) | 2022-03-14 |
| Efficient Language Modeling with Sparse all-MLP | | 63 | Base Layers 10B (0-shot) | 2022-03-14 |
| N-Grammer: Augmenting Transformers with latent n-grams | ✓ | 60.0 | N-Grammer 343M | 2022-07-13 |
| Handling Multiword Expressions in Causality Estimation | | 58.8 | Pointwise Mutual Information (on Project Gutenberg) | 2017-01-01 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 58.2 | Neo-6B (QA) | 2022-10-05 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 51 | H3 125M (0-shot, rank classification) | 2022-12-28 |
| Handling Multiword Expressions in Causality Estimation | | 50 | Random chance baseline | 2017-01-01 |