
Natural Language Inference on RTE

Task: Natural Language Inference
Dataset: RTE (Recognizing Textual Entailment)
Results over time: [chart of leaderboard accuracy vs. model release date]
Leaderboard
| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 96% | Vega v2 6B (KD-based prompt transfer) | 2022-12-04 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 95.7% | PaLM 540B (fine-tuned) | 2022-04-05 |
| Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 94.1% | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 93.5% | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ | 93.2% | DeBERTa-1.5B | 2020-06-05 |
| Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ | 92.8% | MUPPET RoBERTa Large | 2021-01-26 |
| DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing | ✓ | 92.7% | DeBERTaV3-large | 2021-11-18 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | 92.5% | T5-XXL 11B | 2019-11-08 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 92.5% | T5-XXL 11B (fine-tuned) | 2019-10-23 |
| ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ | 92.1% | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
| UL2: Unifying Language Learning Paradigms | ✓ | 92.1% | UL2 20B (fine-tuned) | 2022-05-10 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | 92.0% | SMART-RoBERTa | 2019-11-08 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 91.7% | FLAN 137B (prompt-tuned) | 2021-09-03 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 91.1% | T5-XL 3B | 2019-10-23 |
| Entailment as Few-Shot Learner | ✓ | 90.5% | RoBERTa-large 355M + Entailment as Few-shot Learner | 2021-04-29 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | ✓ | 89.2% | ALBERT | 2019-09-26 |
| StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | | 88.7% | Adv-RoBERTa ensemble | 2019-08-13 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 88.2% | RoBERTa | 2019-07-26 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 88.2% | RoBERTa (ensemble) | 2019-07-26 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 87.4% | T5-Large 738M | 2023-04-27 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 87.2% | T5-Large 770M | 2019-10-23 |
| Entailment as Few-Shot Learner | ✓ | 87.2% | RoBERTa-large 355M + EFL + UCA | 2021-04-29 |
| A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | ✓ | 86.8% | PSQ (Chen et al., 2020) | 2020-10-27 |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ | 85.9% | XLNet (single model) | 2019-06-19 |
| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | ✓ | 85.4% | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 2022-08-15 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ | 84.8% | OPT-IML 175B | 2022-12-22 |
| Q8BERT: Quantized 8Bit BERT | ✓ | 84.8% | Q8BERT (Zafrir et al., 2019) | 2019-10-14 |
| Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT | | 84.7% | Q-BERT (Shen et al., 2020) | 2019-09-12 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 84.5% | FLAN 137B (8-shot) | 2021-09-03 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ | 84.1% | FLAN 137B (0-shot) | 2021-09-03 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ | 83.8% | OPT-IML 30B | 2022-12-22 |
| ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | | 83.6% | ELECTRA | 2020-03-23 |
| PaLM 2 Technical Report | ✓ | 81.9% | PaLM 2-M (1-shot) | 2023-05-17 |
| The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ | 80.8% | T0-3B (CoT fine-tuned) | 2023-05-23 |
| ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ | 80.2% | ERNIE 2.0 Large | 2019-07-29 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 80.1% | T5-Base 220M | 2019-10-23 |
| CLEAR: Contrastive Learning for Sentence Representation | | 79.8% | MLM+ del-span | 2020-12-31 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 79.6% | PaLM 540B (5-shot) | 2022-04-05 |
| PaLM 2 Technical Report | ✓ | 79.3% | PaLM 2-L (1-shot) | 2023-05-17 |
| SpanBERT: Improving Pre-training by Representing and Predicting Spans | ✓ | 79.0% | SpanBERT | 2019-07-24 |
| PaLM 2 Technical Report | ✓ | 78.7% | PaLM 2-S (1-shot) | 2023-05-17 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 78.7% | PaLM 540B (1-shot) | 2022-04-05 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 75.1% | Neo-6B (QA + WS) | 2022-10-05 |
| Big Bird: Transformers for Longer Sequences | ✓ | 75.0% | BigBird | 2020-07-28 |
| ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ | 74.8% | ERNIE 2.0 Base | 2019-07-29 |
| Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 74.00% | KiC-770M | 2022-10-28 |
| RealFormer: Transformer Likes Residual Attention | ✓ | 73.7% | RealFormer | 2020-12-21 |
| SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | ✓ | 73.2% | SqueezeBERT | 2020-06-19 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 72.9% | PaLM 540B (0-shot) | 2022-04-05 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | 71.2% | SMART-BERT | 2019-11-08 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | 71.2% | SMART | 2019-11-08 |
| Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ | 71.05% | Flipped-3B | 2022-10-06 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ | 70.1% | BERT-large 340M | 2018-10-11 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 69.9% | T5-Small | 2019-10-23 |
| data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | ✓ | 69.9% | data2vec | 2022-02-07 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 69.3% | BloombergGPT 50B (1-shot) | 2023-03-30 |
| FNet: Mixing Tokens with Fourier Transforms | ✓ | 69% | FNet-Large | 2021-05-09 |
| Language Models are Few-Shot Learners | ✓ | 69% | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
| ERNIE: Enhanced Language Representation with Informative Entities | ✓ | 68.8% | ERNIE | 2019-05-17 |
| AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ | 68.6% | AlexaTM 20B | 2022-08-02 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 67.9% | LaMini-GPT 1.5B | 2023-04-27 |
| SenseBERT: Driving Some Sense into BERT | | 67.5% | SenseBERT-base 110M | 2019-08-15 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ | 66.8% | OPT-IML 1.3B | 2022-12-22 |
| TinyBERT: Distilling BERT for Natural Language Understanding | ✓ | 66% | TinyBERT-6 67M | 2019-09-23 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 65% | LaMini-F-T5 783M | 2023-04-27 |
| Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ | 64.01% | RoE-3B | 2023-02-07 |
| Not all layers are equally as important: Every Layer Counts BERT | | 63% | ELC-BERT-base 98M (zero init) | 2023-11-03 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | ✓ | 62.9% | DistilBERT 66M | 2019-10-02 |
| TinyBERT: Distilling BERT for Natural Language Understanding | ✓ | 62.9% | TinyBERT-4 14.5M | 2019-09-23 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 61.7% | Neo-6B (QA) | 2022-10-05 |
| UL2: Unifying Language Learning Paradigms | ✓ | 60.7% | UL2 20B (0-shot) | 2022-05-10 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ | 60.3% | OPT 175B | 2022-12-22 |
| N-Grammer: Augmenting Transformers with latent n-grams | ✓ | 59.2% | N-Grammer 343M | 2022-07-13 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 59.2% | Hybrid H3 125M (0-shot, logit scoring) | 2022-12-28 |
| Ask Me Anything: A simple strategy for prompting language models | ✓ | 58.8% | Neo-6B (few-shot) | 2022-10-05 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ | 58.1% | OPT 30B | 2022-12-22 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 58.1% | Hybrid H3 125M (3-shot, logit scoring) | 2022-12-28 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 58.1% | Hybrid H3 125M (3-shot, rank classification) | 2022-12-28 |
| How to Train BERT with an Academic Budget | ✓ | 57.7% | 24hBERT | 2021-04-15 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 57.4% | BLOOM 176B (1-shot) | 2023-03-30 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 57% | LaMini-T5 738M | 2023-04-27 |
| Not all layers are equally as important: Every Layer Counts BERT | | 55.4% | ELC-BERT-small 24M | 2023-11-03 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 54.9% | OPT 66B (1-shot) | 2023-03-30 |
| Not all layers are equally as important: Every Layer Counts BERT | | 54.7% | LTG-BERT-base 98M | 2023-11-03 |
| OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | ✓ | 54.2% | OPT 1.3B | 2022-12-22 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 53.8% | GPT-NeoX 20B (1-shot) | 2023-03-30 |
| Not all layers are equally as important: Every Layer Counts BERT | | 53.7% | LTG-BERT-small 24M | 2023-11-03 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 53.1% | H3 125M (0-shot, rank classification) | 2022-12-28 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 52.3% | GPT-2-XL 1.5B | 2023-04-27 |
| Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | 52.3% | H3 125M (3-shot, rank classification) | 2022-12-28 |
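
For reference, a minimal sketch of how an accuracy figure like the ones above can be approximated for a fine-tuned checkpoint: score the model on the GLUE RTE validation split with Hugging Face `datasets` and `transformers`. The checkpoint name below is only an illustrative assumption (swap in any RTE-fine-tuned sequence-classification model); note that leaderboard numbers are generally reported on the hidden test set via the GLUE/SuperGLUE servers, so validation accuracy only approximates them.

```python
# Sketch: evaluate an RTE-fine-tuned classifier on the GLUE RTE validation
# split (277 premise/hypothesis pairs). Not the official scoring pipeline.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Example checkpoint (assumption -- substitute any RTE-fine-tuned model).
MODEL_ID = "textattack/bert-base-uncased-RTE"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

rte = load_dataset("glue", "rte", split="validation")

correct = 0
for example in rte:
    # GLUE RTE encodes labels as 0 = entailment, 1 = not_entailment;
    # the checkpoint's label order must match for the comparison to hold.
    inputs = tokenizer(
        example["sentence1"],  # premise
        example["sentence2"],  # hypothesis
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        prediction = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(prediction == example["label"])

print(f"RTE validation accuracy: {correct / len(rte):.1%}")
```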