OpenCodePapers
Natural Language Inference on MultiNLI
Task: Natural Language Inference
Dataset: MultiNLI
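The Matched and Mismatched columns report accuracy on MultiNLI's two evaluation sets: Matched covers the same text genres as the training data, while Mismatched covers held-out genres; Dev Matched and Dev Mismatched refer to the corresponding labeled validation splits (the test labels are hidden and scored by the GLUE evaluation server). As a rough illustration of how such numbers are computed, the sketch below evaluates a publicly available MNLI-fine-tuned checkpoint on the dev splits. It assumes the Hugging Face `datasets` and `transformers` libraries, the public `roberta-large-mnli` model, and the label ordering noted in the comments; none of this is taken from any leaderboard entry.

```python
# Minimal sketch (assumed setup, not taken from any leaderboard entry):
# compute MultiNLI matched vs. mismatched accuracy on the labeled dev splits
# using the public `roberta-large-mnli` checkpoint from the Hugging Face Hub.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # assumption: any MNLI-fine-tuned classifier works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

# Assumption: this checkpoint predicts [contradiction, neutral, entailment],
# while the `multi_nli` dataset labels are [entailment, neutral, contradiction].
pred_to_gold = {0: 2, 1: 1, 2: 0}

def accuracy(split: str) -> float:
    data = load_dataset("multi_nli", split=split)  # "validation_matched" or "validation_mismatched"
    correct = 0
    for ex in data:
        enc = tokenizer(ex["premise"], ex["hypothesis"],
                        truncation=True, return_tensors="pt")
        with torch.no_grad():
            pred = model(**enc).logits.argmax(dim=-1).item()
        correct += int(pred_to_gold[pred] == ex["label"])
    return correct / len(data)

print("Dev Matched accuracy:   ", accuracy("validation_matched"))
print("Dev Mismatched accuracy:", accuracy("validation_mismatched"))
```

Test-set numbers in the leaderboard are produced by submitting predictions on the unlabeled test splits to the evaluation server rather than by scoring locally as above.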
Results over time
Leaderboard
| Paper | Code | Matched | Mismatched | Accuracy | Dev Matched | Dev Mismatched | Model | Release Date |
|-------|------|---------|------------|----------|-------------|----------------|-------|--------------|
| | | 92.6 | 92.4 | | | | Turing NLR v5 XXL 5.4B (fine-tuned) | |
| First Train to Generate, then Generate to Train: UnitedSynT5 for Few-Shot NLI | | 92.6 | | | | | UnitedSynT5 (3B) | 2024-12-12 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | 92.0 | 91.7 | | | | T5 | 2019-11-08 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 92.0 | | | | | T5-XXL 11B (fine-tuned) | 2019-10-23 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 91.4 | 91.2 | | | | T5-3B | 2019-10-23 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | ✓ | 91.3 | | | | | ALBERT | 2019-09-26 |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ | 91.1 | 91.1 | | | | DeBERTa (large) | 2020-06-05 |
| StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | | 91.1 | 90.7 | | | | Adv-RoBERTa ensemble | 2019-08-13 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | 90.8 | | | | | RoBERTa | 2019-07-26 |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ | 90.8 | | | | | XLNet (single model) | 2019-06-19 |
| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | ✓ | 90.2 | | | | | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 2022-08-15 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 89.9 | | | | | T5-Large | 2019-10-23 |
| A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | ✓ | 89.9 | | | | | PSQ (Chen et al., 2020) | 2020-10-27 |
| First Train to Generate, then Generate to Train: UnitedSynT5 for Few-Shot NLI | | 89.8 | | | | | UnitedSynT5 (335M) | 2024-12-12 |
| ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ | 88.7 | 88.8 | | | | ERNIE 2.0 Large | 2019-07-29 |
| SpanBERT: Improving Pre-training by Representing and Predicting Spans | ✓ | 88.1 | | | | | SpanBERT | 2019-07-24 |
| FNet: Mixing Tokens with Fourier Transforms | ✓ | 88 | 88 | | | | BERT-Large | 2021-05-09 |
| Adversarial Self-Attention for Language Understanding | ✓ | 88 | | | | | ASA + RoBERTa | 2022-06-25 |
| Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | ✓ | 87.9 | 87.4 | | | | MT-DNN-ensemble | 2019-04-20 |
| Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT | | 87.8 | | | | | Q-BERT (Shen et al., 2020) | 2019-09-12 |
| Training Complex Models with Multi-Task Weak Supervision | ✓ | 87.6 | 87.2 | | | | Snorkel MeTaL (ensemble) | 2018-10-05 |
| Big Bird: Transformers for Longer Sequences | ✓ | 87.5 | | | | | BigBird | 2020-07-28 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 87.1 | 86.2 | | | | T5-Base | 2019-10-23 |
| Multi-Task Deep Neural Networks for Natural Language Understanding | ✓ | 86.7 | 86.0 | | | | MT-DNN | 2019-01-31 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ | 86.7 | 85.9 | | | | BERT-LARGE | 2018-10-11 |
| RealFormer: Transformer Likes Residual Attention | ✓ | 86.28 | 86.34 | | | | RealFormer | 2020-12-21 |
| Pay Attention to MLPs | ✓ | 86.2 | 86.5 | | | | gMLP-large | 2021-05-17 |
| ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ | 86.1 | 85.5 | | | | ERNIE 2.0 Base | 2019-07-29 |
| Q8BERT: Quantized 8Bit BERT | ✓ | 85.6 | | | | | Q8BERT (Zafrir et al., 2019) | 2019-10-14 |
| Adversarial Self-Attention for Language Understanding | ✓ | 85 | | | | | ASA + BERT-base | 2022-06-25 |
| TinyBERT: Distilling BERT for Natural Language Understanding | ✓ | 84.6 | 83.2 | | | | TinyBERT-6 67M | 2019-09-23 |
| Not all layers are equally as important: Every Layer Counts BERT | | 84.4 | 84.5 | | | | ELC-BERT-base 98M (zero init) | 2023-11-03 |
| How to Train BERT with an Academic Budget | ✓ | 84.4 | 83.8 | | | | 24hBERT | 2021-04-15 |
| ERNIE: Enhanced Language Representation with Informative Entities | ✓ | 84.0 | 83.2 | | | | ERNIE | 2019-05-17 |
| Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | ✓ | 83.7 | 84.4 | | | | Charformer-Tall | 2021-06-23 |
| Not all layers are equally as important: Every Layer Counts BERT | | 83 | 83.4 | | | | LTG-BERT-base 98M | 2023-11-03 |
| TinyBERT: Distilling BERT for Natural Language Understanding | ✓ | 82.5 | 81.8 | | | | TinyBERT-4 14.5M | 2019-09-23 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | 82.4 | 82.3 | | | | T5-Small | 2019-10-23 |
| What Do Questions Exactly Ask? MFAE: Duplicate Question Identification with Multi-Fusion Asking Emphasis | ✓ | 82.31 | 81.43 | | | | MFAE | 2020-05-07 |
| Improving Language Understanding by Generative Pre-Training | ✓ | 82.1 | 81.4 | | | | Finetuned Transformer LM | 2018-06-11 |
| SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | ✓ | 82.0 | 81.1 | | | | SqueezeBERT | 2020-06-19 |
| Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | ✓ | 81.8 | 82.0 | | | | GPST (unsupervised generative syntactic LM) | 2024-03-13 |
| Not all layers are equally as important: Every Layer Counts BERT | | 79.2 | 79.9 | | | | ELC-BERT-small 24M | 2023-11-03 |
| Not all layers are equally as important: Every Layer Counts BERT | | 78 | 78.8 | | | | LTG-BERT-small 24M | 2023-11-03 |
| FNet: Mixing Tokens with Fourier Transforms | ✓ | 78 | 76 | | | | FNet-Large | 2021-05-09 |
| Attention Boosted Sequential Inference Model | | 73.9 | 73.9 | | | | aESIM | 2018-12-05 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 72.4 | 72 | | | | T5-Large 738M | 2023-04-27 |
| GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | ✓ | 72.2 | 72.1 | | | | Multi-task BiLSTM + Attn | 2018-04-20 |
| Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | ✓ | 71.4 | 72.2 | | | | Stacked Bi-LSTMs (shortcut connections, max-pooling) | 2018-11-02 |
| Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning | ✓ | 71.4 | 71.3 | | | | GenSen | 2018-03-30 |
| Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | ✓ | 70.7 | 71.1 | | | | Bi-LSTM sentence encoder (max-pooling) | 2018-11-02 |
| Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | ✓ | 70.7 | 70.5 | | | | Stacked Bi-LSTMs (shortcut connections, max-pooling, attention) | 2018-11-02 |
| Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms | ✓ | 68.2 | 67.7 | | | | SWEM-max | 2018-05-24 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 67.5 | 69.3 | | | | LaMini-GPT 1.5B | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 61.4 | 61 | | | | LaMini-F-T5 783M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 54.7 | 55.8 | | | | LaMini-T5 738M | 2023-04-27 |
| LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ | 36.5 | 37 | | | | GPT-2-XL 1.5B | 2023-04-27 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | | | 91.7 | | | T5-11B | 2019-10-23 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ | | | 90.2 | | | RoBERTa (ensemble) | 2019-07-26 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ | | | 89.6 | | | T5-Large 770M | 2019-10-23 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | | | 85.7 | | | MT-DNN-SMARTv0 | 2019-11-08 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | | | 85.7 | | | MT-DNN-SMART | 2019-11-08 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | | | 85.6 | | | SMART+BERT-BASE | 2019-11-08 |
| LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning | ✓ | | | 68.4 | | | LM-CPPF RoBERTa-base | 2023-05-29 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | | | | 91.1 | 91.3 | SMART-RoBERTa | 2019-11-08 |
| SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ | | | | 85.6 | 86.0 | SMART-BERT | 2019-11-08 |