First Train to Generate, then Generate to Train: UnitedSynT5 for Few-Shot NLI | | 94.7 | | | | | | UnitedSynT5 (3B) | 2024-12-12 |
First Train to Generate, then Generate to Train: UnitedSynT5 for Few-Shot NLI | | 93.5 | | | | | | UnitedSynT5 (335M) | 2024-12-12 |
Entailment as Few-Shot Learner | ✓ Link | 93.1 | | 355m | | | | EFL (Entailment as Few-shot Learner) + RoBERTa-large | 2021-04-29
Self-Explaining Structures Improve NLP Models | ✓ Link | 92.3 | | 355m+ | | | | RoBERTa-large + self-explaining layer | 2020-12-03
Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data | ✓ Link | 92.1 | 92.6 | 340m | | | | CA-MTL | 2020-09-19 |
Semantics-aware BERT for Language Understanding | ✓ Link | 91.9 | 94.4 | 339m | | | | SemBERT | 2019-09-05 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | 91.7 | | | | 92.6 | | MT-DNN-SMARTLARGEv0 | 2019-11-08 |
Multi-Task Deep Neural Networks for Natural Language Understanding | ✓ Link | 91.6 | 97.2 | 330m | | | | MT-DNN | 2019-01-31 |
Explicit Contextual Semantics for Text Comprehension | | 91.3 | 95.7 | 308m | | | | SJRC (BERT-Large + SRL) | 2018-09-08
Multi-Task Deep Neural Networks for Natural Language Understanding | ✓ Link | 90.5 | 99.1 | 220 | | | | Ntumpha | 2019-01-31 |
Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information | | 90.1 | 95.0 | 53.3m | | | | Densely-Connected Recurrent and Co-Attentive Network Ensemble | 2018-05-29 |
What Do Questions Exactly Ask? MFAE: Duplicate Question Identification with Multi-Fusion Asking Emphasis | ✓ Link | 90.07 | 93.18 | | | | | MFAE | 2020-05-07 |
Improving Language Understanding by Generative Pre-Training | ✓ Link | 89.9 | 96.6 | 85m | | | | Fine-Tuned LM-Pretrained Transformer | 2018-06-11 |
Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference | ✓ Link | 89.6 | 96.1 | 79m | | | | 300D DMAN Ensemble | 2019-07-23 |
Multiway Attention Networks for Modeling Sentence Pairs | ✓ Link | 89.4 | 95.5 | 58m | | | | 150D Multiway Attention Network Ensemble | 2018-07-01 |
DR-BiLSTM: Dependent Reading Bidirectional LSTM for Natural Language Inference | | 89.3 | 94.8 | 45m | | | | 450D DR-BiLSTM Ensemble | 2018-02-15 |
Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference | | 89.3 | 92.5 | 17.5m | | | | 300D CAFE Ensemble | 2017-12-30 |
Deep contextualized word representations | ✓ Link | 89.3 | 92.1 | 40m | | | | ESIM + ELMo Ensemble | 2018-02-15 |
Neural Natural Language Inference Models Enhanced with External Knowledge | ✓ Link | 89.1 | 93.6 | 43m | | | | KIM Ensemble | 2017-11-12 |
Explicit Contextual Semantics for Text Comprehension | | 89.1 | 89.1 | 6.1m | | | | SLRC | 2018-09-08 |
Simple and Effective Text Matching with Richer Alignment Features | ✓ Link | 88.9 | 94.0 | 2.8m | | | | RE2 | 2019-08-01 |
Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information | | 88.9 | 93.1 | 6.7m | | | | Densely-Connected Recurrent and Co-Attentive Network | 2018-05-29 |
DEIM: An effective deep encoding and interaction model for sentence matching | | 88.9 | 92.6 | 22m | | | | DEIM | 2022-03-20 |
Natural Language Inference over Interaction Space | ✓ Link | 88.9 | 92.3 | 17m | | | | 448D Densely Interactive Inference Network (DIIN) Ensemble | 2017-09-13
Discourse Marker Augmented Network with Reinforcement Learning for Natural Language Inference | ✓ Link | 88.8 | 95.4 | 9.2m | | | | 300D DMAN | 2019-07-23 |
Bilateral Multi-Perspective Matching for Natural Language Sentences | ✓ Link | 88.8 | 93.2 | 6.4m | | | | BiMPM Ensemble | 2017-02-13 |
Deep contextualized word representations | ✓ Link | 88.7 | 91.6 | 8.0m | | | | ESIM + ELMo | 2018-02-15 |
Neural Natural Language Inference Models Enhanced with External Knowledge | ✓ Link | 88.6 | 94.1 | 4.3m | | | | KIM | 2017-11-12 |
Enhanced LSTM for Natural Language Inference | ✓ Link | 88.6 | 93.5 | 7.7m | | | | 600D ESIM + 300D Syntactic TreeLSTM | 2016-09-20 |
DR-BiLSTM: Dependent Reading Bidirectional LSTM for Natural Language Inference | | 88.5 | 94.1 | 7.5m | | | | 450D DR-BiLSTM | 2018-02-15 |
Stochastic Answer Networks for Natural Language Inference | ✓ Link | 88.5 | 93.3 | 3.5m | | | | Stochastic Answer Network | 2018-04-21 |
Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference | | 88.5 | 89.8 | 4.7m | | | | 300D CAFE | 2017-12-30 |
Multiway Attention Networks for Modeling Sentence Pairs | ✓ Link | 88.3 | 94.5 | 14m | | | | 150D Multiway Attention Network | 2018-07-01 |
Learned in Translation: Contextualized Word Vectors | ✓ Link | 88.1 | 88.5 | 22m | | | | Biattentive Classification Network + CoVe + Char | 2017-08-01 |
Attention Boosted Sequential Inference Model | | 88.1 | | | | | | aESIM | 2018-12-05 |
Natural Language Inference over Interaction Space | ✓ Link | 88.0 | 91.2 | 4.4m | | | | 448D Densely Interactive Inference Network (DIIN) | 2017-09-13
Enhanced LSTM for Natural Language Inference | ✓ Link | 88.0 | | | | | | Enhanced Sequential Inference Model (Chen et al., 2017a) | 2016-09-20
Bilateral Multi-Perspective Matching for Natural Language Sentences | ✓ Link | 87.5 | 90.9 | 1.6m | | | | BiMPM | 2017-02-13 |
Reading and Thinking: Re-read LSTM Unit for Textual Entailment Recognition | | 87.5 | 90.7 | 2.0m | | | | 300D re-read LSTM | 2016-12-01 |
Dynamic Self-Attention : Computing Attention over Words Dynamically for Sentence Embedding | ✓ Link | 87.4 | 89.0 | 7.0m | | | | 2400D Multiple-Dynamic Self-Attention Model | 2018-08-22 |
Neural Tree Indexers for Text Understanding | ✓ Link | 87.3 | 88.5 | 3.2m | | | | 300D Full tree matching NTI-SLSTM-LSTM w/ global attention | 2016-07-15 |
Cell-aware Stacked LSTMs for Modeling Sentences | | 87 | | | | | | 300D 2-layer Bi-CAS-LSTM | 2018-09-07 |
A Decomposable Attention Model for Natural Language Inference | ✓ Link | 86.8 | 90.5 | 580k | | | | 200D decomposable attention model with intra-sentence attention | 2016-06-06 |
Dynamic Self-Attention : Computing Attention over Words Dynamically for Sentence Embedding | ✓ Link | 86.8 | 87.3 | 2.1m | | | | 600D Dynamic Self-Attention Model | 2018-08-22 |
Parameter Re-Initialization through Cyclical Batch Size Schedules | | 86.73 | | | | | | CBS-1 + ESIM | 2018-12-04 |
Dynamic Meta-Embeddings for Improved Sentence Representations | ✓ Link | 86.7 | 91.6 | 9m | | | | 512D Dynamic Meta-Embeddings | 2018-04-21 |
Enhancing Sentence Embedding with Generalized Pooling | ✓ Link | 86.6 | 94.9 | 65m | | | | 600D BiLSTM with generalized pooling | 2018-06-26 |
Sentence Embeddings in NLI with Iterative Refinement Encoders | ✓ Link | 86.6 | 89.9 | 22m | | | | 600D Hierarchical BiLSTM with Max Pooling (HBMP) | 2018-08-27
Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information | | 86.5 | 91.4 | 5.6m | | | | Densely-Connected Recurrent and Co-Attentive Network (encoder) | 2018-05-29 |
Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling | ✓ Link | 86.3 | 92.6 | 3.1m | | | | 300D Reinforced Self-Attention Network | 2018-01-31 |
Distance-based Self-Attention Network for Natural Language Inference | | 86.3 | 89.6 | 4.7m | | | | Distance-based Self-Attention Network | 2017-12-06 |
A Decomposable Attention Model for Natural Language Inference | ✓ Link | 86.3 | 89.5 | 380k | | | | 200D decomposable attention model | 2016-06-06 |
Long Short-Term Memory-Networks for Machine Reading | ✓ Link | 86.3 | 88.5 | 3.4m | | | | 450D LSTMN with deep attention fusion | 2016-01-25 |
Learning Natural Language Inference with LSTM | ✓ Link | 86.1 | 92.0 | 1.9m | | | | 300D mLSTM word-by-word attention model | 2015-12-30 |
Learning to Compose Task-Specific Tree Structures | ✓ Link | 86.0 | 93.1 | 10m | | | | 600D Gumbel TreeLSTM encoders | 2017-07-10 |
Shortcut-Stacked Sentence Encoders for Multi-Domain Inference | ✓ Link | 86.0 | 91.0 | 29m | | | | 600D Residual stacked encoders | 2017-08-07 |
Star-Transformer | ✓ Link | 86.0 | | | | | | Star-Transformer (no cross sentence attention) | 2019-02-25 |
Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference | | 85.9 | 87.3 | 3.7m | | | | 300D CAFE (no cross-sentence attention) | 2017-12-30 |
 | | 85.9 | | | | | | 1200D REGMAPR (Base+Reg) |
Shortcut-Stacked Sentence Encoders for Multi-Domain Inference | ✓ Link | 85.7 | 89.8 | 9.7m | | | | 300D Residual stacked encoders | 2017-08-07 |
Long Short-Term Memory-Networks for Machine Reading | ✓ Link | 85.7 | 87.3 | 1.7m | | | | 300D LSTMN with deep attention fusion | 2016-01-25 |
Learning to Compose Task-Specific Tree Structures | ✓ Link | 85.6 | 91.2 | 2.9m | | | | 300D Gumbel TreeLSTM encoders | 2017-07-10 |
DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding | ✓ Link | 85.6 | 91.1 | 2.4m | | | | 300D Directional self-attention network encoders | 2017-09-14 |
Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference | ✓ Link | 85.5 | 90.5 | 12m | | | | 600D (300+300) Deep Gated Attn. BiLSTM encoders | 2017-08-04 |
Neural Semantic Encoders | ✓ Link | 85.4 | 86.9 | 3.2m | | | | 300D MMA-NSE encoders with attention | 2016-07-14 |
Modelling Interaction of Sentence Pair with coupled-LSTMs | | 85.1 | 86.7 | 190k | | | | 50D stacked TC-LSTMs | 2016-05-18 |
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention | ✓ Link | 85.0 | 85.9 | 2.8m | | | | 600D (300+300) BiLSTM encoders with intra-attention and symbolic preproc. | 2016-05-30 |
Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | ✓ Link | 84.8 | | | | | | Stacked Bi-LSTMs (shortcut connections, max-pooling) | 2018-11-02 |
Neural Semantic Encoders | ✓ Link | 84.6 | 86.2 | 3.0m | | | | 300D NSE encoders | 2016-07-14 |
Deep Fusion LSTMs for Text Semantic Matching | | 84.6 | 85.2 | 320k | | | | 100D DF-LSTM | 2016-08-01 |
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data | ✓ Link | 84.5 | 85.6 | 40m | | | | 4096D BiLSTM with max-pooling | 2017-05-05 |
Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | ✓ Link | 84.5 | | | | | | Bi-LSTM sentence encoder (max-pooling) | 2018-11-02 |
Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News | ✓ Link | 84.4 | | | | | | Stacked Bi-LSTMs (shortcut connections, max-pooling, attention) | 2018-11-02 |
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention | ✓ Link | 84.2 | 84.5 | 2.8m | | | | 600D (300+300) BiLSTM encoders with intra-attention | 2016-05-30 |
Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms | ✓ Link | 83.8 | | | | | | SWEM-max | 2018-05-24 |
Reasoning about Entailment with Neural Attention | ✓ Link | 83.5 | 85.3 | 250k | | | | 100D LSTMs w/ word-by-word attention | 2015-09-22 |
Neural Tree Indexers for Text Understanding | ✓ Link | 83.4 | 82.5 | 4.0m | | | | 300D NTI-SLSTM-LSTM encoders | 2016-07-15 |
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention | ✓ Link | 83.3 | 86.4 | 2.0m | | | | 600D (300+300) BiLSTM encoders | 2016-05-30 |
A Fast Unified Model for Parsing and Sentence Understanding | ✓ Link | 83.2 | 89.2 | 3.7m | | | | 300D SPINN-PI encoders | 2016-03-19 |
Natural Language Inference by Tree-Based Convolution and Heuristic Matching | | 82.1 | 83.3 | 3.5m | | | | 300D Tree-based CNN encoders | 2015-12-28 |
Order-Embeddings of Images and Language | ✓ Link | 81.4 | 98.8 | 15m | | | | 1024D GRU encoders w/ unsupervised 'skip-thoughts' pre-training | 2015-11-19 |
DELTA: A DEep learning based Language Technology plAtform | ✓ Link | 80.7 | | | | | | DELTA (LSTM) | 2019-08-02 |
A Fast Unified Model for Parsing and Sentence Understanding | ✓ Link | 80.6 | 83.9 | 3.0m | | | | 300D LSTM encoders | 2016-03-19 |
A large annotated corpus for learning natural language inference | ✓ Link | 78.2 | 99.7 | | | | | + Unigram and bigram features | 2015-08-21 |
A large annotated corpus for learning natural language inference | ✓ Link | 77.6 | 84.8 | 220k | | | | 100D LSTM encoders | 2015-08-21 |
A large annotated corpus for learning natural language inference | ✓ Link | 50.4 | 49.4 | | | | | Unlexicalized features | 2015-08-21 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | | | 91.6 | | | MT-DNN-SMART (100% of training data) | 2019-11-08
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | | | 88.7 | | | MT-DNN-SMART (10% of training data) | 2019-11-08
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | | | 86 | | | MT-DNN-SMART (1% of training data) | 2019-11-08
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | | | 82.7 | | | MT-DNN-SMART (0.1% of training data) | 2019-11-08
SplitEE: Early Exit in Deep Neural Networks with Split Computing | ✓ Link | | | | | | 79.0 | SplitEE-S | 2023-09-17 |