Paper | Code | Accuracy | F1 | Intrinsic dim. d90 (DID) | Intrinsic dim. d90 (SAID) | Dev Accuracy | Accuracy (early exit) | Dev F1 | Model | Date |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | 90.7 | | | | | | | ALICE | 2019-11-08 |
Entailment as Few-Shot Learner | ✓ Link | 89.2 | | | | | | | RoBERTa-large 355M + Entailment as Few-shot Learner | 2021-04-29 |
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | ✓ Link | 88.5 | 91.4 | | | | | | Charformer-Tall | 2021-06-23 |
RealFormer: Transformer Likes Residual Attention | ✓ Link | 88.28 | 91.34 | | | | | | RealFormer | 2020-12-21 |
FNet: Mixing Tokens with Fourier Transforms | ✓ Link | 85 | | | | | | | FNet-Large | 2021-05-09 |
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | | 74.4 | 90.7 | | | | | | StructBERT (RoBERTa ensemble) | 2019-08-13 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | 74.2 | 90.3 | | | | | | XLNet-Large (ensemble) | 2019-06-19 |
Adversarial Self-Attention for Language Understanding | ✓ Link | 73.7 | | | | | | | ASA + RoBERTa | 2022-06-25 |
Training Complex Models with Multi-Task Weak Supervision | ✓ Link | 73.1 | 89.9 | | | | | | Snorkel MeTaL (ensemble) | 2018-10-05 |
Multi-Task Deep Neural Networks for Natural Language Understanding | ✓ Link | 72.4 | 89.6 | | | | | | MT-DNN | 2019-01-31 |
Adversarial Self-Attention for Language Understanding | ✓ Link | 72.3 | | | | | | | ASA + BERT-base | 2022-06-25 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 72.1 | | | | | | | BERT-LARGE | 2018-10-11 |
SpanBERT: Improving Pre-training by Representing and Predicting Spans | ✓ Link | 71.9 | 89.5 | | | | | | SpanBERT | 2019-07-24 |
TinyBERT: Distilling BERT for Natural Language Understanding | ✓ Link | 71.3 | | | | | | | TinyBERT | 2019-09-23 |
ERNIE: Enhanced Language Representation with Informative Entities | ✓ Link | 71.2 | | | | | | | ERNIE | 2019-05-17 |
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | ✓ Link | | 92.4 | | | | | | data2vec | 2022-02-07 |
What Do Questions Exactly Ask? MFAE: Duplicate Question Identification with Multi-Fusion Asking Emphasis | ✓ Link | | 90.54 | | | | | | MFAE | 2020-05-07 |
Simple and Effective Text Matching with Richer Alignment Features | ✓ Link | | 89.2 | | | | | | RE2 | 2019-08-01 |
Multiway Attention Networks for Modeling Sentence Pairs | ✓ Link | | 89.12 | | | | | | MwAN | 2018-07-01 |
Natural Language Inference over Interaction Space | ✓ Link | | 89.06 | | | | | | DIIN | 2017-09-13 |
Multi-task Sentence Encoding Model for Semantic Retrieval in Question Answering Systems | | | 88.86 | | | | | | MSEM | 2019-11-18 |
Cell-aware Stacked LSTMs for Modeling Sentences | | | 88.6 | | | | | | Bi-CAS-LSTM | 2018-09-07 |
Neural Paraphrase Identification of Questions with Noisy Pretraining | | | 88.40 | | | | | | pt-DecAtt | 2017-04-15 |
TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding | | | 88.28 | | | | | | TRANS-BLSTM | 2020-03-16 |
Bilateral Multi-Perspective Matching for Natural Language Sentences | ✓ Link | | 88.17 | | | | | | BiMPM | 2017-02-13 |
Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning | ✓ Link | | 87.01 | | | | | | GenSen | 2018-03-30 |
Self-Explaining Structures Improve NLP Models | ✓ Link | | 80 | | | | | | Random | 2020-12-03 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | 74.8 | | | 92.6 | | | FreeLB | 2019-11-08 |
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning | ✓ Link | | | 9295 | 8030 | | | | BERT-Base | 2020-12-22 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | | | | 91.5 | | 88.5 | SMART-BERT | 2019-11-08 |
SplitEE: Early Exit in Deep Neural Networks with Split Computing | ✓ Link | | | | | | 71.1 | | SplitEE-S | 2023-09-17 |
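
The first two metric columns are the standard GLUE QQP pair: accuracy over all question pairs and binary F1 on the duplicate (positive) class. As a minimal sketch of how those two numbers are computed for paraphrase predictions (the label vectors below are hypothetical, not drawn from any row above):

```python
# Minimal sketch: accuracy and positive-class F1 for QQP-style binary
# paraphrase classification. Labels: 1 = duplicate pair, 0 = not duplicate.

def accuracy(y_true, y_pred):
    """Fraction of pairs whose predicted label matches the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall on the positive (duplicate) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [1, 0, 1, 1, 0]  # hypothetical gold labels
pred = [1, 0, 0, 1, 0]  # hypothetical model predictions
print(f"Accuracy: {accuracy(gold, pred):.1%}")  # 80.0%
print(f"F1:       {f1(gold, pred):.3f}")        # 0.800
```

Note that the two metrics can diverge substantially on QQP because the class balance is skewed, which is why leaderboard entries report both.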