Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 97.5 | | | T5-11B | 2019-10-23 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | 97.5 | | | MT-DNN-SMART | 2019-11-08 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 97.4 | | | T5-3B | 2019-10-23 |
Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ Link | 97.4 | | | MUPPET Roberta Large | 2021-01-26 |
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | ✓ Link | 97.1 | | | ALBERT | 2019-09-26 |
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | | 97.1 | | | StructBERT RoBERTa ensemble | 2019-08-13 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | 97 | | | XLNet (single model) | 2019-06-19 |
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | ✓ Link | 96.9 | | | ELECTRA | 2020-03-23 |
Entailment as Few-Shot Learner | ✓ Link | 96.9 | | | RoBERTa-large 355M + Entailment as Few-shot Learner | 2021-04-29 |
XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | 96.8 | | | XLNet-Large (ensemble) | 2019-06-19 |
Learning to Encode Position for Transformer with Continuous Dynamical Model | ✓ Link | 96.7 | | | FLOATER-large | 2020-03-13 |
Muppet: Massive Multi-task Representations with Pre-Finetuning | ✓ Link | 96.7 | | | MUPPET Roberta base | 2021-01-26 |
RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 96.7 | | | RoBERTa (ensemble) | 2019-07-26 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 96.5 | | | DeBERTa (large) | 2020-06-05 |
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | ✓ Link | 96.5 | | | MT-DNN-ensemble | 2019-04-20 |
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | ✓ Link | 96.4 | | | RoBERTa-large 355M (MLP quantized vector-wise, fine-tuned) | 2022-08-15 |
Adversarial Self-Attention for Language Understanding | ✓ Link | 96.3 | | | ASA + RoBERTa | 2022-06-25 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 96.3 | | | T5-Large 770M | 2019-10-23 |
Training Complex Models with Multi-Task Weak Supervision | ✓ Link | 96.2 | | | Snorkel MeTaL(ensemble) | 2018-10-05 |
A Statistical Framework for Low-bitwidth Training of Deep Neural Networks | ✓ Link | 96.2 | | | PSQ (Chen et al., 2020) | 2020-10-27 |
An Algorithm for Routing Vectors in Sequences | ✓ Link | 96.0 | | | Heinsen Routing + RoBERTa-large | 2022-11-20 |
Multi-Task Deep Neural Networks for Natural Language Understanding | ✓ Link | 95.6 | | | MT-DNN | 2019-01-31 |
An Algorithm for Routing Capsules in All Domains | ✓ Link | 95.6 | | | Heinsen Routing + GPT-2 | 2019-11-02 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 95.2 | | | T5-Base | 2019-10-23 |
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ Link | 95 | | | ERNIE 2.0 Base | 2019-07-29 |
Dual Contrastive Learning: Text Classification via Label-Aware Data Augmentation | ✓ Link | 94.91 | | | RoBERTa+DualCL | 2022-01-21 |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 94.9 | | | BERT-LARGE | 2018-10-11 |
SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization | ✓ Link | 94.84 | | | RoBERTa + SubRegWeigh (K-means) | 2024-09-10 |
SpanBERT: Improving Pre-training by Representing and Predicting Spans | ✓ Link | 94.8 | | | SpanBERT | 2019-07-24 |
Pay Attention to MLPs | ✓ Link | 94.8 | | | gMLP-large | 2021-05-17 |
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT | | 94.8 | | | Q-BERT (Shen et al., 2020) | 2019-09-12 |
Q8BERT: Quantized 8Bit BERT | ✓ Link | 94.7 | | | Q8BERT (Zafrir et al., 2019) | 2019-10-14 |
Cloze-driven Pretraining of Self-attention Networks | | 94.6 | | | CNN Large | 2019-03-19 |
Big Bird: Transformers for Longer Sequences | ✓ Link | 94.6 | | | BigBird | 2020-07-28 |
CLEAR: Contrastive Learning for Sentence Representation | | 94.5 | | | MLM + del-word + reorder | 2020-12-31 |
Adversarial Self-Attention for Language Understanding | ✓ Link | 94.1 | | | ASA + BERT-base | 2022-06-25 |
RealFormer: Transformer Likes Residual Attention | ✓ Link | 94.04 | | | RealFormer | 2020-12-21 |
FNet: Mixing Tokens with Fourier Transforms | ✓ Link | 94 | | | FNet-Large | 2021-05-09 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | 93.6 | | | MT-DNN | 2019-11-08 |
ERNIE: Enhanced Language Representation with Informative Entities | ✓ Link | 93.5 | | | ERNIE | 2019-05-17 |
GPU Kernels for Block-Sparse Weights | ✓ Link | 93.2 | | | Block-sparse LSTM | 2017-12-01 |
LM-CPPF: Paraphrasing-Guided Data Augmentation for Contrastive Prompt-Based Few-Shot Fine-Tuning | ✓ Link | 93.2 | | | LM-CPPF RoBERTa-base | 2023-05-29 |
TinyBERT: Distilling BERT for Natural Language Understanding | ✓ Link | 93.1 | | | TinyBERT-6 67M | 2019-09-23 |
How to Train BERT with an Academic Budget | ✓ Link | 93.0 | | | 24hBERT | 2021-04-15 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | 93 | | | SMART+BERT-BASE | 2019-11-08 |
TinyBERT: Distilling BERT for Natural Language Understanding | ✓ Link | 92.6 | | | TinyBERT-4 14.5M | 2019-09-23 |
Learning to Generate Reviews and Discovering Sentiment | ✓ Link | 91.8 | | | bmLSTM | 2017-04-05 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 91.8 | | | T5-Small | 2019-10-23 |
A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors | ✓ Link | 91.7 | | | byte mLSTM7 | 2018-05-14 |
Pay Attention when Required | ✓ Link | 91.6 | | | PAR BERT Base | 2020-09-09 |
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | ✓ Link | 91.6 | | | Charformer-Base | 2021-06-23 |
SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | ✓ Link | 91.4 | | | SqueezeBERT | 2020-06-19 |
Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention | ✓ Link | 91.4 | | | Nyströmformer | 2021-02-07 |
Cell-aware Stacked LSTMs for Modeling Sentences | | 91.3 | | | Bi-CAS-LSTM | 2018-09-07 |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | ✓ Link | 91.3 | | | DistilBERT 66M | 2019-10-02 |
On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis | ✓ Link | 91.2 | | | CNN | 2017-07-06 |
Improved Sentence Modeling using Suffix Bidirectional LSTM | | 91.2 | | | Suffix BiLSTM | 2018-05-18 |
Fine-grained Sentiment Classification using BERT | ✓ Link | 91.2 | | | BERT Base | 2019-10-04 |
Practical Text Classification With Large Pre-Trained Language Models | ✓ Link | 90.9 | | | Transformer (finetune) | 2018-12-04 |
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks | ✓ Link | 90.7 | | | Single layer bilstm distilled from BERT | 2019-03-28 |
Learned in Translation: Contextualized Word Vectors | ✓ Link | 90.3 | | | BCN+Char+CoVe | 2017-08-01 |
Convolutional Neural Networks with Recurrent Neural Filters | ✓ Link | 90.0 | | | CNN-RNF-LSTM | 2018-08-28 |
Neural Semantic Encoders | ✓ Link | 89.7 | | | Neural Semantic Encoder | 2016-07-14 |
Text Classification Improved by Integrating Bidirectional LSTM with Two-dimensional Max Pooling | ✓ Link | 89.5 | | | BLSTM-2DCNN | 2016-11-21 |
Harnessing Deep Neural Networks with Logic Rules | ✓ Link | 89.3 | | | CNN + Logic rules | 2016-03-21 |
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing | ✓ Link | 88.6 | | | DMN [ankit16] | 2015-06-24 |
Convolutional Neural Networks for Sentence Classification | ✓ Link | 88.1 | | | CNN-multichannel [kim2013] | 2014-08-25 |
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks | ✓ Link | 88.0 | | | Consistency Tree LSTM with tuned GloVe vectors [tai2015improved] | 2015-02-28 |
A C-LSTM Neural Network for Text Classification | ✓ Link | 87.8 | | | C-LSTM | 2015-11-27 |
Message Passing Attention Networks for Document Understanding | ✓ Link | 87.75 | | | MPAD-path | 2019-08-17 |
Information Aggregation via Dynamic Routing for Sequence Encoding | ✓ Link | 87.6 | | | Standard DR-AGG | 2018-06-05 |
Universal Sentence Encoder | ✓ Link | 87.21 | | | USE_T+CNN (lrn w.e.) | 2018-03-29 |
Information Aggregation via Dynamic Routing for Sequence Encoding | ✓ Link | 87.2 | | | Reverse DR-AGG | 2018-06-05 |
A Helping Hand: Transfer Learning for Deep Sentiment Analysis | | 86.99 | | | DC-MCNN | 2018-07-01 |
The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning | ✓ Link | 86.95 | | | STM+TSED+PT+2L | 2019-05-31 |
Investigating Capsule Networks with Dynamic Routing for Text Classification | ✓ Link | 86.8 | | | Capsule-B | 2018-03-29 |
Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks | ✓ Link | 86.3 | | | 2-layer LSTM [tai2015improved] | 2015-02-28 |
Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms | ✓ Link | 84.3 | | | SWEM-concat | 2018-05-24 |
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank | ✓ Link | 82.9 | | | MV-RNN | 2013-10-01 |
Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training | ✓ Link | 82.3 | | | GloVe+Emo2Vec | 2018-09-12 |
Emo2Vec: Learning Generalized Emotion Representation by Multi-task Training | ✓ Link | 81.2 | | | Emo2Vec | 2018-09-12 |
Task-oriented Word Embedding for Text Classification | ✓ Link | 78.8 | | | ToWE-CBOW | 2018-08-01 |
Exploring Joint Neural Model for Sentence Level Discourse Parsing and Sentiment Analysis | | 54.72 | | | Joined Model Multi-tasking | 2017-08-01 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | 96.9 | | SMART-RoBERTa | 2019-11-08 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | 96.1 | | SMART-MT-DNN | 2019-11-08 |
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization | ✓ Link | | 93.0 | | SMART-BERT | 2019-11-08 |
Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models | ✓ Link | | | 100 | Word+ES (Scratch) | 2022-10-18 |