| Paper | Code | Accuracy (%) | Model | Date |
|---|---|---|---|---|
| | | 95.9 | Turing NLR v5 XXL 5.4B (fine-tuned) | |
| DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 94.5 | DeBERTa | 2020-06-05 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 93.2 | T5-XXL 11B | 2019-10-23 |
| XLNet: Generalized Autoregressive Pretraining for Language Understanding | ✓ Link | 92.5 | XLNet | 2019-06-19 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | ✓ Link | 91.8 | ALBERT | 2019-09-26 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 89.7 | T5-XL 3B | 2019-10-23 |
| StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | | 89.7 | StructBERT + RoBERTa (ensemble) | 2019-08-13 |
| A Hybrid Neural Network Model for Commonsense Reasoning | ✓ Link | 89.0 | HNN (ensemble) | 2019-07-27 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 89.0 | RoBERTa (ensemble) | 2019-07-26 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 85.6 | T5-Large 770M | 2019-10-23 |
| A Hybrid Neural Network Model for Commonsense Reasoning | ✓ Link | 83.6 | HNN | 2019-07-27 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 78.8 | T5-Base 220M | 2019-10-23 |
| A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ Link | 74.7 | BERTwiki 340M (fine-tuned on WSCR) | 2019-05-15 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 74.6 | FLAN 137B (zero-shot) | 2021-09-03 |
| A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ Link | 71.9 | BERT-large 340M (fine-tuned on WSCR) | 2019-05-15 |
| A Surprisingly Robust Trick for Winograd Schema Challenge | ✓ Link | 70.5 | BERT-base 110M (fine-tuned on WSCR) | 2019-05-15 |
| Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 70.4 | FLAN 137B (few-shot, k=4) | 2021-09-03 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 69.2 | T5-Small 60M | 2019-10-23 |
| ERNIE 2.0: A Continual Pre-training Framework for Language Understanding | ✓ Link | 67.8 | ERNIE 2.0 Large | 2019-07-29 |
| SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | ✓ Link | 65.1 | SqueezeBERT | 2020-06-19 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 65.1 | BERT-large 340M | 2018-10-11 |
| RWKV: Reinventing RNNs for the Transformer Era | ✓ Link | 49.3 | RWKV-4-Raven-14B | 2023-05-22 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | ✓ Link | 44.4 | DistilBERT 66M | 2019-10-02 |