Paper | Code | Accuracy | Model | Date |
Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach | ✓ Link | 85.3 | COSINE + Transductive Learning | 2020-10-15 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 78.8 | PaLM 540B (finetuned) | 2022-04-05 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 77.7 | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 77.5 | DeBERTa-Ensemble | 2020-06-05 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 77.4 | Vega v2 6B (fine-tuned) | 2022-12-04 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 77.3 | UL2 20B (fine-tuned) | 2022-05-10 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 77.1 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 76.9 | T5-XXL 11B | 2019-10-23 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 76.4 | DeBERTa-1.5B | 2020-06-05 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 74.0 | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
SenseBERT: Driving Some Sense into BERT | | 72.1 | SenseBERT-large 340M | 2019-08-15 |
SenseBERT: Driving Some Sense into BERT | | 70.3 | SenseBERT-base 110M | 2019-08-15 |
PaLM 2 Technical Report | ✓ Link | 66.8 | PaLM 2-L (one-shot) | 2023-05-17 |
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations | | 65.5 | BERT-large 340M | 2018-08-28 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 64.7 | FLAN-T5-Large 783M | 2023-04-27 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 63.8 | LaMini-F-T5 783M | 2023-04-27 |
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations | | 59.3 | context2vec | 2018-08-28 |
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations | | 58.7 | DeConf | 2018-08-28 |
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations | | 58.1 | SW2V | 2018-08-28 |
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations | | 57.7 | ELMo | 2018-08-28 |
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning | ✓ Link | 56.7 | T0-3B (CoT fine-tuned) | 2023-05-23 |
N-Grammer: Augmenting Transformers with latent n-grams | ✓ Link | 56.1 | N-Grammer 343M | 2022-07-13 |
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ Link | 53.3 | AlexaTM 20B | 2022-08-02 |
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations | | 53.1 | Sentence LSTM | 2018-08-28 |
Exploring the Benefits of Training Expert Language Models over Instruction Tuning | ✓ Link | 52.97 | RoE-3B | 2023-02-07 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 52.4 | LaMini-GPT 1.5B | 2023-04-27 |
Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models | | 52.4 | KiC-770M | 2022-10-28 |
PaLM 2 Technical Report | ✓ Link | 52.0 | PaLM 2-M (one-shot) | 2023-05-17 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 51.4 | Hybrid H3 125M (0-shot, logit scoring) | 2022-12-28 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 51.4 | Hybrid H3 125M (0-shot, rank classification) | 2022-12-28 |
PaLM 2 Technical Report | ✓ Link | 50.6 | PaLM 2-S (one-shot) | 2023-05-17 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 50.5 | LaMini-T5 738M | 2023-04-27 |
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | ✓ Link | 50.42 | Flipped-3B | 2022-10-06 |
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions | ✓ Link | 49.8 | GPT-2-XL 1.5B | 2023-04-27 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 49.8 | UL2 20B (0-shot) | 2022-05-10 |
Language Models are Few-Shot Learners | ✓ Link | 49.4 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | 49.1 | Hybrid H3 125M (3-shot, logit scoring) | 2022-12-28 |
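The rows above share one pipe-delimited schema. A minimal Python sketch for parsing and ranking such rows (the field names `paper`, `has_code`, `accuracy`, `model`, `date` are assumptions for illustration, not part of any official format):

```python
# Parse pipe-delimited leaderboard rows and sort by accuracy, descending.
# Assumed column order: Paper | Code | Accuracy | Model | Date
rows_text = """\
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 78.8 | PaLM 540B (finetuned) | 2022-04-05 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 49.8 | UL2 20B (0-shot) | 2022-05-10 |
"""

def parse_rows(text):
    entries = []
    for line in text.strip().splitlines():
        # The trailing pipe yields an empty final field; take the first five.
        fields = [f.strip() for f in line.split("|")]
        paper, code, acc, model, date = fields[:5]
        entries.append({
            "paper": paper,
            "has_code": code.startswith("✓"),  # "✓ Link" marks a code release
            "accuracy": float(acc),
            "model": model,
            "date": date,
        })
    return entries

leaderboard = sorted(parse_rows(rows_text),
                     key=lambda e: e["accuracy"], reverse=True)
print(leaderboard[0]["model"])  # → PaLM 540B (finetuned)
```

Sorting by the numeric `accuracy` field rather than the raw string avoids misordering entries with differing decimal precision (e.g. "74" vs "52.97").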