PaLM: Scaling Language Modeling with Pathways | ✓ Link | 100 | 100 | PaLM 540B (fine-tuned) | 2022-04-05 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 99.2 | 98.6 | Vega v2 6B (KD-based prompt transfer) | 2022-12-04 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 98.2 | | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 98 | | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 97.6 | 95.9 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04 |
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 97.2 | 94.9 | DeBERTa-1.5B | 2020-06-05 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 96.8 | 93.9 | T5-XXL 11B (fine-tuned) | 2019-10-23 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 94.4 | 90.3 | T5-Large 770M (fine-tuned) | 2019-10-23 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 94 | 86.2 | T5-Base 220M (fine-tuned) | 2019-10-23 |
PaLM 2 Technical Report | ✓ Link | 87.5 | | PaLM 2-L (one-shot) | 2023-05-17 |
PaLM 2 Technical Report | ✓ Link | 82.1 | | PaLM 2-S (one-shot) | 2023-05-17 |
PaLM 2 Technical Report | ✓ Link | 80.4 | | PaLM 2-M (one-shot) | 2023-05-17 |
Language Models are Few-Shot Learners | ✓ Link | 75.6 | | GPT-3 175B (few-shot) | 2020-05-28 |
N-Grammer: Augmenting Transformers with latent n-grams | ✓ Link | 67.9 | 59.7 | N-Grammer 343M | 2022-07-13 |
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ Link | 67.9 | | AlexaTM 20B | 2022-08-02 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 53.57 | | BloombergGPT (one-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 48.21 | | GPT-NeoX (one-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 48.21 | | BLOOM 176B (one-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 44.64 | | OPT 66B (one-shot) | 2023-03-30 |
Language Models are Few-Shot Learners | ✓ Link | | 52 | GPT-3 175B (few-shot, k=32) | 2020-05-28 |