Paper | Code | F1 | EM | Model | Date
--- | --- | --- | --- | --- | ---
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 90.1 | 69.2 | PaLM 540B (fine-tuned) | 2022-04-05
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 89.6 | | ST-MoE-32B 269B (fine-tuned) | 2022-02-17 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 88.4 | 63.0 | Turing NLR v5 XXL 5.4B (fine-tuned) | 2022-12-04
DeBERTa: Decoding-enhanced BERT with Disentangled Attention | ✓ Link | 88.2 | 63.7 | DeBERTa-1.5B | 2020-06-05 |
Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE | | 88.2 | 62.4 | Vega v2 6B (fine-tuned) | 2022-12-04 |
PaLM 2 Technical Report | ✓ Link | 88.2 | | PaLM 2-L (one-shot) | 2023-05-17 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | 88.1 | | T5-XXL 11B (fine-tuned) | 2019-10-23 |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | ✓ Link | 86.0 | | ST-MoE-L 4.1B (fine-tuned) | 2022-02-17
PaLM 2 Technical Report | ✓ Link | 84.1 | | PaLM 2-M (one-shot) | 2023-05-17 |
PaLM 2 Technical Report | ✓ Link | 84.0 | | PaLM 2-S (one-shot) | 2023-05-17 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 83.4 | | FLAN 137B (prompt-tuned) | 2021-09-03 |
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 77.5 | | FLAN 137B (zero-shot) | 2021-09-03 |
Language Models are Few-Shot Learners | ✓ Link | 75.4 | | GPT-3 175B (few-shot) | 2020-05-28
Finetuned Language Models Are Zero-Shot Learners | ✓ Link | 72.1 | | FLAN 137B (1-shot) | 2021-09-03 |
KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs | ✓ Link | 70.8 | 27.2 | KELM (fine-tuned BERT-large, single model) | 2021-09-09
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 70.0 | 24.1 | BERT-large (single model) | 2018-10-11
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 63.8 | | Neo-6B (QA + WS) | 2022-10-05 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 62.3 | | BloombergGPT 50B (1-shot) | 2023-03-30
N-Grammer: Augmenting Transformers with latent n-grams | ✓ Link | 62.0 | 11.3 | N-Grammer 343M | 2022-07-13
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 60.8 | | Neo-6B (few-shot) | 2022-10-05 |
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model | ✓ Link | 59.6 | | AlexaTM 20B | 2022-08-02 |
Ask Me Anything: A simple strategy for prompting language models | ✓ Link | 58.8 | | Neo-6B (QA) | 2022-10-05 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 26.7 | | BLOOM 176B (1-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 22.9 | | GPT-NeoX 20B (1-shot) | 2023-03-30 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 18.8 | | OPT 66B (1-shot) | 2023-03-30 |
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | ✓ Link | | 63.3 | T5-11B | 2019-10-23 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | | 59.7 | Hybrid H3 355M (3-shot, logit scoring) | 2022-12-28 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | | 59.5 | Hybrid H3 355M (0-shot, logit scoring) | 2022-12-28 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | | 51.4 | Hybrid H3 125M (0-shot, logit scoring) | 2022-12-28 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ Link | | 48.9 | Hybrid H3 125M (3-shot, logit scoring) | 2022-12-28 |