Paper | Code | Accuracy (%) | Perplexity | Model | Date |
--- | --- | --- | --- | --- | --- |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 89.7 | | PaLM-540B (Few-Shot) | 2022-04-05 |
PaLM 2 Technical Report | ✓ Link | 86.9 | | PaLM 2-L (One-Shot) | 2023-05-17 |
Language Models are Few-Shot Learners | ✓ Link | 86.4 | 1.92 | GPT-3 175B (Few-Shot) | 2020-05-28 |
Stay on topic with Classifier-Free Guidance | | 84.0 | | LLaMA-65B+CFG (Zero-Shot) | 2023-06-30 |
Stay on topic with Classifier-Free Guidance | | 83.9 | | LLaMA-30B+CFG (Zero-Shot) | 2023-06-30 |
PaLM 2 Technical Report | ✓ Link | 83.7 | | PaLM 2-M (One-Shot) | 2023-05-17 |
| | 82.33 | | Cohere Large | |
Stay on topic with Classifier-Free Guidance | | 82.2 | | LLaMA-13B+CFG (Zero-Shot) | 2023-06-30 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 81.8 | | PaLM-540B (One-Shot) | 2022-04-05 |
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 80.9 | | GLaM 62B/64E (One-Shot) | 2021-12-13 |
PaLM 2 Technical Report | ✓ Link | 80.7 | | PaLM 2-S (One-Shot) | 2023-05-17 |
GLM-130B: An Open Bilingual Pre-trained Model | ✓ Link | 80.2 | | GLM-130B (bidirectional attention) | 2022-10-05 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 79.47 | | SparseGPT (175B, 2:4 Sparsity) | 2023-01-02 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 78.77 | | SparseGPT (175B, 4:8 Sparsity) | 2023-01-02 |
PaLM: Scaling Language Modeling with Pathways | ✓ Link | 77.9 | | PaLM-540B (Zero-Shot) | 2022-04-05 |
Training Compute-Optimal Large Language Models | ✓ Link | 77.7 | | Chinchilla (Zero-Shot) | 2022-03-29 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 76.51 | | SparseGPT (175B, 50% Sparsity) | 2023-01-02 |
Language Models are Few-Shot Learners | ✓ Link | 76.2 | 3.00 | GPT-3 175B (Zero-Shot) | 2020-05-28 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 75.59 | | OPT-175B | 2023-01-02 |
Language Models are Few-Shot Learners | ✓ Link | 72.5 | 3.56 | GPT-3 13B (Zero-Shot) | 2020-05-28 |
GLM: General Language Model Pretraining with Autoregressive Blank Infilling | ✓ Link | 72.35 | | GLM-XXLarge (bidirectional) | 2021-03-18 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 70.46 | | Pythia 12B (0-shot) | 2023-04-03 |
Language Models are Few-Shot Learners | ✓ Link | 70.3 | 4.00 | GPT-3 6.7B (Zero-Shot) | 2020-05-28 |
| | 69.7 | 3.99 | GPT-J-6B | |
Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ✓ Link | 69.2 | 4.23 | Mamba-2.8B | 2023-12-01 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | 67.28 | | Pythia 6.9B (0-shot) | 2023-04-03 |
GLM: General Language Model Pretraining with Autoregressive Blank Infilling | ✓ Link | 67.18 | | GLM-XXLarge (unidirectional) | 2021-03-18 |
Language Models are Few-Shot Learners | ✓ Link | 67.1 | 4.60 | GPT-3 2.7B (Zero-Shot) | 2020-05-28 |
Language Models are Unsupervised Multitask Learners | ✓ Link | 63.24 | 8.63 | GPT-2 1.5B (Zero-Shot) | 2019-02-14 |
Universal Transformers | ✓ Link | 56.25 | | Universal Transformer (w/ dynamic halting) | 2018-07-10 |
Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences | ✓ Link | 54.34 | | Residual Shuffle-Exchange network | 2020-04-06 |
Broad Context Language Modeling as Reading Comprehension | | 49.0 | | Gated-Attention Reader (+ features) | 2016-10-26 |
SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ Link | 0.02 | | OPT-175B (50% Sparsity) | 2023-01-02 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | | 3.92 | Pythia 12B (Zero-Shot) | 2023-04-03 |
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ Link | | 4.45 | Pythia 6.9B (Zero-Shot) | 2023-04-03 |
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | ✓ Link | | | Megatron-Turing NLG 530B (Few-Shot) | 2022-01-28 |