OpenCodePapers

Language Modelling on LAMBADA
Leaderboard
| Paper | Code | Accuracy | Perplexity | Model Name | Release Date |
|-------|------|----------|------------|------------|--------------|
| PaLM: Scaling Language Modeling with Pathways | ✓ | 89.7 | | PaLM-540B (Few-Shot) | 2022-04-05 |
| PaLM 2 Technical Report | ✓ | 86.9 | | PaLM 2-L (one-shot) | 2023-05-17 |
| Language Models are Few-Shot Learners | ✓ | 86.4 | 1.92 | GPT-3 175B (Few-Shot) | 2020-05-28 |
| Stay on topic with Classifier-Free Guidance | | 84.0 | | LLaMA-65B+CFG (zero-shot) | 2023-06-30 |
| Stay on topic with Classifier-Free Guidance | | 83.9 | | LLaMA-30B+CFG (zero-shot) | 2023-06-30 |
| PaLM 2 Technical Report | ✓ | 83.7 | | PaLM 2-M (one-shot) | 2023-05-17 |
| – | | 82.33 | | Cohere Large | |
| Stay on topic with Classifier-Free Guidance | | 82.2 | | LLaMA-13B+CFG (zero-shot) | 2023-06-30 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 81.8 | | PaLM-540B (One-Shot) | 2022-04-05 |
| GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | | 80.9 | | GLaM 62B/64E (One-Shot) | 2021-12-13 |
| PaLM 2 Technical Report | ✓ | 80.7 | | PaLM 2-S (one-shot) | 2023-05-17 |
| GLM-130B: An Open Bilingual Pre-trained Model | ✓ | 80.2 | | GLM-130B (bidirectional attention) | 2022-10-05 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 79.47 | | SparseGPT (175B, 2:4 Sparsity) | 2023-01-02 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 78.77 | | SparseGPT (175B, 4:8 Sparsity) | 2023-01-02 |
| PaLM: Scaling Language Modeling with Pathways | ✓ | 77.9 | | PaLM-540B (Zero-Shot) | 2022-04-05 |
| Training Compute-Optimal Large Language Models | ✓ | 77.7 | | Chinchilla (Zero-Shot) | 2022-03-29 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 76.51 | | SparseGPT (175B, 50% Sparsity) | 2023-01-02 |
| Language Models are Few-Shot Learners | ✓ | 76.2 | 3.00 | GPT-3 175B (Zero-Shot) | 2020-05-28 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 75.59 | | OPT-175B | 2023-01-02 |
| Language Models are Few-Shot Learners | ✓ | 72.5 | 3.56 | GPT-3 13B (Zero-Shot) | 2020-05-28 |
| GLM: General Language Model Pretraining with Autoregressive Blank Infilling | ✓ | 72.35 | | GLM-XXLarge (bidirectional) | 2021-03-18 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 70.46 | | Pythia 12B (0-shot) | 2023-04-03 |
| Language Models are Few-Shot Learners | ✓ | 70.3 | 4.00 | GPT-3 6.7B (Zero-Shot) | 2020-05-28 |
| – | | 69.7 | 3.99 | GPT-J-6B | |
| Mamba: Linear-Time Sequence Modeling with Selective State Spaces | ✓ | 69.2 | 4.23 | Mamba-2.8B | 2023-12-01 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | 67.28 | | Pythia 6.9B (0-shot) | 2023-04-03 |
| GLM: General Language Model Pretraining with Autoregressive Blank Infilling | ✓ | 67.18 | | GLM-XXLarge (unidirectional) | 2021-03-18 |
| Language Models are Few-Shot Learners | ✓ | 67.1 | 4.60 | GPT-3 2.7B (Zero-Shot) | 2020-05-28 |
| Language Models are Unsupervised Multitask Learners | ✓ | 63.24 | 8.63 | GPT-2 1.5B (Zero-Shot) | 2019-02-14 |
| Universal Transformers | ✓ | 56.25 | | Universal Transformer (w/ dynamic halting) | 2018-07-10 |
| Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences | ✓ | 54.34 | | Residual Shuffle-Exchange network | 2020-04-06 |
| Broad Context Language Modeling as Reading Comprehension | | 49.0 | | Gated-Attention Reader (+ features) | 2016-10-26 |
| SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot | ✓ | 0.02 | | OPT-175B (50% Sparsity) | 2023-01-02 |
| Test-Time Training with Self-Supervision for Generalization under Distribution Shifts | ✓ | 0.01 | | test | 2019-09-29 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | | 3.92 | Pythia 12B (Zero-Shot) | 2023-04-03 |
| Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ✓ | | 4.45 | Pythia 6.9B (Zero-Shot) | 2023-04-03 |
| Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | ✓ | | | Megatron-Turing NLG 530B (Few-Shot) | 2022-01-28 |