OpenCodePapers

Language Modelling on The Pile

Language Modelling
Leaderboard
Paper | Code | Bits per byte | Test perplexity | Model | Release date
--- | --- | --- | --- | --- | ---
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.557 | - | Test-Time Fine-Tuning with SIFT + Llama-3.2 (3B) | 2024-10-10
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.595 | - | Test-Time Fine-Tuning with SIFT + Phi-3 (3.8B) | 2024-10-10
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.606 | - | Test-Time Fine-Tuning with SIFT + Llama-3.2 (1B) | 2024-10-10
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.629 | - | Gemma-2 27B | 2024-10-10
GLM-130B: An Open Bilingual Pre-trained Model | ✓ | 0.634 | - | GLM-130B | 2022-10-05
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.640 | - | Llama-3.2 3B | 2024-10-10
GLM-130B: An Open Bilingual Pre-trained Model | ✓ | 0.65 | - | Jurassic-1 | 2022-10-05
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.651 | - | Phi-3 14B | 2024-10-10
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.670 | - | Gemma-2 9B | 2024-10-10
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.678 | - | Phi-3 7B | 2024-10-10
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.679 | - | Phi-3 3.8B | 2024-10-10
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.697 | - | Llama-3.2 1B | 2024-10-10
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | ✓ | 0.7177 | - | GPT-3 Davinci 175B (pre-trained) | 2020-12-31
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.721 | - | Gemma-2 2B | 2024-10-10
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.737 | - | Llama-3.2-Instruct 3B | 2024-10-10
GLM-130B: An Open Bilingual Pre-trained Model | ✓ | 0.742 | - | GPT-3 | 2022-10-05
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.762 | - | Test-Time Fine-Tuning with SIFT + GPT-2 (774M) | 2024-10-10
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | ✓ | 0.7980 | - | GPT-3 Curie 6.7B (pre-trained) | 2020-12-31
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.807 | - | Llama-3.2-Instruct 1B | 2024-10-10
Test-Time Training on Nearest Neighbors for Large Language Models | ✓ | 0.85 | - | GPT-2 Large 774M (test-time training on nearest neighbors) | 2023-05-29
Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs | ✓ | 0.862 | - | Test-Time Fine-Tuning with SIFT + GPT-2 (124M) | 2024-10-10
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | ✓ | 0.8718 | - | GPT-3 Babbage 1.3B (pre-trained) | 2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | ✓ | 0.9631 | - | GPT-3 Ada 350M (pre-trained) | 2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | ✓ | 1.0468 | - | GPT-2 XL 1.5B (pre-trained) | 2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | ✓ | 1.0828 | - | GPT-2 Large 774M (pre-trained) | 2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | ✓ | 1.0928 | - | GPT-2 Medium 355M (pre-trained) | 2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | ✓ | 1.2253 | - | GPT-2 Small 124M (pre-trained) | 2020-12-31
Need a Small Specialized Language Model? Plan Early! | - | - | 10 | Larger Transformer 771M (fine-tuned) | 2024-02-02
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | - | 10.2 | Hybrid H3 125M | 2022-12-28
Knowledge Unlearning for Mitigating Privacy Risks in Language Models | ✓ | - | 10.44 | GPT-Neo 2.7B | 2022-10-04
Hungry Hungry Hippos: Towards Language Modeling with State Space Models | ✓ | - | 10.7 | Transformer 125M | 2022-12-28
Knowledge Unlearning for Mitigating Privacy Risks in Language Models | ✓ | - | 11.46 | GPT-Neo 1.3B | 2022-10-04
Need a Small Specialized Language Model? Plan Early! | - | - | 12 | Smaller Transformer 126M (fine-tuned) | 2024-02-02
Knowledge Unlearning for Mitigating Privacy Risks in Language Models | ✓ | - | 17.81 | OPT 2.7B | 2022-10-04
Knowledge Unlearning for Mitigating Privacy Risks in Language Models | ✓ | - | 17.83 | GPT-Neo 125M | 2022-10-04
Knowledge Unlearning for Mitigating Privacy Risks in Language Models | ✓ | - | 19.55 | OPT 1.3B | 2022-10-04
Need a Small Specialized Language Model? Plan Early! | - | - | 28.1 | Larger Transformer 771M (pre-trained) | 2024-02-02
Knowledge Unlearning for Mitigating Privacy Risks in Language Models | ✓ | - | 32.26 | OPT 125M | 2022-10-04
Need a Small Specialized Language Model? Plan Early! | - | - | 33 | Smaller Transformer 126M (pre-trained) | 2024-02-02
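The leaderboard mixes two metrics, and for both of them lower is better. Bits per byte normalizes the model's total negative log-likelihood by the number of bytes of test text, so it does not depend on the tokenizer; token-level perplexity normalizes by the number of tokens, so it does. The sketch below shows the standard definitions under the assumption that you already have the total negative log-likelihood in nats plus the byte and token counts of the evaluated text. The function names and the input numbers are hypothetical and are not taken from any of the listed papers.

```python
import math


def bits_per_byte(total_nll_nats: float, num_bytes: int) -> float:
    """Tokenizer-independent metric: total NLL converted from nats to bits,
    divided by the number of UTF-8 bytes in the evaluated text."""
    return total_nll_nats / (math.log(2) * num_bytes)


def perplexity(total_nll_nats: float, num_tokens: int) -> float:
    """Token-level perplexity: exp of the mean per-token NLL.
    The value depends on how the tokenizer splits the text."""
    return math.exp(total_nll_nats / num_tokens)


def perplexity_from_bpb(bpb: float, num_bytes: int, num_tokens: int) -> float:
    """Convert bits per byte to token-level perplexity.
    Requires the bytes-per-token ratio of the specific tokenizer."""
    return 2.0 ** (bpb * num_bytes / num_tokens)


# Hypothetical inputs for illustration only.
nll = 1000.0      # total NLL over the test text, in nats
n_bytes = 2000    # UTF-8 bytes in the test text
n_tokens = 500    # tokens produced by some tokenizer on the same text

bpb = bits_per_byte(nll, n_bytes)
ppl = perplexity(nll, n_tokens)
print(f"bits per byte: {bpb:.4f}, perplexity: {ppl:.4f}")
print(math.isclose(perplexity_from_bpb(bpb, n_bytes, n_tokens), ppl))
```

Because the conversion needs the bytes-per-token ratio of each model's tokenizer, a bits-per-byte row and a perplexity row in the table above cannot be ranked against each other directly, and perplexities reported for models with different tokenizers are in general not directly comparable either.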