| Paper | Code | Score | Model | Date |
| --- | --- | --- | --- | --- |
| GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | | 87.0 | GPT-4o (300B) | 2024-10-03 |
| Llama 3 Meets MoE: Efficient Upcycling | ✓ Link | 86.6 | Llama 3.1 405B | 2024-12-13 |
| Llama 3 Meets MoE: Efficient Upcycling | ✓ Link | 86.0 | Llama 3.1 70B | 2024-12-13 |
| | | 83.7 | Gemini Ultra (5-shot) | |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 79.0 | Claude 3 Sonnet (5-shot) | 2024-03-04 |
| | | 77.5 | Qwen1.5 72B (5-shot) | |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 75.2 | Claude 3 Haiku (5-shot) | 2024-03-04 |
| The Llama 3 Herd of Models | ✓ Link | 73.7 | DBRX Instruct 132B (5-shot) | 2024-07-31 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 73.5 | Flan-PaLM 540B (5-shot) | 2022-10-20 |
| The Llama 3 Herd of Models | ✓ Link | 73.0 | Llama 3.1 8B (CoT) | 2024-07-31 |
| Mixtral of Experts | ✓ Link | 70.6 | Mixtral 8x7B (5-shot) | 2024-01-08 |
| GPT-4 Technical Report | ✓ Link | 70.0 | GPT-3.5 Turbo | 2023-03-15 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 68.9 | LLaMA 65B (fine-tuned) | 2023-02-27 |
| Training Compute-Optimal Large Language Models | ✓ Link | 67.5 | Chinchilla 70B (5-shot) | 2022-03-29 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 63.4 | LLaMA 65B (5-shot) | 2023-02-27 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 62.6 | LLaMA 2 34B (5-shot) | 2023-07-18 |
| Mixtral of Experts | ✓ Link | 62.5 | Mistral 7B (5-shot) | 2024-01-08 |
| Mistral 7B | ✓ Link | 60.1 | Mistral 7B (5-shot) | 2023-10-10 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 59.5 | GPT-3 Davinci 175B (CoT) | 2022-10-20 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 57.8 | LLaMA 33B (5-shot) | 2023-02-27 |
| The Falcon Series of Open Language Models | | 57.0 | Falcon 40B | 2023-11-28 |
| | | 56.7 | Qwen 7B (5-shot) | |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 54.8 | LLaMA 2 13B (5-shot) | 2023-07-18 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | ✓ Link | 53.2 | Branch-Train-MiX 4x7B (sampling top-1 experts) | 2024-03-12 |
| Galactica: A Large Language Model for Science | ✓ Link | 52.6 | GAL 120B (zero-shot) | 2022-11-16 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | ✓ Link | 47.9 | Atlas (5-shot) | 2022-08-05 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 45.5 | Flan-T5-XL 3B (CoT) | 2022-10-20 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 45.3 | LLaMA 2 7B (5-shot) | 2023-07-18 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 45.1 | Flan-T5-Large 780M | 2022-10-20 |
| GLM-130B: An Open Bilingual Pre-trained Model | ✓ Link | 44.8 | GLM-130B | 2022-10-05 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 40.5 | Flan-T5-Large 780M (CoT) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 39.7 | GPT-3 Davinci 175B (5-shot) | 2022-10-20 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 39.2 | BloombergGPT 50B (5-shot) | 2023-03-30 |
| UL2: Unifying Language Learning Paradigms | ✓ Link | 39.2 | UL2 20B (5-shot) | 2022-05-10 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 39.1 | BLOOM 176B (5-shot) | 2023-03-30 |
| Textbooks Are All You Need II: phi-1.5 technical report | ✓ Link | 37.9 | phi-1.5-web 1.3B | 2023-09-11 |
| BloombergGPT: A Large Language Model for Finance | ✓ Link | 36.0 | OPT 66B (5-shot) | 2023-03-30 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 35.9 | Flan-T5-Base 250M | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 33.7 | Flan-T5-Base 250M (CoT) | 2022-10-20 |
| GPT-NeoX-20B: An Open-Source Autoregressive Language Model | ✓ Link | 33.6 | GPT-NeoX 20B (5-shot) | 2022-04-14 |
| | | 31.0 | RWKV v5 Eagle 7B | |
| MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models | ✓ Link | 29.68 | LLaMA 7B + MiLe Loss (5-shot) | 2023-10-30 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 28.7 | Flan-T5-Small 80M | 2022-10-20 |
| The Falcon Series of Open Language Models | | 28.0 | Falcon 7B (5-shot) | 2023-11-28 |