GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | | 87 | GPT-4 o1 (300B) | 2024-10-03 |
Llama 3 Meets MoE: Efficient Upcycling | ✓ Link | 86.6 | Llama 3.1 (405B) | 2024-12-13 |
Llama 3 Meets MoE: Efficient Upcycling | ✓ Link | 86.0 | Llama 3.1 (70B) | 2024-12-13 |
- | | 83.7 | Gemini Ultra (5-shot) | |
The Claude 3 Model Family: Opus, Sonnet, Haiku | | 79 | Claude 3 Sonnet (5-shot) | 2024-03-04 |
- | | 77.5 | Qwen1.5 72B (5-shot) | |
The Claude 3 Model Family: Opus, Sonnet, Haiku | | 75.2 | Claude 3 Haiku (5-shot) | 2024-03-04 |
The Llama 3 Herd of Models | ✓ Link | 73.7 | DBRX Instruct 132B (5-shot) | 2024-07-31 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 73.5 | Llama 2 (65B) | 2022-10-20 |
The Llama 3 Herd of Models | ✓ Link | 73.0 | Llama 3.1 8B (CoT) | 2024-07-31 |
Mixtral of Experts | ✓ Link | 70.6 | Mixtral 8x7B (5-shot) | 2024-01-08 |
GPT-4 Technical Report | ✓ Link | 70.0 | GPT-3.5 Turbo | 2023-03-15 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 68.9 | LLaMA 65B (fine-tuned) | 2023-02-27 |
Training Compute-Optimal Large Language Models | ✓ Link | 67.5 | ChatGPT/GPT-3.5 (20B) | 2022-03-29 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 63.4 | LLaMA 65B (5-shot) | 2023-02-27 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 62.6 | LLaMA 2 34B (5-shot) | 2023-07-18 |
Mixtral of Experts | ✓ Link | 62.5 | Mistral 7B (5-shot) | 2024-01-08 |
Mistral 7B | ✓ Link | 60.1 | Mistral 7B (5-shot) | 2023-10-10 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 59.5 | GPT-3 Davinci 175B (CoT) | 2022-10-20 |
LLaMA: Open and Efficient Foundation Language Models | ✓ Link | 57.8 | LLaMA 33B (5-shot) | 2023-02-27 |
The Falcon Series of Open Language Models | | 57.0 | Falcon 40B | 2023-11-28 |
- | | 56.7 | Qwen 7B (5-shot) | |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 54.8 | LLaMA 2 13B (5-shot) | 2023-07-18 |
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | ✓ Link | 53.2 | Branch-Train-MiX 4x7B (sampling top-1 experts) | 2024-03-12 |
Galactica: A Large Language Model for Science | ✓ Link | 52.6 | GAL 120B (zero-shot) | 2022-11-16 |
Atlas: Few-shot Learning with Retrieval Augmented Language Models | ✓ Link | 47.9 | Atlas (5-shot) | 2022-08-05 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 45.5 | Flan-T5-XL 3B (CoT) | 2022-10-20 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 45.3 | LLaMA 2 7B (5-shot) | 2023-07-18 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 45.1 | Flan-T5-Large 780M | 2022-10-20 |
GLM-130B: An Open Bilingual Pre-trained Model | ✓ Link | 44.8 | GLM-130B | 2022-10-05 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 40.5 | Flan-T5-Large 780M (CoT) | 2022-10-20 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 39.7 | GPT-3 Davinci 175B (5-shot) | 2022-10-20 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 39.2 | Bloomberg GPT 50B (5-shot) | 2023-03-30 |
UL2: Unifying Language Learning Paradigms | ✓ Link | 39.2 | UL2 20B (5-shot) | 2022-05-10 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 39.1 | BLOOM 176B (5-shot) | 2023-03-30 |
Textbooks Are All You Need II: phi-1.5 technical report | ✓ Link | 37.9 | phi-1.5-web 1.3B | 2023-09-11 |
BloombergGPT: A Large Language Model for Finance | ✓ Link | 36 | OPT 66B (5-shot) | 2023-03-30 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 35.9 | Flan-T5-Base 250M | 2022-10-20 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 33.7 | Flan-T5-Base 250M (CoT) | 2022-10-20 |
GPT-NeoX-20B: An Open-Source Autoregressive Language Model | ✓ Link | 33.6 | GPT-NeoX 20B (5-shot) | 2022-04-14 |
- | | 31 | RWKV v5 Eagle 7B | |
MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models | ✓ Link | 29.68 | LLaMA7B-MiLe-Loss (5-shot) | 2023-10-30 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 28.7 | Flan-T5-Small 80M | 2022-10-20 |
The Falcon Series of Open Language Models | | 28.0 | Falcon 7B (5-shot) | 2023-11-28 |