OpenCodePapers

multi-task-language-understanding-on-mmlu

Multi-Task Learning / Multi-task Language Understanding
Leaderboard
| Paper | Code | Average (%) | Model Name | Release Date |
|---|---|---|---|---|
| GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data | | 87 | GPT-4 o1(300b) | 2024-10-03 |
| Llama 3 Meets MoE: Efficient Upcycling | ✓ | 86.6 | Llama 3.1 (405B) | 2024-12-13 |
| Llama 3 Meets MoE: Efficient Upcycling | ✓ | 86.0 | Llama 3.1 (70B) | 2024-12-13 |
| | | 83.7 | Gemini Ultra (5-shot) | |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 79 | Claude 3 Sonnet (5-shot) | 2024-03-04 |
| | | 77.5 | Qwen1.5 72B (5-shot) | |
| The Claude 3 Model Family: Opus, Sonnet, Haiku | | 75.2 | Claude 3 Haiku (5-shot) | 2024-03-04 |
| The Llama 3 Herd of Models | ✓ | 73.7 | DBRX Instruct 132B (5-shot) | 2024-07-31 |
| Scaling Instruction-Finetuned Language Models | ✓ | 73.5 | llama 2(65b) | 2022-10-20 |
| The Llama 3 Herd of Models | ✓ | 73.0 | Llama 3.1 8B (CoT) | 2024-07-31 |
| Mixtral of Experts | ✓ | 70.6 | Mixtral 8x7B (5-shot) | 2024-01-08 |
| GPT-4 Technical Report | ✓ | 70.0 | GPT-3.5 Turbo | 2023-03-15 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 68.9 | LLaMA 65B (fine-tuned) | 2023-02-27 |
| Training Compute-Optimal Large Language Models | ✓ | 67.5 | chatgpt/gpt3.5(20B) | 2022-03-29 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 63.4 | LLaMA 65B (5-shot) | 2023-02-27 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 62.6 | LLaMA 2 34B (5-shot) | 2023-07-18 |
| Mixtral of Experts | ✓ | 62.5 | Mistral 7B (5-shot) | 2024-01-08 |
| Mistral 7B | ✓ | 60.1 | Mistral 7B (5-shot) | 2023-10-10 |
| Scaling Instruction-Finetuned Language Models | ✓ | 59.5 | GPT-3 Davinci 175B (CoT) | 2022-10-20 |
| LLaMA: Open and Efficient Foundation Language Models | ✓ | 57.8 | LLaMA 33B (5-shot) | 2023-02-27 |
| The Falcon Series of Open Language Models | | 57.0 | Falcon 40B | 2023-11-28 |
| | | 56.7 | Qwen 7B (5-shot) | |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 54.8 | LLaMA 2 13B (5-shot) | 2023-07-18 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | ✓ | 53.2 | Branch-Train-MiX 4x7B (sampling top-1 experts) | 2024-03-12 |
| Galactica: A Large Language Model for Science | ✓ | 52.6 | GAL 120B (zero-shot) | 2022-11-16 |
| Atlas: Few-shot Learning with Retrieval Augmented Language Models | ✓ | 47.9 | Atlas (5-shot) | 2022-08-05 |
| Scaling Instruction-Finetuned Language Models | ✓ | 45.5 | Flan-T5-XL 3B (CoT) | 2022-10-20 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ | 45.3 | LLaMA 2 7B (5-shot) | 2023-07-18 |
| Scaling Instruction-Finetuned Language Models | ✓ | 45.1 | Flan-T5-Large 780M | 2022-10-20 |
| GLM-130B: An Open Bilingual Pre-trained Model | ✓ | 44.8 | GLM-130B | 2022-10-05 |
| Scaling Instruction-Finetuned Language Models | ✓ | 40.5 | Flan-T5-Large 780M (CoT) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ | 39.7 | GPT-3 Davinci 175B (5-shot) | 2022-10-20 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 39.2 | Bloomberg GPT 50B (5-shot) | 2023-03-30 |
| UL2: Unifying Language Learning Paradigms | ✓ | 39.2 | UL2 20B (5-shot) | 2022-05-10 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 39.1 | BLOOM 176B (5-shot) | 2023-03-30 |
| Textbooks Are All You Need II: phi-1.5 technical report | ✓ | 37.9 | phi-1.5-web 1.3B | 2023-09-11 |
| BloombergGPT: A Large Language Model for Finance | ✓ | 36 | OPT 66B (5-shot) | 2023-03-30 |
| Scaling Instruction-Finetuned Language Models | ✓ | 35.9 | Flan-T5-Base 250M | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ | 33.7 | Flan-T5-Base 250M (CoT) | 2022-10-20 |
| GPT-NeoX-20B: An Open-Source Autoregressive Language Model | ✓ | 33.6 | GPT-NeoX 20B (5-shot) | 2022-04-14 |
| | | 31 | RWKV v5 Eagle 7B | |
| MiLe Loss: a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models | ✓ | 29.68 | LLaMA7B-MiLe-Loss(5-shot) | 2023-10-30 |
| Scaling Instruction-Finetuned Language Models | ✓ | 28.7 | Flan-T5-Small 80M | 2022-10-20 |
| The Falcon Series of Open Language Models | | 28.0 | Falcon 7B (5-shot) | 2023-11-28 |