OpenCodePapers

Multi-task Language Understanding on BBH-NLP

Multi-Task Learning, Multi-task Language Understanding
Leaderboard

| Paper | Code | Average (%) | Model Name | Release Date |
|---|---|---|---|---|
| | | 86.3 | Qwen2.5-72B | |
| | | 86.1 | Jiutian large model (Jiutian-大模型) | |
| | | 85.9 | Llama-3-405B | |
| | | 84.07 | Jiutian-57B | |
| | | 82.4 | Qwen2-72B | |
| | | 81.0 | Llama-3-70B | |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 78.4 | Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 78.2 | PaLM 540B (CoT + self-consistency) | 2022-10-20 |
| Evaluating Large Language Models Trained on Code | ✓ Link | 73.5 | code-davinci-002 175B (CoT) | 2021-07-07 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 72.4 | Flan-PaLM 540B (3-shot, fine-tuned, CoT) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 71.2 | PaLM 540B (CoT) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 70.0 | Flan-PaLM 540B (5-shot, fine-tuned) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ Link | 62.7 | PaLM 540B | 2022-10-20 |
| Orca 2: Teaching Small Language Models How to Reason | | 50.18 | Orca 2-13B | 2023-11-18 |
| Orca 2: Teaching Small Language Models How to Reason | | 45.93 | Orca 2-7B | 2023-11-18 |