OpenCodePapers

Multi-task Language Understanding on BBH-alg

Leaderboard
| Paper | Code | Average (%) | Model Name | Release Date |
|---|---|---|---|---|
| Evaluating Large Language Models Trained on Code | ✓ | 73.9 | code-davinci-002 175B (CoT) | 2021-07-07 |
| Scaling Instruction-Finetuned Language Models | ✓ | 66.5 | Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ | 62.2 | PaLM 540B (CoT + self-consistency) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ | 61.3 | Flan-PaLM 540B (3-shot, fine-tuned, CoT) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ | 57.6 | PaLM 540B (CoT) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ | 48.2 | Flan-PaLM 540B (3-shot, fine-tuned) | 2022-10-20 |
| Scaling Instruction-Finetuned Language Models | ✓ | 38.3 | PaLM 540B | 2022-10-20 |