Paper | Code | Average (%) | Model Name | Release Date |
---|---|---|---|---|
Evaluating Large Language Models Trained on Code | ✓ Link | 73.9 | code-davinci-002 175B (CoT) | 2021-07-07 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 66.5 | Flan-PaLM 540B (3-shot, fine-tuned, CoT + SC) | 2022-10-20 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 62.2 | PaLM 540B (CoT + self-consistency) | 2022-10-20 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 61.3 | Flan-PaLM 540B (3-shot, fine-tuned, CoT) | 2022-10-20 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 57.6 | PaLM 540B (CoT) | 2022-10-20 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 48.2 | Flan-PaLM 540B (3-shot, fine-tuned) | 2022-10-20 |
Scaling Instruction-Finetuned Language Models | ✓ Link | 38.3 | PaLM 540B | 2022-10-20 |