Paper | Code | Execution Accuracy | Accuracy | Model | Date |
--- | --- | --- | --- | --- | --- |
Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models | ✓ Link | 93.9 | | GPT-4 (Teaching-Inspired) | 2024-10-10 |
Automatic Model Selection with Large Language Models for Reasoning | ✓ Link | 93.7 | | GPT-4 (Model Selection) | 2023-05-23 |
- | | 92.3 | | Qwen2 (CoT + Code Interpreter) | |
Progressive-Hint Prompting Improves Reasoning in Large Language Models | ✓ Link | 91.9 | | GPT-4 (PHP) | 2023-04-19 |
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | ✓ Link | 87.8 | | OpenMath-CodeLlama-70B (w/ code) | 2024-02-15 |
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | ✓ Link | 84.9 | | MathCoder-L-70B | 2023-10-05 |
Does ChatGPT Comprehend the Place Value in Numbers When Solving Math Word Problems? | ✓ Link | 83.70 | | PoT_Eng (self-consistency @ 5) | 2023-06-03 |
Does ChatGPT Comprehend the Place Value in Numbers When Solving Math Word Problems? | ✓ Link | 82.50 | | CoT_Eng (self-consistency @ 5) | 2023-06-03 |
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | ✓ Link | 80.6 | | MMOS-CODE-34B (0-shot) | 2024-02-23 |
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | ✓ Link | 79.3 | | MMOS-DeepSeekMath-7B (0-shot) | 2024-02-23 |
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | ✓ Link | 76.4 | | MMOS-CODE-7B (0-shot) | 2024-02-23 |
Llama 2: Open Foundation and Fine-Tuned Chat Models | ✓ Link | 69.2 | | LLaMA 2-Chat | 2023-07-18 |
Math Word Problem Solving by Generating Linguistic Variants of Problem Statements | ✓ Link | 63.5 | 63.5 | DeBERTa | 2023-06-24 |
Large Language Models are Zero-Shot Reasoners | ✓ Link | 62.1 | | PaLM (zero-shot, CoT) | 2022-05-24 |
Large Language Models are Zero-Shot Reasoners | ✓ Link | 58.8 | | PaLM (zero-shot) | 2022-05-24 |
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | ✓ Link | 56.65 | | SYRELM (Vicuna 13B) | 2023-12-09 |
ATHENA: Mathematical Reasoning with Thought Expansion | ✓ Link | 54.8 | | ATHENA (roberta-large) | 2023-11-02 |
Learning Multi-Step Reasoning by Solving Arithmetic Tasks | ✓ Link | 48.9 | | MsAT-DeductReasoner | 2023-06-02 |
Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction | ✓ Link | 47.3 | | Roberta-DeductReasoner | 2022-03-19 |
ATHENA: Mathematical Reasoning with Thought Expansion | ✓ Link | 45.6 | | ATHENA (roberta-base) | 2023-11-02 |
Are NLP Models really able to Solve Simple Math Word Problems? | ✓ Link | 43.8 | 43.8 | Graph2Tree with RoBERTa | 2021-03-12 |
Are NLP Models really able to Solve Simple Math Word Problems? | ✓ Link | 41.0 | 41.0 | GTS with RoBERTa | 2021-03-12 |
Are NLP Models really able to Solve Simple Math Word Problems? | ✓ Link | 40.3 | 40.3 | LSTM Seq2Seq with RoBERTa | 2021-03-12 |
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | ✓ Link | 40.1 | | SYRELM (GPT-J) | 2023-12-09 |
Are NLP Models really able to Solve Simple Math Word Problems? | ✓ Link | 38.9 | 38.9 | Transformer with RoBERTa | 2021-03-12 |
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | ✓ Link | | 94.2 | GPT-4 DUP | 2024-04-23 |
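Several entries above report CoT or PoT prompting with "self-consistency @ 5", i.e. sampling five reasoning chains and taking a majority vote over the final numeric answers, and the listed scores are answer accuracy over the benchmark expressed as a percentage. A minimal sketch of that scoring procedure, assuming a hypothetical `sample_answer` function that returns one sampled numeric answer per call (it is not an API from any of the papers above):

```python
from collections import Counter

def self_consistency_answer(sample_answer, problem, k=5):
    """Majority vote over k sampled answers (self-consistency @ k).

    `sample_answer(problem)` is a hypothetical callable returning one numeric
    answer per call, e.g. by sampling a CoT/PoT completion and extracting
    (or executing) its final result.
    """
    votes = Counter(sample_answer(problem) for _ in range(k))
    answer, _ = votes.most_common(1)[0]
    return answer

def accuracy(predictions, references, tol=1e-4):
    """Percentage of problems whose predicted value matches the reference."""
    correct = sum(abs(p - r) <= tol for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)
```

Vote ties simply fall back to `Counter.most_common` ordering; the papers themselves may break ties differently.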