OpenCodePapers

natural-questions-on-theoremqa

Leaderboard
| Paper | Code | Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 52.4 | GPT-4 (PoT) | 2023-05-21 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 43.8 | GPT-4 (CoT) | 2023-05-21 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 35.6 | GPT-3.5-turbo (PoT) | 2023-05-21 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ | 32.5 | DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 2024-06-18 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ | 32.2 | DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 2024-06-18 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 31.8 | PaLM-2-unicorn (CoT) | 2023-05-21 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 30.2 | GPT-3.5-turbo (CoT) | 2023-05-21 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ | 28.2 | DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 2024-06-18 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ | 27.4 | DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 2024-06-18 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 25.9 | Claude-v1 (PoT) | 2023-05-21 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 24.9 | Claude-v1 (CoT) | 2023-05-21 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 23.9 | code-davinci-002 | 2023-05-21 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 23.6 | Claude-instant (CoT) | 2023-05-21 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 22.8 | text-davinci-003 | 2023-05-21 |
| TheoremQA: A Theorem-driven Question Answering dataset | ✓ | 21.0 | PaLM-2-bison (CoT) | 2023-05-21 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ | 19.4 | DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 2024-06-18 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ | 17.0 | DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 2024-06-18 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ | 16.4 | DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 2024-06-18 |
| DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ | 15.4 | DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 2024-06-18 |
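The leaderboard lists three models under both PoT (Program-of-Thoughts) and CoT (Chain-of-Thought) prompting: GPT-4, GPT-3.5-turbo, and Claude-v1. The sketch below computes the PoT minus CoT accuracy gap for each of them. It uses only the accuracy values transcribed from the leaderboard above. The helper `pot_minus_cot` is a hypothetical name introduced for illustration. The sketch also assumes that a row's prompting method can be read from the " (PoT)" or " (CoT)" suffix on its model name.

```python
# Rows transcribed from the leaderboard above: (model name, accuracy in %).
# Only models that appear with both PoT and CoT variants are included.
rows = [
    ("GPT-4 (PoT)", 52.4),
    ("GPT-4 (CoT)", 43.8),
    ("GPT-3.5-turbo (PoT)", 35.6),
    ("GPT-3.5-turbo (CoT)", 30.2),
    ("Claude-v1 (PoT)", 25.9),
    ("Claude-v1 (CoT)", 24.9),
]


def pot_minus_cot(rows):
    """Hypothetical helper: map each base model to PoT accuracy minus CoT accuracy.

    Assumes the prompting method is encoded as a " (PoT)" / " (CoT)" name suffix.
    """
    scores = dict(rows)
    gaps = {}
    for name, acc in rows:
        if name.endswith(" (PoT)"):
            base = name[: -len(" (PoT)")]
            cot = scores.get(base + " (CoT)")
            if cot is not None:
                # Round to one decimal to match the table's precision.
                gaps[base] = round(acc - cot, 1)
    return gaps


print(pot_minus_cot(rows))
```

In these three pairs, PoT scores higher than CoT by 8.6, 5.4, and 1.0 points respectively. The comparison only covers models that appear with both prompting variants.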