OpenCodePapers

Natural Questions on TheoremQA

Leaderboard

Paper	Code	Accuracy (%)	Model Name	Release Date
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	52.4	GPT-4 (PoT)	2023-05-21
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	43.8	GPT-4 (CoT)	2023-05-21
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	35.6	GPT-3.5-turbo (PoT)	2023-05-21
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	✓ Link	32.5	DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code)	2024-06-18
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	✓ Link	32.2	DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code)	2024-06-18
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	31.8	PaLM-2-unicorn (CoT)	2023-05-21
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	30.2	GPT-3.5-turbo (CoT)	2023-05-21
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	✓ Link	28.2	DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code)	2024-06-18
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	✓ Link	27.4	DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code)	2024-06-18
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	25.9	Claude-v1 (PoT)	2023-05-21
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	24.9	Claude-v1 (CoT)	2023-05-21
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	23.9	code-davinci-002	2023-05-21
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	23.6	Claude-instant (CoT)	2023-05-21
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	22.8	text-davinci-003	2023-05-21
TheoremQA: A Theorem-driven Question Answering dataset	✓ Link	21.0	PaLM-2-bison (CoT)	2023-05-21
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	✓ Link	19.4	DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code)	2024-06-18
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	✓ Link	17.0	DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code)	2024-06-18
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	✓ Link	16.4	DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code)	2024-06-18
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving	✓ Link	15.4	DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code)	2024-06-18