TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 52.4 | GPT-4 (PoT) | 2023-05-21 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 43.8 | GPT-4 (CoT) | 2023-05-21 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 35.6 | GPT-3.5-turbo (PoT) | 2023-05-21 |
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ Link | 32.5 | DART-Math-DSMath-7B-Uniform (0-shot CoT, w/o code) | 2024-06-18 |
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ Link | 32.2 | DART-Math-DSMath-7B-Prop2Diff (0-shot CoT, w/o code) | 2024-06-18 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 31.8 | PaLM-2-unicorn (CoT) | 2023-05-21 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 30.2 | GPT-3.5-turbo (CoT) | 2023-05-21 |
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ Link | 28.2 | DART-Math-Llama3-70B-Prop2Diff (0-shot CoT, w/o code) | 2024-06-18 |
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ Link | 27.4 | DART-Math-Llama3-70B-Uniform (0-shot CoT, w/o code) | 2024-06-18 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 25.9 | Claude-v1 (PoT) | 2023-05-21 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 24.9 | Claude-v1 (CoT) | 2023-05-21 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 23.9 | code-davinci-002 | 2023-05-21 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 23.6 | Claude-instant (CoT) | 2023-05-21 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 22.8 | text-davinci-003 | 2023-05-21 |
TheoremQA: A Theorem-driven Question Answering dataset | ✓ Link | 21.0 | PaLM-2-bison (CoT) | 2023-05-21 |
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ Link | 19.4 | DART-Math-Llama3-8B-Prop2Diff (0-shot CoT, w/o code) | 2024-06-18 |
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ Link | 17.0 | DART-Math-Mistral-7B-Prop2Diff (0-shot CoT, w/o code) | 2024-06-18 |
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ Link | 16.4 | DART-Math-Mistral-7B-Uniform (0-shot CoT, w/o code) | 2024-06-18 |
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | ✓ Link | 15.4 | DART-Math-Llama3-8B-Uniform (0-shot CoT, w/o code) | 2024-06-18 |