Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning | ✓ Link | 80.74 | 52.94 | 68.85 | | | Lean | 77.87 | 80.74 | Kimina-Prover-Preview | 2025-04-15 |
Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis | ✓ Link | 66.0 | 36.5 | | | 52.5 | Isabelle | | | ProofAug | 2025-01-30 |
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search | ✓ Link | 63.5 | | 50.0 | 50.7 | | Lean | | | DeepSeek-Prover-V1.5 | 2024-08-15 |
SubgoalXL: Subgoal-based Expert Learning for Theorem Proving | ✓ Link | 56.1 | | 39.3 | | | Isabelle | | | Subgoal-XL | 2024-08-20 |
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data | | 52.0 | 30.0 | | 46.3 | | Lean | | | DeepSeek-Prover | 2024-05-23 |
Lyra: Orchestrating Dual Correction in Automated Theorem Proving | ✓ Link | 47.1 | | | | 47.1 | Isabelle | | | Lyra + GPT-4 | 2023-09-27 |
LEGO-Prover: Neural Theorem Proving with Growing Libraries | ✓ Link | 47.1 | | | | 47.1 | Isabelle | | | LEGO-Prover ChatGPT | 2023-10-01 |
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving | ✓ Link | 45.5 | | | | 45.5 | Isabelle | | | Decomposing the Enigma | 2023-05-25 |
HyperTree Proof Search for Neural Theorem Proving | | 41 | | | 41 | | Lean | | | Evariste | 2022-05-23 |
HyperTree Proof Search for Neural Theorem Proving | | 40.6 | | | 40.6 | | Lean | | | Evariste-7d | 2022-05-23 |
HyperTree Proof Search for Neural Theorem Proving | | 38.9 | | | 38.9 | | Lean | | | Evariste-1d | 2022-05-23 |
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs | ✓ Link | 38.9 | | | | 38.9 | Isabelle | | | DSP (540B Minerva informal) | 2022-10-21 |
Formal Mathematics Statement Curriculum Learning | ✓ Link | 36.6 | 29.6 | 34.5 | 36.6 | | Lean | | | Lean Expert Iteration | 2022-02-03 |
HyperTree Proof Search for Neural Theorem Proving | | 36.6 | | | 36.6 | | Metamath | | | GPT-f | 2022-05-23 |
[]() | | 35.2 | 35.2 | | | | Isabelle | | | Thor + expert iteration on autoformalised theorems | |
An In-Context Learning Agent for Formal Theorem-Proving | ✓ Link | 30.7 | 30.7 | | | | Lean | | | COPRA + GPT-4-turbo | 2023-10-06 |
Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers | | 29.9 | 29.9 | | | | Isabelle | | | Thor | 2022-05-22 |
MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics | ✓ Link | 29.2 | 24.6 | 29.2 | | | Lean | | | Lean GPT-f | 2021-08-31 |
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning | ✓ Link | 28.3 | 28.3 | | | | Lean | | | MMOS-DeepSeekMath-7B | 2024-02-23 |
[]() | | 26.5 | | 26.5 | | | Lean | | | ReProver | |
Llemma: An Open Language Model For Mathematics | ✓ Link | 26.2 | | 26.2 | | | Lean | | | LLEMMA-7b | 2023-10-16 |
Llemma: An Open Language Model For Mathematics | ✓ Link | 25.8 | | 25.8 | | | Lean | | | LLEMMA-34b | 2023-10-16 |
Proof Artifact Co-training for Theorem Proving with Language Models | ✓ Link | 24.6 | 24.6 | | | | Isabelle | | | PACT (reproduced by Thor) | 2021-02-11 |
An In-Context Learning Agent for Formal Theorem-Proving | ✓ Link | 23.3 | 23.3 | | | | Lean | | | COPRA + GPT-4 | 2023-10-06 |
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs | ✓ Link | 20.9 | 20.9 | | | | Isabelle | | | Sledgehammer + heuristics | 2022-10-21 |
MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics | ✓ Link | 18 | 18 | | | | Lean | | | Lean tidy | 2021-08-31 |
An In-Context Learning Agent for Formal Theorem-Proving | ✓ Link | 11.9 | 11.9 | | | | Lean | | | COPRA + GPT-3.5 | 2023-10-06 |
Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers | | 10.4 | 10.4 | | | | Isabelle | | | Sledgehammer | 2022-05-22 |
MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics | ✓ Link | 1.6 | 1.3 | | | | Metamath | | | Metamath GPT-f | 2021-08-31 |