OpenCodePapers

GSM8K
Leaderboard
| Paper | Code | Accuracy | 0-shot MRR | Model Name | Release Date |
| --- | --- | --- | --- | --- | --- |
| Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | ✓ | 98.1 | | Xolver | 2025-06-17 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | ✓ | 92 | | AlphaLLM (with MCTS) | 2024-04-18 |
| MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | ✓ | 98 | | Orange-mini | 2025-01-20 |