OpenCodePapers

Code Generation on HumanEval

Code Generation
Leaderboard
| Paper | Code | Pass@1 (%) | Model Name | Release Date |
|---|---|---|---|---|
| From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | ✓ | 100 | DeepSeek-R1 (MGDebugger) | 2024-10-02 |
| Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | ✓ | 99.4 | LLaMA 3 | 2024-02-25 |
| QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks | | 98.8 | QualityFlow (Sonnet-3.5) | 2025-01-20 |
| Planning-Driven Programming: A Large Language Model Programming Workflow | ✓ | 98.2 | Phi-2 | 2024-11-21 |
| Execution Guided Line-by-Line Code Generation | ✓ | 96.95 | EG-CFG (DeepSeek-V3-0324) | 2025-06-12 |
| MapCoder: Multi-Agent Code Generation for Competitive Problem Solving | ✓ | 93.9 | Mistral 7B | 2024-05-18 |
| | | 90.85 | Claude Sonnet 3.5 | |
| L2MAC: Large Language Model Automatic Computer for Extensive Code Generation | ✓ | 90.2 | L2MAC (GPT-4) | 2023-10-02 |