| Paper | Code | Accuracy | Model Name | Release Date |
|---|---|---|---|---|
| | | 0.252 | o3 | |
| FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI | | 0.02 | Gemini 1.5 Pro (002) | 2024-11-07 |
| | | 0.01 | Claude 3.5 Sonnet | |
| | | 0.01 | o1-preview | |
| | | 0.01 | o1-mini | |
| | | 0.01 | GPT-4o | |