on-web-bench

Leaderboard
| Paper | Code | Pass@1 | Pass@2 | Model Name | Release Date |
| --- | --- | --- | --- | --- | --- |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 25.11% | 35.33% | claude-3-7-sonnet-20250219-thinking | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 23.04% | 32.39% | claude-3-5-sonnet-20241022 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 22.50% | 30.98% | claude-3-7-sonnet-20250219 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 21.96% | 30.33% | claude-3-5-sonnet-20240620 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 21.09% | 25.11% | gpt-4.1 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 20.76% | 23.70% | gpt-4.1-mini | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 20.11% | 30.22% | doubao-pro-1.5-thinking | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 17.17% | 23.80% | gpt-4o | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 17.07% | 23.59% | deepseek-v3-0324 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 16.74% | 23.15% | deepseek-coder-v2 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 16.63% | 22.93% | doubao-pro-1.5-32k | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.98% | 20.87% | llama-4 Maverick | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.87% | 19.02% | qwen-max-2025-01-25 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.67% | 24.02% | gemini-2.5-pro | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.43% | 21.74% | claude-3-5-haiku-20241022 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.33% | 20.87% | gemini-2.0-flash | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 14.89% | 19.24% | gemini-2.0-flash-thinking | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 14.78% | 20.87% | gemini-pro-1.5 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 14.46% | 26.20% | deepseek-r1 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 13.70% | 15.87% | step-fun-2-16k | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 13.26% | 22.93% | o4-mini | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 13.04% | 18.70% | mistral-large-2411 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 12.83% | 17.07% | gemini-flash-1.5 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 11.85% | 15.11% | qwen-plus-2025-01-25 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 11.30% | 17.17% | grok-2-1212 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 10.54% | 13.70% | qwen-2.5-72b-instruct | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 10.43% | 12.39% | o1 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 9.89% | 11.85% | gemma-3-27b | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 9.13% | 14.24% | o3-mini | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 8.48% | 13.04% | gpt-4o-mini | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 8.48% | 12.72% | sense-chat-5 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 8.48% | 10.76% | minimax-text | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 8.26% | 14.46% | 360-gpt2-o1 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 7.50% | 9.02% | GLM-4-0414 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 7.07% | 12.28% | gpt-4.1-nano | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 6.63% | 9.57% | llama-3.3 | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 5.22% | 11.85% | moonshot-kimi-latest | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 5.00% | 7.72% | llama-4 Scout | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 3.48% | 5.98% | doubao-pro-1.5-32k-lite | 2025-05-12 |
| Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 2.61% | 5.11% | qwen-turbo-2024-11-01 | 2025-05-12 |