Code Generation on Web-Bench


Leaderboard

Source paper: Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks (code available). Every entry carries the date 2025-05-12, which is when the results were reported, not when each model was released. Rows are sorted by Pass@1, highest first.

Model	Pass@1	Pass@2
claude-3-7-sonnet-20250219-thinking	25.11%	35.33%
claude-3-5-sonnet-20241022	23.04%	32.39%
claude-3-7-sonnet-20250219	22.50%	30.98%
claude-3-5-sonnet-20240620	21.96%	30.33%
gpt-4.1	21.09%	25.11%
gpt-4.1-mini	20.76%	23.70%
doubao-pro-1.5-thinking	20.11%	30.22%
gpt-4o	17.17%	23.80%
deepseek-v3-0324	17.07%	23.59%
deepseek-coder-v2	16.74%	23.15%
doubao-pro-1.5-32k	16.63%	22.93%
llama-4 Maverick	15.98%	20.87%
qwen-max-2025-01-25	15.87%	19.02%
gemini-2.5-pro	15.67%	24.02%
claude-3-5-haiku-20241022	15.43%	21.74%
gemini-2.0-flash	15.33%	20.87%
gemini-2.0-flash-thinking	14.89%	19.24%
gemini-pro-1.5	14.78%	20.87%
deepseek-r1	14.46%	26.20%
step-fun-2-16k	13.70%	15.87%
o4-mini	13.26%	22.93%
mistral-large-2411	13.04%	18.70%
gemini-flash-1.5	12.83%	17.07%
qwen-plus-2025-01-25	11.85%	15.11%
grok-2-1212	11.30%	17.17%
qwen-2.5-72b-instruct	10.54%	13.70%
o1	10.43%	12.39%
gemma-3-27b	9.89%	11.85%
o3-mini	9.13%	14.24%
gpt-4o-mini	8.48%	13.04%
sense-chat-5	8.48%	12.72%
minimax-text	8.48%	10.76%
360-gpt2-o1	8.26%	14.46%
GLM-4-0414	7.50%	9.02%
gpt-4.1-nano	7.07%	12.28%
llama-3.3	6.63%	9.57%
moonshot-kimi-latest	5.22%	11.85%
llama-4 Scout	5.00%	7.72%
doubao-pro-1.5-32k-lite	3.48%	5.98%
qwen-turbo-2024-11-01	2.61%	5.11%

