Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 25.11% | 35.33% | claude-3-7-sonnet-20250219-thinking | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 23.04% | 32.39% | claude-3-5-sonnet-20241022 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 22.50% | 30.98% | claude-3-7-sonnet-20250219 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 21.96% | 30.33% | claude-3-5-sonnet-20240620 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 21.09% | 25.11% | gpt-4.1 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 20.76% | 23.70% | gpt-4.1-mini | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 20.11% | 30.22% | doubao-pro-1.5-thinking | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 17.17% | 23.80% | gpt-4o | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 17.07% | 23.59% | deepseek-v3-0324 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 16.74% | 23.15% | deepseek-coder-v2 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 16.63% | 22.93% | doubao-pro-1.5-32k | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.98% | 20.87% | llama-4 Maverick | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.87% | 19.02% | qwen-max-2025-01-25 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.67% | 24.02% | gemini-2.5-pro | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.43% | 21.74% | claude-3-5-haiku-20241022 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 15.33% | 20.87% | gemini-2.0-flash | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 14.89% | 19.24% | gemini-2.0-flash-thinking | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 14.78% | 20.87% | gemini-pro-1.5 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 14.46% | 26.20% | deepseek-r1 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 13.70% | 15.87% | step-fun-2-16k | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 13.26% | 22.93% | o4-mini | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 13.04% | 18.70% | mistral-large-2411 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 12.83% | 17.07% | gemini-flash-1.5 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 11.85% | 15.11% | qwen-plus-2025-01-25 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 11.30% | 17.17% | grok-2-1212 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 10.54% | 13.70% | qwen-2.5-72b-instruct | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 10.43% | 12.39% | o1 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 9.89% | 11.85% | gemma-3-27b | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 9.13% | 14.24% | o3-mini | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 8.48% | 13.04% | gpt-4o-mini | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 8.48% | 12.72% | sense-chat-5 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 8.48% | 10.76% | minimax-text | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 8.26% | 14.46% | 360-gpt2-o1 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 7.50% | 9.02% | GLM-4-0414 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 7.07% | 12.28% | gpt-4.1-nano | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 6.63% | 9.57% | llama-3.3 | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 5.22% | 11.85% | moonshot-kimi-latest | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 5.00% | 7.72% | llama-4 Scout | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 3.48% | 5.98% | doubao-pro-1.5-32k-lite | 2025-05-12 |
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks | ✓ Link | 2.61% | 5.11% | qwen-turbo-2024-11-01 | 2025-05-12 |