OpenCodePapers

tinyqa-benchmark-on-tinyqabenchmark-core-en

TinyQA Benchmark++
[Figure: Results over time]
Leaderboard
| Paper | Code | Exact Match (%) | Model Name | Release Date |
|---|---|---|---|---|
| Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | ✓ | 86.5 | gemma-3-4b | 2025-05-17 |
| Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | ✓ | 84.6 | mistral-24b-instruct | 2025-05-17 |
| Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | ✓ | 84.6 | llama-3.2-3b-instruct | 2025-05-17 |
| Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | ✓ | 80.8 | ministral-8b | 2025-05-17 |
| Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | ✓ | 76.9 | ministral-3b | 2025-05-17 |
| Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | ✓ | 53.8 | llama-3.2-1b-instruct | 2025-05-17 |
| Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | ✓ | 50.0 | mistral-7b-instruct | 2025-05-17 |
| Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation | ✓ | 90.4 | gemma-3-12b | 2025-05-17 |
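
The leaderboard reports a single metric, Exact Match. As a rough illustration of how such a score is commonly computed, the sketch below counts the predictions that equal their reference answer after light normalization and returns the share as a percentage. The normalization steps (lowercasing, stripping punctuation, collapsing whitespace) and the function names `normalize` and `exact_match` are assumptions made for this example; the benchmark's own scoring script may differ.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    This normalization is an assumption for illustration; the benchmark
    may compare answers differently.
    """
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions equal to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Count pairs whose normalized strings are identical.
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


if __name__ == "__main__":
    # Hypothetical model outputs and gold answers, not taken from the dataset.
    preds = ["Paris.", "4", "blue"]
    golds = ["paris", "4", "red"]
    print(round(exact_match(preds, golds), 1))
```

Because the metric is all-or-nothing per question, a score on a small question set moves in coarse steps, so small gaps between neighbouring models in the table may correspond to only one or two questions.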