OpenCodePapers
TinyQA Benchmark++ (dataset: tinyqabenchmark-core-en)
Leaderboard
All entries come from the paper "Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation" (code available).

| Model | Exact Match | Release date |
|---|---|---|
| gemma-3-4b | 86.5 | 2025-05-17 |
| mistral-24b-instruct | 84.6 | 2025-05-17 |
| llama-3.2-3b-instruct | 84.6 | 2025-05-17 |
| ministral-8b | 80.8 | 2025-05-17 |
| ministral-3b | 76.9 | 2025-05-17 |
| llama-3.2-1b-instruct | 53.8 | 2025-05-17 |
| mistral-7b-instruct | 50.0 | 2025-05-17 |
| gemma-3-12b | 90.4 | 2025-05-17 |
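The leaderboard does not say how Exact Match is computed. A common convention, assumed here and not taken from the TinyQA Benchmark++ paper, is to count a prediction as correct when it equals the reference answer after light normalization, then report the percentage of correct items. The minimal sketch below uses SQuAD-style normalization (lowercasing, dropping punctuation and the articles "a", "an", "the", collapsing whitespace); the function names `normalize` and `exact_match` are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """SQuAD-style normalization (an assumption, not confirmed for this benchmark):
    lowercase, remove punctuation, remove English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the percentage (0-100, one decimal) of predictions that equal
    their paired reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return round(100.0 * hits / len(references), 1)


# Toy example with made-up answers: three of four predictions match.
preds = ["Paris", "the Pacific Ocean", "1945", "Mars"]
refs = ["paris", "Pacific Ocean", "1945.", "Venus"]
print(exact_match(preds, refs))  # 75.0
```

How aggressive the normalization is matters: a stricter comparison (for example, case-sensitive) would lower scores like those in the table, so results are only comparable when every model is scored with the same rule.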