OpenCodePapers

Common Sense Reasoning on RuCoS
Leaderboard
| Model | Average F1 | EM | Paper | Code | Release Date |
|---|---|---|---|---|---|
| Human Benchmark | 0.93 | 0.89 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | ✓ | 2020-10-29 |
| Golden Transformer | 0.92 | 0.924 | – | – | – |
| YaLM 1.0B few-shot | 0.86 | 0.859 | – | – | – |
| ruT5-large-finetune | 0.81 | 0.764 | – | – | – |
| ruT5-base-finetune | 0.79 | 0.752 | – | – | – |
| ruBert-base finetune | 0.74 | 0.716 | – | – | – |
| ruRoberta-large finetune | 0.73 | 0.716 | – | – | – |
| ruBert-large finetune | 0.68 | 0.658 | – | – | – |
| RuGPT3XL few-shot | 0.67 | 0.665 | – | – | – |
| MT5 Large | 0.57 | 0.562 | mT5: A massively multilingual pre-trained text-to-text transformer | ✓ | 2020-10-22 |
| SBERT_Large | 0.36 | 0.351 | – | – | – |
| SBERT_Large_mt_ru_finetuning | 0.35 | 0.347 | – | – | – |
| RuBERT plain | 0.32 | 0.314 | – | – | – |
| Multilingual Bert | 0.29 | 0.29 | – | – | – |
| heuristic majority | 0.26 | 0.257 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | – | 2021-05-03 |
| Baseline TF-IDF1.1 | 0.26 | 0.252 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | ✓ | 2020-10-29 |
| Random weighted | 0.25 | 0.247 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | – | 2021-05-03 |
| majority_class | 0.25 | 0.247 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | – | 2021-05-03 |
| RuGPT3Medium | 0.23 | 0.224 | – | – | – |
| RuBERT conversational | 0.22 | 0.218 | – | – | – |
| RuGPT3Small | 0.21 | 0.204 | – | – | – |
| RuGPT3Large | 0.21 | 0.202 | – | – | – |
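The two columns above are the standard reading-comprehension metrics: EM (exact match) scores 1 only when the normalized prediction equals a gold answer, while F1 measures token overlap between prediction and gold; with multiple acceptable gold answers, the best score per example is kept and then averaged over the dataset. A minimal sketch of these definitions (SQuAD-style normalization is simplified here; this is not the official RuCoS scorer):

```python
from collections import Counter

def normalize(text: str) -> list[str]:
    # Lowercase and split on whitespace; real scorers also strip punctuation.
    return text.lower().split()

def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    # Multiset intersection counts shared tokens, respecting duplicates.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def score(prediction: str, golds: list[str]) -> tuple[float, float]:
    # Take the best EM and best F1 over all acceptable gold answers.
    return (max(exact_match(prediction, g) for g in golds),
            max(token_f1(prediction, g) for g in golds))
```

For example, `score("Владимир Путин", ["Путин", "Владимир Путин"])` gives EM 1.0 and F1 1.0, while a partial answer such as `score("Путин", ["Владимир Путин"])` gives EM 0.0 but F1 2/3 (precision 1, recall 0.5).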