OpenCodePapers

Constrained Clustering: Only Connect Walls Dataset Task 1 (Grouping)
Dataset Link
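
In Task 1 (Grouping), each wall of 16 clues has to be partitioned into 4 groups of exactly 4 related clues, which is why the leaderboard frames it as constrained clustering. The sketch below is only an illustration of that setup, not the pipeline of any listed paper: it assumes a generic `embed` function (a stand-in for any word- or sentence-embedding model such as GloVe, FastText or E5) and enforces the 4×4 size constraint by running k-means and then solving a balanced assignment with the Hungarian algorithm.

```python
# Minimal sketch (not the official OCW baseline) of size-constrained clustering:
# partition the 16 clue embeddings of one wall into 4 groups of exactly 4 clues.
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment

def group_wall(clues, embed, n_groups=4, group_size=4, seed=0):
    X = np.vstack([embed(c) for c in clues])            # (16, d) clue embeddings
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=seed).fit(X)

    # Distance of every clue to every cluster centre; replicate each centre
    # `group_size` times so the assignment puts exactly four clues per group.
    dists = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=-1)
    cost = np.repeat(dists, group_size, axis=1)          # (16, 16) cost matrix
    _, cols = linear_sum_assignment(cost)
    labels = cols // group_size                          # slot index -> group id

    groups = [[clues[i] for i in np.where(labels == g)[0]] for g in range(n_groups)]
    return labels, groups
```

Replicating each cluster centre four times turns the size constraint into a square assignment problem, so every clue lands in exactly one of four equally sized groups.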
Results over time (interactive chart of leaderboard metrics by model release date)
Leaderboard
| Paper | Code | Wasserstein Distance (WD) | # Correct Groups | Fowlkes Mallows Score (FMS) | Adjusted Rand Index (ARI) | Adjusted Mutual Information (AMI) | # Solved Walls | Model | Release Date |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4 Technical Report | ✓ Link | 72.9 | 269 | 43.4 | 29.1 | 32.8 | 7 | GPT-4 (5-shot) | 2023-03-15 |
| GPT-4 Technical Report | ✓ Link | 73.4 | 262 | 43.7 | 29.7 | 33.5 | 4 | GPT-4 (1-shot) | 2023-03-15 |
| GPT-4 Technical Report | ✓ Link | 73.6 | 249 | 42.8 | 28.5 | 32.3 | 3 | GPT-4 (10-shot) | 2023-03-15 |
| GPT-4 Technical Report | ✓ Link | 73.7 | 272 | 43.9 | 29.9 | 33.6 | 5 | GPT-4 (3-shot) | 2023-03-15 |
| GPT-4 Technical Report | ✓ Link | 75.8 | 239 | 41.5 | 27.2 | 30.7 | 6 | GPT-4 (0-shot) | 2023-03-15 |
| GPT-4 Technical Report | ✓ Link | 80.6 | 149 | 37.3 | 22.0 | 25.4 | 2 | GPT-3.5-turbo (5-shot) | 2023-03-15 |
| GPT-4 Technical Report | ✓ Link | 80.9 | 140 | 36.8 | 21.3 | 24.7 | 0 | GPT-3.5-turbo (3-shot) | 2023-03-15 |
| GPT-4 Technical Report | ✓ Link | 81.2 | 137 | 36.1 | 20.4 | 24.0 | 2 | GPT-3.5-turbo (10-shot) | 2023-03-15 |
| GPT-4 Technical Report | ✓ Link | 82.3 | 123 | 34.4 | 18.2 | 21.2 | 0 | GPT-3.5-turbo (1-shot) | 2023-03-15 |
| GPT-4 Technical Report | ✓ Link | 82.5 | 114 | 34.0 | 18.4 | 21.6 | 0 | GPT-3.5-turbo (0-shot) | 2023-03-15 |
| Text Embeddings by Weakly-Supervised Contrastive Pre-training | ✓ Link | 83.8 ± .6 | 89 ± 6 | 33.1 ± .3 | 16.3 ± .4 | 19.5 ± .4 | 1 ± 0 | E5 (BASE) | 2022-12-07 |
| Learning Word Vectors for 157 Languages | ✓ Link | 84.2 ± .5 | 80 ± 4 | 32.1 ± .3 | 15.2 ± .3 | 18.4 ± .4 | 0 ± 0 | FastText (Crawl) | 2018-02-19 |
| Text Embeddings by Weakly-Supervised Contrastive Pre-training | ✓ Link | 84.4 ± .7 | 76 ± 5 | 32.3 ± .4 | 15.4 ± .5 | 18.5 ± .6 | 0 ± 0 | E5 (LARGE) | 2022-12-07 |
| GloVe: Global Vectors for Word Representation | ✓ Link | 84.9 ± .4 | 68 ± 4 | 31.5 ± .3 | 14.4 ± .3 | 17.6 ± .4 | 0 ± 0 | GloVe | 2014-10-01 |
| Learning Word Vectors for 157 Languages | ✓ Link | 85.5 ± .5 | 62 ± 3 | 30.4 ± .2 | 13.0 ± .2 | 15.8 ± .3 | 0 ± 0 | FastText (News) | 2018-02-19 |
| MPNet: Masked and Permuted Pre-training for Language Understanding | ✓ Link | 86.3 ± .4 | 50 ± 4 | 29.4 ± .3 | 11.7 ± .4 | 14.3 ± .5 | 0 ± 0 | all-mpnet (BASE) | 2020-04-20 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 88.3 ± .5 | 33 ± 2 | 26.5 ± .2 | 8.2 ± .3 | 10.3 ± .3 | 0 ± 0 | BERT (LARGE) | 2018-10-11 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | ✓ Link | 89.5 ± .4 | 22 ± 2 | 25.1 ± .2 | 6.4 ± .3 | 8.1 ± .4 | 0 ± 0 | BERT (BASE) | 2018-10-11 |
| Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset | ✓ Link | | 1405 | | | | 285 | Human Performance | 2023-06-19 |
| Deep contextualized word representations | ✓ Link | 86.3 ± .6 | 55 ± 4 | 29.5 ± .3 | 11.8 ± .4 | 14.5 ± .4 | 0 ± 0 | ELMo (LARGE) | 2018-02-15 |
| DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | ✓ Link | 86.7 ± .6 | 49 ± 4 | 29.1 ± .2 | 11.3 ± .3 | 14.0 ± .3 | 0 ± 0 | DistilBERT (BASE) | 2019-10-02 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | ✓ Link | 88.4 ± .4 | 29 ± 3 | 26.7 ± .2 | 8.4 ± .3 | 9.4 ± .4 | 0 ± 0 | RoBERTa (LARGE) | 2019-07-26 |
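
FMS, ARI and AMI are standard clustering-agreement scores (reported in the table as percentages averaged over walls), so they can be reproduced for a single wall with scikit-learn as sketched below; the Wasserstein Distance (WD) metric is specific to the OCW paper and is not reimplemented here. The grouping counts below assume the usual reading of the leaderboard: a predicted group is correct when its four clues exactly match a gold group, and a wall counts as solved when all four groups are correct.

```python
# Minimal sketch of the overlap metrics for one wall, given gold and predicted
# group labels for the 16 clues (labels are illustrative values, not real data).
from sklearn.metrics import (adjusted_mutual_info_score,
                             adjusted_rand_score,
                             fowlkes_mallows_score)

gold = [0]*4 + [1]*4 + [2]*4 + [3]*4                       # gold group of each clue
pred = [0, 0, 0, 1, 1, 1, 1, 0, 2, 2, 2, 2, 3, 3, 3, 3]    # model's grouping

print("FMS:", fowlkes_mallows_score(gold, pred))
print("ARI:", adjusted_rand_score(gold, pred))
print("AMI:", adjusted_mutual_info_score(gold, pred))

# Count predicted groups whose four clues exactly match a gold group;
# the wall is solved only when all four groups are correct.
gold_sets = [frozenset(range(4*g, 4*g + 4)) for g in range(4)]
pred_sets = [frozenset(i for i, p in enumerate(pred) if p == g) for g in range(4)]
correct = sum(s in gold_sets for s in pred_sets)
print("Correct groups:", correct, "| wall solved:", correct == 4)
```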