OpenCodePapers

Image Captioning on nocaps-XD near-domain

Image Captioning

Leaderboard

Paper	Code	CIDEr	B1	B2	B3	B4	ROUGE-L	METEOR	SPICE	ModelName	ReleaseDate
GIT: A Generative Image-to-text Transformer for Vision and Language	✓ Link	125.51	88.9	75.86	58.9	38.95	63.66	32.95	16.11	GIT2	2022-05-27
GIT: A Generative Image-to-text Transformer for Vision and Language	✓ Link	123.92	88.56	75.48	58.46	38.44	63.5	32.86	15.96	GIT	2022-05-27
		104.76	84.45	69.28	51.1	31.48	59.75	30.31	14.97	VLAF2
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning		101.2	82.88	67.01	48.73	30.21	58.76	30.0	14.27	Microsoft Cognitive Services team	2020-09-28
		85.81	79.88	61.31	40.26	21.84	53.98	27.0	13.01	test_cbs2
		85.73	79.51	62.65	43.22	24.97	55.13	26.37	11.96	icp2ssi1_coco_si_0.02_5_test
		84.58	77.05	56.97	36.84	19.85	53.06	28.42	14.72	Human
		74.2	77.68	58.31	37.04	19.85	52.64	24.97	11.45	UpDown + ELMo + CBS
		61.98	74.77	53.67	30.66	13.85	49.45	22.55	9.83	Neural Baby Talk + CBS
		56.85	75.25	56.93	36.91	20.49	51.84	23.6	10.33	UpDown
		53.21	73.69	54.1	32.37	15.99	49.63	21.93	9.26	Neural Baby Talk
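
The Human row makes it easy to check which entries exceed the human reference on a given metric. The following is a minimal sketch, not part of the leaderboard itself. It assumes a handful of rows copied from the table into a tab-separated string; the names `LEADERBOARD_TSV`, `load_rows`, and `models_above_human` are hypothetical helpers introduced only for illustration.

```python
import csv
import io

# A few rows copied from the leaderboard above, reduced to three columns.
# The variable name and the column subset are assumptions for illustration.
LEADERBOARD_TSV = (
    "ModelName\tCIDEr\tSPICE\n"
    "GIT2\t125.51\t16.11\n"
    "GIT\t123.92\t15.96\n"
    "VLAF2\t104.76\t14.97\n"
    "Microsoft Cognitive Services team\t101.2\t14.27\n"
    "Human\t84.58\t14.72\n"
    "UpDown\t56.85\t10.33\n"
)


def load_rows(tsv_text):
    """Parse the tab-separated text into dicts with float metric values."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {
            "ModelName": row["ModelName"],
            "CIDEr": float(row["CIDEr"]),
            "SPICE": float(row["SPICE"]),
        }
        for row in reader
    ]


def models_above_human(rows, metric):
    """Return model names scoring above the Human row on `metric`, best first."""
    human = next(row for row in rows if row["ModelName"] == "Human")
    better = [
        row
        for row in rows
        if row["ModelName"] != "Human" and row[metric] > human[metric]
    ]
    better.sort(key=lambda row: row[metric], reverse=True)
    return [row["ModelName"] for row in better]


rows = load_rows(LEADERBOARD_TSV)
print(models_above_human(rows, "CIDEr"))
print(models_above_human(rows, "SPICE"))
```

On this subset, the two metrics disagree about one entry: the Microsoft Cognitive Services team submission is above the Human row on CIDEr (101.2 vs. 84.58) but below it on SPICE (14.27 vs. 14.72), so a ranking by a single metric should be read with that in mind.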