OpenCodePapers

Image Captioning on nocaps-XD entire

Leaderboard

Paper	Code	CIDEr	BLEU-1	BLEU-2	BLEU-3	BLEU-4	ROUGE-L	METEOR	SPICE	Model Name	Release Date
GIT: A Generative Image-to-text Transformer for Vision and Language	✓ Link	124.77	88.43	75.02	57.87	37.65	63.19	32.56	16.06	GIT2	2022-05-27
GIT: A Generative Image-to-text Transformer for Vision and Language	✓ Link	123.39	88.1	74.81	57.68	37.35	63.12	32.5	15.94	GIT	2022-05-27
Scaling Up Vision-Language Pre-training for Image Captioning		114.25	85.62	71.36	53.62	34.65	61.2	31.27	14.85	Microsoft Cognitive Services team	2021-11-24
N/A		102.39	83.69	67.96	49.38	29.69	58.99	29.68	14.71	VLAF2	N/A
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning		100.12	82.27	66.04	47.48	28.95	58.26	29.47	14.04	Microsoft Cognitive Services team	2020-09-28
N/A		85.34	76.64	56.46	36.37	19.48	52.83	28.15	14.67	Human	N/A
N/A		85.3	78.77	61.54	41.85	23.77	54.59	25.96	11.84	icp2ssi1_coco_si_0.02_5_test	N/A
N/A		85.02	79.17	60.29	39.06	20.81	53.39	26.54	12.74	test_cbs2	N/A
N/A		73.09	76.59	56.74	35.39	18.41	51.82	24.42	11.2	UpDown + ELMo + CBS	N/A
N/A		61.48	73.42	52.12	29.35	12.88	48.74	22.06	9.69	Neural Baby Talk + CBS	N/A
N/A		54.25	74.0	55.11	35.23	19.16	50.92	22.96	10.14	UpDown	N/A
N/A		53.36	72.33	52.42	30.83	14.73	48.87	21.52	9.15	Neural Baby Talk	N/A