Image Captioning on nocaps (in-domain)

Image Captioning

Leaderboard

Paper	Code	CIDEr	BLEU-1	BLEU-2	BLEU-3	BLEU-4	ROUGE-L	METEOR	SPICE	Model Name	Release Date
PaLI: A Jointly-Scaled Multilingual Language-Image Model	✓ Link	149.1								PaLI	2022-09-14
GIT: A Generative Image-to-text Transformer for Vision and Language	✓ Link	124.18	88.86	75.86	59.94	41.1	63.82	33.83	16.36	GIT2, Single Model	2022-05-27
GIT: A Generative Image-to-text Transformer for Vision and Language	✓ Link	122.4	88.55	76.1	60.53	41.65	64.02	33.41	16.18	GIT, Single Model	2022-05-27
PaLI: A Jointly-Scaled Multilingual Language-Image Model	✓ Link	121.09	88.02	75.21	59.38	41.16	64.39	34.22	15.69	PaLI	2022-09-14
		117.9	87.27	74.29	58.01	39.24	63.12	33.01	15.49	CoCa - Google Brain
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning		112.82	86.33	72.83	55.94	37.97	62.48	32.7	15.22	Microsoft Cognitive Services team	2020-09-28
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision	✓ Link	108.98	84.64	70.0	52.96	34.66	61.01	31.97	14.6	Single Model	2021-08-24
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features	✓ Link	105.9							13.6	GRIT (zero-shot, no VL pretraining, no CBS)	2022-07-20
		104.9	84.2	69.57	52.56	34.8	60.52	31.77	15.04	FudanFVL
		104.25	82.91	68.02	50.75	33.59	59.67	31.33	14.85	FudanWYZ
		102.64	84.4	69.8	51.89	32.86	60.07	30.43	14.47	IEDA-LAB
		101.69	83.77	68.7	51.26	32.76	59.75	30.51	14.99	vll@mk514
		100.03	84.03	69.12	51.16	33.15	59.67	30.06	14.08	MD
		99.9	81.86	67.2	50.5	34.11	59.54	31.61	15.17	firethehole
VinVL: Revisiting Visual Representations in Vision-Language Models	✓ Link	97.99	83.24	68.04	49.68	30.62	58.54	29.51	13.63	VinVL (Microsoft Cognitive Services + MSR)	2021-01-02
		96.63	82.9	68.09	49.73	31.24	58.62	29.37	13.61	ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS
		88.08	80.5	64.48	46.46	29.59	56.84	28.7	13.04	camel XE
		87.86	79.58	63.09	43.92	26.07	55.88	27.97	12.6	evertyhing
		87.28	80.68	64.7	45.33	27.09	56.76	27.7	12.79	RCAL
		87.21	80.26	63.94	44.65	27.23	56.4	27.7	12.28	icgp2ssi1_coco_si_0.02_5_test
		85.81	81.64	63.79	43.43	25.15	55.06	27.25	12.35	cxy_nocaps_training
		85.81	81.64	63.79	43.43	25.15	55.06	27.25	12.35	Test file provided by the authors
ClipCap: CLIP Prefix for Image Captioning	✓ Link	84.85							12.14	ClipCap (Transformer)	2021-11-18
		84.83	80.7	63.27	42.86	25.78	55.91	27.23	12.06	Oscar
		84.79	81.61	63.74	43.22	24.82	55.03	27.27	12.3	Xinyi
		80.61	76.89	57.3	37.78	21.49	53.47	28.53	14.99	Human
		80.19	78.73	61.63	42.35	25.94	55.25	27.25	12.38	MQ-UpDown-C
ClipCap: CLIP Prefix for Image Captioning	✓ Link	79.73							12.2	ClipCap (MLP + GPT2 tuning)	2021-11-18
		76.02	77.65	59.58	39.86	22.83	53.98	26.35	11.8	UpDown + ELMo + CBS
		74.27	77.68	60.34	41.5	24.57	54.42	26.04	11.47	UpDown
		74.27	77.68	60.34	41.5	24.57	54.42	26.04	11.46	nocaps_training
		73.73	75.31	56.79	37.85	21.91	52.44	26.02	12.04	7_10-7_40000_predict_test.json
		70.33	74.35	55.97	36.12	20.84	52.26	25.1	11.07	None
		69.59	76.48	58.76	39.28	21.96	53.22	25.08	10.94	YX
		68.98	77.06	59.97	40.54	23.8	53.49	25.06	10.55	B2
		67.91	76.12	57.98	38.44	21.92	52.53	25.07	10.87	area_attention
		64.37	72.76	53.52	34.13	19.45	50.53	23.47	10.11	coco_all_19
		62.96	76.49	56.2	33.73	15.14	50.84	23.68	10.13	Neural Baby Talk + CBS
		60.89	75.91	56.78	35.58	17.39	51.42	23.8	9.81	Neural Baby Talk
		58.93	72.24	51.88	29.57	14.54	49.05	22.04	8.91	CS395T
		53.34	72.05	52.89	31.92	16.71	49.64	22.04	9.16	Yu-Wu