OpenCodePapers

Video Captioning on VATEX

Video Captioning
Results over time (interactive chart omitted; all scores appear in the leaderboard table below)
Leaderboard
| Paper | Code | BLEU-4 | CIDEr | METEOR | ROUGE-L | Model Name | Release Date |
|---|---|---|---|---|---|---|---|
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ Link | 45.6 | 95.8 | 29.4 | 57.4 | VALOR | 2023-04-17 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ Link | 45.0 | 99.5 | – | – | VAST | 2023-05-29 |
| COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | ✓ Link | 43.7 | 96.5 | – | – | COSA | 2023-06-15 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | – | 39.7 | 77.8 | – | 54.5 | VideoCoCa | 2022-12-09 |
| IcoCap: Improving Video Captioning by Compounding Images | – | 37.4 | 67.8 | 25.7 | 53.1 | IcoCap (ViT-B/16) | 2023-10-05 |
| IcoCap: Improving Video Captioning by Compounding Images | – | 36.9 | 63.4 | 24.6 | 52.5 | IcoCap (ViT-B/32) | 2023-10-05 |
| Diverse Video Captioning by Adaptive Spatio-temporal Attention | ✓ Link | 36.25 | 65.07 | 25.32 | 51.88 | VASTA (Kinetics-backbone) | 2022-08-19 |
| Accurate and Fast Compressed Video Captioning | ✓ Link | 35.8 | 64.8 | 25.3 | 52.0 | CoCap (ViT/L14) | 2023-09-22 |
| Object Relational Graph with Teacher-Recommended Learning for Video Captioning | – | 32.1 | 49.7 | 22.2 | 48.9 | ORG-TRL | 2020-02-26 |
| NITS-VC System for VATEX Video Captioning Challenge 2020 | – | 20.0 | 24.0 | 18.0 | 42.0 | NITS-VC | 2020-06-07 |
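For readers unfamiliar with the BLEU-4 column, the sketch below shows the idea behind the metric: modified n-gram precision for n = 1..4 combined geometrically, multiplied by a brevity penalty. This is a simplified single-reference, sentence-level version with add-one smoothing for illustration only; the leaderboard numbers come from the standard multi-reference evaluation tooling, not from this function.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate: str, reference: str) -> float:
    """Illustrative sentence-level BLEU-4 against a single reference.

    NOTE: simplified sketch, not the official scorer. Real evaluation uses
    multiple references per video and corpus-level statistics.
    """
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped (modified) precision: each candidate n-gram counts at most
        # as often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing so one empty n-gram order does not zero the score
        log_prec_sum += math.log((overlap + 1) / (total + 1))
    # brevity penalty discourages overly short captions
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / 4)
```

A caption closer to the reference scores higher, e.g. `bleu4("a cat sits on a mat", "a cat sits on the mat")` exceeds `bleu4("a dog runs", "a cat sits on the mat")`.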