OpenCodePapers

Video Captioning on VATEX

Video Captioning
Results over time (interactive chart omitted; all scores appear in the leaderboard table below)
Leaderboard
| Paper | Code | BLEU-4 | CIDEr | METEOR | ROUGE-L | Model Name | Release Date |
|---|---|---|---|---|---|---|---|
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ Link | 45.6 | 95.8 | 29.4 | 57.4 | VALOR | 2023-04-17 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ Link | 45.0 | 99.5 | – | – | VAST | 2023-05-29 |
| COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | ✓ Link | 43.7 | 96.5 | – | – | COSA | 2023-06-15 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | – | 39.7 | 77.8 | – | 54.5 | VideoCoCa | 2022-12-09 |
| IcoCap: Improving Video Captioning by Compounding Images | – | 37.4 | 67.8 | 25.7 | 53.1 | IcoCap (ViT-B/16) | 2023-10-05 |
| IcoCap: Improving Video Captioning by Compounding Images | – | 36.9 | 63.4 | 24.6 | 52.5 | IcoCap (ViT-B/32) | 2023-10-05 |
| Diverse Video Captioning by Adaptive Spatio-temporal Attention | ✓ Link | 36.25 | 65.07 | 25.32 | 51.88 | VASTA (Kinetics-backbone) | 2022-08-19 |
| Accurate and Fast Compressed Video Captioning | ✓ Link | 35.8 | 64.8 | 25.3 | 52.0 | CoCap (ViT/L14) | 2023-09-22 |
| Object Relational Graph with Teacher-Recommended Learning for Video Captioning | – | 32.1 | 49.7 | 22.2 | 48.9 | ORG-TRL | 2020-02-26 |
| NITS-VC System for VATEX Video Captioning Challenge 2020 | – | 20.0 | 24.0 | 18.0 | 42.0 | NITS-VC | 2020-06-07 |
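For readers unfamiliar with the BLEU-4 column, the sketch below shows the idea behind the metric: modified n-gram precision for n = 1..4 combined geometrically, multiplied by a brevity penalty. This is a simplified single-reference, sentence-level version with add-one smoothing for illustration only; the leaderboard numbers come from the standard multi-reference evaluation tooling, not from this function.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate: str, reference: str) -> float:
    """Illustrative sentence-level BLEU-4 against a single reference.

    NOTE: simplified sketch, not the official scorer. Real evaluation uses
    multiple references per video and corpus-level statistics.
    """
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, 5):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped (modified) precision: each candidate n-gram counts at most
        # as often as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # add-one smoothing so one empty n-gram order does not zero the score
        log_prec_sum += math.log((overlap + 1) / (total + 1))
    # brevity penalty discourages overly short captions
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum / 4)
```

A caption closer to the reference scores higher, e.g. `bleu4("a cat sits on a mat", "a cat sits on the mat")` exceeds `bleu4("a dog runs", "a cat sits on the mat")`.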