OpenCodePapers

Video Captioning on ActivityNet Captions

Video Captioning
Leaderboard
| Paper | Code | BLEU-4 | BLEU-3 | CIDEr | ROUGE-L | METEOR | Model | Release date |
|---|---|---|---|---|---|---|---|---|
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - | 14.7 | - | 39.3 | 35.0 | - | VideoCoCa | 2022-12-09 |
| VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning | ✓ | 14.5 | - | 31.13 | 36.56 | 17.97 | VLTinT (ae-test split) C3D/Ling | 2022-11-28 |
| VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | ✓ | 13.38 | - | 31.29 | 35.99 | 17.48 | VLCap (ae-test split) - Appearance + Language | 2022-06-26 |
| COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | ✓ | 10.85 | 17.43 | 28.19 | 31.45 | 15.99 | COOT (ae-test split) - Only Appearance features | 2020-11-01 |
| MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning | ✓ | 10.33 | - | 23.42 | - | 15.68 | MART (ae-test split) - Appearance + Flow | 2020-05-11 |