Paper | Code | BLEU-4 | BLEU-3 | CIDEr | ROUGE-L | METEOR | Model Name | Release Date |
---|---|---|---|---|---|---|---|---|
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 14.7 | | 39.3 | 35.0 | | VideoCoCa | 2022-12-09 |
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning | ✓ Link | 14.5 | | 31.13 | 36.56 | 17.97 | VLTinT (ae-test split) C3D/Ling | 2022-11-28 |
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | ✓ Link | 13.38 | | 31.29 | 35.99 | 17.48 | VLCap (ae-test split) - Appearance + Language | 2022-06-26 |
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | ✓ Link | 10.85 | 17.43 | 28.19 | 31.45 | 15.99 | COOT (ae-test split) - Only Appearance features | 2020-11-01 |
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning | ✓ Link | 10.33 | | 23.42 | | 15.68 | MART (ae-test split) - Appearance + Flow | 2020-05-11 |