Paper | Code | CIDEr | METEOR | SODA | BLEU4 | ROUGE-L | F1 | Precision | Recall | ModelName | ReleaseDate |
---|---|---|---|---|---|---|---|---|---|---|---|
HiCM$^2$: Hierarchical Compact Memory Modeling for Dense Video Captioning | ✓ Link | 71.84 | 12.80 | 10.73 | 6.11 | | 32.51 | 32.51 | 32.51 | HiCM² | 2024-12-19 |
(paper not listed) | | 67.2 | 12.3 | 10.3 | | | | | | Vid2Seq (HowTo100M+VidChapters-7M PT) | |
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | ✓ Link | 47.1 | 9.3 | 7.9 | | | | | | Vid2Seq | 2023-02-27 |
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | ✓ Link | 31.66 | 6.08 | 5.34 | 1.63 | | 28.43 | 33.38 | 24.76 | CM² | 2024-04-11 |
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | ✓ Link | 26.52 | 5.01 | 4.91 | | | | | | GVL | 2023-03-11 |
End-to-End Dense Video Captioning with Parallel Decoding | ✓ Link | 22.71 | 4.74 | 4.42 | 0.8 | | | | | PDVC (TSN features, no SCST) | 2021-08-17 |
Multimodal Pretraining for Dense Video Captioning | ✓ Link | | | | | 39.03 | | | | E2vidD6-MASSalign-BiD | 2020-11-10 |