OpenCodePapers

Dense Video Captioning on ActivityNet
Results over time
Leaderboard
| Paper | Code | METEOR | BLEU-3 | BLEU-4 | CIDEr | SODA | DIV-1 | DIV-2 | RE-4 | F1 | Precision | Recall | Model | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | ✓ | 17 | | | 28 | | | | | | | | Vid2Seq | 2023-02-27 |
| Global Object Proposals for Improving Multi-Sentence Video Descriptions | ✓ | 16.36 | | 9.45 | 19.40 | | 0.60 | 0.78 | 0.05 | | | | ADV-INF + Global | 2021-07-18 |
| Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning | | 11.28 | | | | | | | | | | | Bi-directional+intra captioning | 2020-06-14 |
| Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | ✓ | 10.03 | | | 33.33 | 7.11 | | | | | | | GVL | 2023-03-11 |
| Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 | ✓ | 9.71 | | | | | | | | | | | TSRM-CMG-HRNN+SCST | 2020-06-21 |
| End-to-End Dense Video Captioning with Parallel Decoding | ✓ | 9.03 | | 2.17 | 31.14 | 6.05 | | | | | | | PDVC (TSP features, no SCST) | 2021-08-17 |
| TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | ✓ | 8.75 | 4.16 | 2.02 | | | | | | | | | TSP | 2020-11-23 |
| Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | ✓ | 8.55 | | 2.38 | 33.01 | 6.18 | | | | 55.21 | 56.81 | 53.71 | CM² | 2024-04-11 |
| A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer | ✓ | 8.44 | 3.84 | 1.88 | | | | | | | | | BMT | 2020-05-17 |
| iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | | 7.87 | 2.93 | 1.29 | | | | | | | | | iPerceive (Chadha et al., 2020) | 2020-11-16 |
| Multi-modal Dense Video Captioning | ✓ | 7.31 | 2.6 | 1.07 | | | | | | | | | MDVC | 2020-03-17 |
| VTimeLLM: Empower LLM to Grasp Video Moments | ✓ | | | | 27.6 | 5.8 | | | | | | | VTimeLLM | 2023-11-30 |
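Several leaderboard columns (BLEU-3, BLEU-4) are n-gram precision metrics. As a rough illustration of what they measure, here is a minimal pure-Python sketch of sentence-level BLEU (clipped n-gram precision plus a brevity penalty). This is not the official dense-captioning evaluation toolkit, which additionally matches predicted event segments against ground-truth proposals across IoU thresholds before averaging caption scores.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        if not cand:  # candidate shorter than n tokens
            return 0.0
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, cnt in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand.items())
        precisions.append(clipped / sum(cand.values()))
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty against the closest-length reference.
    closest = min(references, key=lambda r: abs(len(r) - len(candidate)))
    bp = min(1.0, math.exp(1 - len(closest) / len(candidate)))
    return bp * math.exp(log_mean)

ref = "a man is playing a guitar on stage".split()
hyp = "a man is playing a guitar".split()
print(round(bleu(hyp, [ref]), 3))  # → 0.717 (all n-grams match; penalized for brevity)
```

Leaderboard BLEU scores are conventionally reported scaled by 100, so a value like 2.17 corresponds to 0.0217 on the scale returned here.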