OpenCodePapers

Dense Video Captioning on ActivityNet
Results over time
Leaderboard
| Paper | Code | METEOR | BLEU-3 | BLEU-4 | CIDEr | SODA | DIV-1 | DIV-2 | RE-4 | F1 | Precision | Recall | Model | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | ✓ | 17 | | | 28 | | | | | | | | Vid2Seq | 2023-02-27 |
| Global Object Proposals for Improving Multi-Sentence Video Descriptions | ✓ | 16.36 | | 9.45 | 19.40 | | 0.60 | 0.78 | 0.05 | | | | ADV-INF + Global | 2021-07-18 |
| Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning | | 11.28 | | | | | | | | | | | Bi-directional+intra captioning | 2020-06-14 |
| Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | ✓ | 10.03 | | | 33.33 | 7.11 | | | | | | | GVL | 2023-03-11 |
| Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 | ✓ | 9.71 | | | | | | | | | | | TSRM-CMG-HRNN+SCST | 2020-06-21 |
| End-to-End Dense Video Captioning with Parallel Decoding | ✓ | 9.03 | | 2.17 | 31.14 | 6.05 | | | | | | | PDVC (TSP features, no SCST) | 2021-08-17 |
| TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | ✓ | 8.75 | 4.16 | 2.02 | | | | | | | | | TSP | 2020-11-23 |
| Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | ✓ | 8.55 | | 2.38 | 33.01 | 6.18 | | | | 55.21 | 56.81 | 53.71 | CM² | 2024-04-11 |
| A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer | ✓ | 8.44 | 3.84 | 1.88 | | | | | | | | | BMT | 2020-05-17 |
| iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering | | 7.87 | 2.93 | 1.29 | | | | | | | | | iPerceive (Chadha et al., 2020) | 2020-11-16 |
| Multi-modal Dense Video Captioning | ✓ | 7.31 | 2.6 | 1.07 | | | | | | | | | MDVC | 2020-03-17 |
| VTimeLLM: Empower LLM to Grasp Video Moments | ✓ | | | | 27.6 | 5.8 | | | | | | | VTimeLLM | 2023-11-30 |
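Several leaderboard columns (BLEU-3, BLEU-4) are n-gram precision metrics. As a rough illustration of what they measure, here is a minimal pure-Python sketch of sentence-level BLEU (clipped n-gram precision plus a brevity penalty). This is not the official dense-captioning evaluation toolkit, which additionally matches predicted event segments against ground-truth proposals across IoU thresholds before averaging caption scores.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        if not cand:  # candidate shorter than n tokens
            return 0.0
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, cnt in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand.items())
        precisions.append(clipped / sum(cand.values()))
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty against the closest-length reference.
    closest = min(references, key=lambda r: abs(len(r) - len(candidate)))
    bp = min(1.0, math.exp(1 - len(closest) / len(candidate)))
    return bp * math.exp(log_mean)

ref = "a man is playing a guitar on stage".split()
hyp = "a man is playing a guitar".split()
print(round(bleu(hyp, [ref]), 3))  # → 0.717 (all n-grams match; penalized for brevity)
```

Leaderboard BLEU scores are conventionally reported scaled by 100, so a value like 2.17 corresponds to 0.0217 on the scale returned here.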