OpenCodePapers

Natural Language Moment Retrieval on MAD

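Leaderboard

R@k,IoU=θ is the percentage of queries for which at least one of the top-k retrieved moments has a temporal IoU of at least θ with the ground-truth moment. Rows are sorted by R@1,IoU=0.1; empty cells are metrics not listed for that entry.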

Paper	Code	R@1,IoU=0.1	R@1,IoU=0.3	R@1,IoU=0.5	R@5,IoU=0.1	R@5,IoU=0.3	R@5,IoU=0.5	R@10,IoU=0.1	R@10,IoU=0.3	R@10,IoU=0.5	R@50,IoU=0.1	R@50,IoU=0.3	R@50,IoU=0.5	R@100,IoU=0.1	R@100,IoU=0.3	R@100,IoU=0.5	Model	Release date
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos	✓	17.3	12.7	6.7													ReVisionLLM	2024-11-22
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos	✓	13.25	10.96	7.06	27.73	23.68	16.13										DeCafNet	2025-05-22
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos	✓	12.43	9.48	5.61	25.12	18.72	10.86										RGNet	2023-12-11
Localizing Moments in Long Video Via Multimodal Guidance	✓	9.3	4.65	2.16	18.96	13.06	7.4	24.30	17.73	11.09	39.79	32.23	23.21	47.35	39.58	29.68	Zero-Shot CLIP + Guidance Model	2023-02-26
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions	✓	6.57	3.13	1.39	15.05	9.85	5.44	20.26	14.13	8.38	37.92	28.71	18.80	47.73	36.98	24.99	CLIP	2021-12-01
Localizing Moments in Long Video Via Multimodal Guidance	✓	5.60	4.28	2.48	16.07		8.78	23.64	19.86	13.72	45.35	39.77	30.22	55.59	49.38	39.12	VLG-Net + Guidance Model	2023-02-26
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions	✓	3.50	2.63	1.61	11.74	9.49	6.23	18.32	15.2	10.18	38.41	33.68	25.33	49.65	43.95	34.18	VLG-Net	2021-12-01
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions	✓	0.09	0.04	0.01	0.44	0.19	0.07	0.88	0.39	0.14	4.33	1.92	0.71	8.47	3.80	1.40	Random Chance	2021-12-01
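The R@k,IoU=θ columns can be illustrated with a short Python sketch. It assumes each query has exactly one ground-truth moment and a list of predicted (start, end) moments already sorted by descending confidence. The function names `temporal_iou` and `recall_at_k` are hypothetical; this is not the official MAD evaluation script, which may differ in details such as how proposals are deduplicated before ranking.

```python
from typing import Sequence, Tuple

# Hypothetical representation: a moment is a (start, end) pair in seconds.
Moment = Tuple[float, float]


def temporal_iou(a: Moment, b: Moment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(
    ranked_preds: Sequence[Sequence[Moment]],
    gts: Sequence[Moment],
    k: int,
    iou_thr: float,
) -> float:
    """Percentage of queries whose top-k predictions contain at least one
    moment with IoU >= iou_thr against that query's single ground truth.

    Assumption: ranked_preds[i] is already sorted by descending score and
    gts[i] is the one ground-truth moment for query i.
    """
    if not gts:
        return 0.0
    hits = 0
    for preds, gt in zip(ranked_preds, gts):
        # Only the k highest-ranked predictions count toward R@k.
        if any(temporal_iou(p, gt) >= iou_thr for p in preds[:k]):
            hits += 1
    return 100.0 * hits / len(gts)


if __name__ == "__main__":
    # Two toy queries: the first is hit at rank 1, the second only at rank 2.
    preds = [[(10, 20), (0, 5)], [(30, 40), (50, 60)]]
    gts = [(12, 22), (52, 58)]
    print(recall_at_k(preds, gts, k=1, iou_thr=0.5))  # 50.0
    print(recall_at_k(preds, gts, k=5, iou_thr=0.5))  # 100.0
```

Because a looser threshold or a larger k can only add hits, each entry's scores rise from IoU=0.5 to IoU=0.1 and from R@1 to R@100, which is the pattern visible in every complete row of the table.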