OpenCodePapers

Natural Language Moment Retrieval on MAD

Area: Video. Task: Natural Language Moment Retrieval.
Leaderboard
"-" marks a metric not reported for that entry. Rows are ordered by R@1, IoU=0.1.

| Model | Paper | Code | Release date | R@1 IoU=0.1 | R@1 IoU=0.3 | R@1 IoU=0.5 | R@5 IoU=0.1 | R@5 IoU=0.3 | R@5 IoU=0.5 | R@10 IoU=0.1 | R@10 IoU=0.3 | R@10 IoU=0.5 | R@50 IoU=0.1 | R@50 IoU=0.3 | R@50 IoU=0.5 | R@100 IoU=0.1 | R@100 IoU=0.3 | R@100 IoU=0.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ReVisionLLM | ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos | ✓ | 2024-11-22 | 17.3 | 12.7 | 6.7 | - | - | - | - | - | - | - | - | - | - | - | - |
| DeCafNet | DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos | ✓ | 2025-05-22 | 13.25 | 10.96 | 7.06 | 27.73 | 23.68 | 16.13 | - | - | - | - | - | - | - | - | - |
| RGNet | RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos | ✓ | 2023-12-11 | 12.43 | 9.48 | 5.61 | 25.12 | 18.72 | 10.86 | - | - | - | - | - | - | - | - | - |
| Zero-Shot CLIP + Guidance Model | Localizing Moments in Long Video Via Multimodal Guidance | ✓ | 2023-02-26 | 9.3 | 4.65 | 2.16 | 18.96 | 13.06 | 7.4 | 24.30 | 17.73 | 11.09 | 39.79 | 32.23 | 23.21 | 47.35 | 39.58 | 29.68 |
| CLIP | MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | ✓ | 2021-12-01 | 6.57 | 3.13 | 1.39 | 15.05 | 9.85 | 5.44 | 20.26 | 14.13 | 8.38 | 37.92 | 28.71 | 18.80 | 47.73 | 36.98 | 24.99 |
| VLG-Net + Guidance Model | Localizing Moments in Long Video Via Multimodal Guidance | ✓ | 2023-02-26 | 5.60 | 4.28 | 2.48 | 16.07 | - | 8.78 | 23.64 | 19.86 | 13.72 | 45.35 | 39.77 | 30.22 | 55.59 | 49.38 | 39.12 |
| VLG-Net | MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | ✓ | 2021-12-01 | 3.50 | 2.63 | 1.61 | 11.74 | 9.49 | 6.23 | 18.32 | 15.2 | 10.18 | 38.41 | 33.68 | 25.33 | 49.65 | 43.95 | 34.18 |
| Random Chance | MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | ✓ | 2021-12-01 | 0.09 | 0.04 | 0.01 | 0.44 | 0.19 | 0.07 | 0.88 | 0.39 | 0.14 | 4.33 | 1.92 | 0.71 | 8.47 | 3.80 | 1.40 |
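The leaderboard columns R@k, IoU=θ follow the usual moment-retrieval convention: the percentage of queries for which at least one of the model's top-k ranked predicted moments overlaps the ground-truth moment with temporal IoU of at least θ. The Python sketch below implements that standard definition under stated assumptions. It assumes exactly one ground-truth (start, end) interval per query and an already-ranked list of predicted intervals per query. It does not reproduce the official MAD evaluation script, and it makes no claim about details such as proposal generation or non-maximum suppression. The names `temporal_iou` and `recall_at_k`, along with the toy segments, are hypothetical and exist only for illustration.

```python
from typing import List, Sequence, Tuple

# A moment is a (start, end) time interval in seconds.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_k(
    predictions: Sequence[List[Segment]],
    ground_truth: Sequence[Segment],
    k: int,
    iou_threshold: float,
) -> float:
    """R@k,IoU=theta as a percentage: the share of queries whose top-k ranked
    predictions contain at least one segment with IoU >= iou_threshold."""
    hits = 0
    for ranked, gt in zip(predictions, ground_truth):
        # Only the k highest-ranked predictions can produce a hit.
        if any(temporal_iou(p, gt) >= iou_threshold for p in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(ground_truth)


# Hypothetical toy data: two queries, each with predictions sorted by score.
preds = [
    [(10.0, 14.0), (50.0, 55.0)],
    [(100.0, 103.0), (200.0, 204.0)],
]
gt = [(12.0, 16.0), (201.0, 205.0)]

for k in (1, 5):
    for theta in (0.1, 0.3, 0.5):
        print(f"R@{k},IoU={theta}: {recall_at_k(preds, gt, k, theta):.1f}")
```

With these toy inputs, R@1, IoU=0.3 comes out to 50.0, because only the first query's top prediction reaches IoU 1/3. R@5, IoU=0.3 comes out to 100.0, because the second query's correct segment is ranked second. This reflects the pattern in the table: recall rises with k and falls as the IoU threshold θ tightens.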