OpenCodePapers

Moment Retrieval on QVHighlights

Moment Retrieval
Leaderboard
Paper | Code | mAP (avg) | R@1 IoU=0.5 | R@1 IoU=0.7 | mAP@0.5 | mAP@0.75 | Model Name | Release Date
Saliency-Guided DETR for Moment Retrieval and Highlight Detection | ✓ | 58.80 | 74.20 | 60.40 | 76.20 | 60.80 | SG-DETR (w/ PT) | 2024-10-02
Saliency-Guided DETR for Moment Retrieval and Highlight Detection | ✓ | 54.10 | 72.20 | 56.60 | 73.20 | 55.80 | SG-DETR | 2024-10-02
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval | ✓ | 52.73 | 76.59 | 61.48 | 69.41 | 54.40 | LLaVA-MR | 2024-11-21
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | ✓ | 52.00 | 70.69 | 53.96 | 72.33 | 53.85 | FlashVTG | 2024-12-18
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 49.24 | 71.42 | 56.45 |  |  | InternVideo2-6B | 2024-03-22
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | ✓ | 47.97 | 68.48 | 53.11 | 69.40 | 49.12 | CG-DETR (w/ PT) | 2023-11-15
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | ✓ | 47.94 | 70.36 | 55.25 | 69.53 | 49.17 | VideoLights-B-pt | 2024-12-02
Length-Aware DETR for Robust Moment Retrieval | ✓ | 47.93 | 63.94 | 51.10 | 65.65 | 49.44 | LA-DETR | 2024-12-30
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | ✓ | 46.91 | 64.07 | 48.12 | 65.61 | 47.51 | BAM-DETR (w/ audio) | 2023-11-30
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | ✓ | 46.67 | 63.88 | 47.92 | 66.33 | 48.22 | BAM-DETR (w/ PT ASR Captions) | 2023-11-30
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | ✓ | 46.41 | 66.80 | 51.04 | 67.61 | 46.99 | LD-DETR | 2025-01-18
R²-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding | ✓ | 46.17 | 68.03 | 49.35 | 69.04 | 47.56 | R²-Tuning | 2024-03-31
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | ✓ | 45.36 | 62.71 | 48.64 | 64.57 | 46.33 | BAM-DETR | 2023-11-30
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | ✓ | 45.18 | 66.65 | 52.19 | 64.37 | 46.68 | video-mamba-suite | 2024-03-14
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | ✓ | 44.05 | 66.73 | 49.94 | 65.76 | 43.91 | LLMEPET | 2024-07-21
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | ✓ | 43.8 | 64.53 | 48.31 | 64.78 | 43.65 | UVCOM (w/ PT ASR Captions) | 2023-11-28
UniVTG: Towards Unified Video-Language Temporal Grounding | ✓ | 43.63 | 65.43 | 50.06 | 64.06 | 45.02 | UniVTG (w/ PT) | 2023-07-31
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | ✓ | 43.18 | 63.55 | 47.47 | 63.37 | 42.67 | UVCOM | 2023-11-28
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | ✓ | 42.86 | 65.43 | 48.38 | 64.51 | 42.77 | CG-DETR | 2023-11-15
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | ✓ | 40.62 | 64.1 | 46.1 | 64.3 | 40.5 | QD-DETR (w/ PT) | 2023-03-24
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | ✓ | 40.19 | 63.06 | 45.10 | 63.04 | 40.10 | QD-DETR (w/ audio) | 2023-03-24
Background-aware Moment Detection for Video Moment Retrieval | ✓ | 40.08 | 60.12 | 43.05 | 63.08 | 40.18 | BM-DETR | 2023-06-05
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | ✓ | 40.0 | 63.2 | 45.2 | 63.4 | 40.4 | QD-DETR (only Video w/ PT ASR Captions) | 2023-03-24
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | ✓ | 39.86 | 62.40 | 44.98 | 62.52 | 39.88 | QD-DETR (only Video) | 2023-03-24
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | ✓ | 38.08 |  |  |  |  | UMT (w/ audio + PT ASR Captions) | 2022-03-23
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | ✓ | 36.14 | 59.78 | 40.33 | 60.51 | 35.36 | Moment-DETR (w/ PT ASR Captions) | 2021-07-20
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | ✓ | 36.12 |  |  |  |  | UMT | 2022-03-23
UniVTG: Towards Unified Video-Language Temporal Grounding | ✓ | 35.47 | 58.86 | 40.86 | 57.60 | 35.59 | UniVTG | 2023-07-31
 |  | 32.3 | 54.5 | 36.5 |  |  | SeViLA-Localizer | 
UnLoc: A Unified Framework for Video Localization Tasks | ✓ |  | 66.1 | 46.7 |  |  | UnLoc-L | 2023-08-21
UnLoc: A Unified Framework for Video Localization Tasks | ✓ |  | 64.5 | 48.8 |  |  | UnLoc-B | 2023-08-21
Boundary-Denoising for Video Activity Localization | ✓ |  | 59.27 | 45.07 |  |  | DenoiseLoc | 2023-04-06
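The R@1 columns in the leaderboard score only the top-ranked predicted moment for each query: a query counts as a hit when that moment's temporal IoU with the ground truth reaches the stated threshold (0.5 or 0.7). The following is a minimal Python sketch of that metric, not the official evaluation script. It assumes moments are (start, end) pairs in seconds. It also assumes that a query with several ground-truth windows is scored against the best-matching window; this page does not state that rule. The function names `temporal_iou` and `recall_at_1` are hypothetical. The mAP columns, which rank all predictions per query, are not reproduced here.

```python
from typing import Dict, List, Tuple

# A moment is a (start_sec, end_sec) interval.
Window = Tuple[float, float]


def temporal_iou(a: Window, b: Window) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(
    top1_preds: Dict[str, Window],
    gt_windows: Dict[str, List[Window]],
    iou_threshold: float,
) -> float:
    """Percentage of queries whose top-ranked moment reaches iou_threshold.

    Assumption (hypothetical rule): if a query has several ground-truth
    windows, the prediction is compared with the best-matching one.
    """
    if not top1_preds:
        return 0.0
    hits = 0
    for qid, pred in top1_preds.items():
        best_iou = max(temporal_iou(pred, gt) for gt in gt_windows[qid])
        if best_iou >= iou_threshold:
            hits += 1
    return 100.0 * hits / len(top1_preds)


# Toy example with made-up moments (not benchmark data).
preds = {"q1": (10.0, 20.0), "q2": (0.0, 4.0)}
gts = {"q1": [(12.0, 22.0)], "q2": [(30.0, 40.0), (2.0, 6.0)]}

print(recall_at_1(preds, gts, 0.5))  # q1 IoU = 8/12, q2 best IoU = 2/6 -> 50.0
print(recall_at_1(preds, gts, 0.7))  # neither query reaches 0.7 -> 0.0
```

Raising the threshold from 0.5 to 0.7 can only keep or lower the hit count. This is why every row that reports both R@1 columns shows a lower value at IoU=0.7.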