OpenCodePapers

Moment Retrieval on Charades-STA
Leaderboard
| Paper | Code | R@1 IoU=0.5 | R@1 IoU=0.7 | R@5 IoU=0.5 | R@5 IoU=0.7 | R@1 IoU=0.3 | mIoU | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|
| Saliency-Guided DETR for Moment Retrieval and Highlight Detection | ✓ | 71.10 | 52.80 | | | | | SG-DETR (w/ PT) | 2024-10-02 |
| LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval | ✓ | 70.65 | 49.58 | | | | | LLaVA-MR | 2024-11-21 |
| FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | ✓ | 70.32 | 49.87 | | | | | FlashVTG | 2024-12-18 |
| Saliency-Guided DETR for Moment Retrieval and Highlight Detection | ✓ | 70.20 | 49.50 | | | | | SG-DETR | 2024-10-02 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 70.03 | 48.95 | | | | | InternVideo2-6B | 2024-03-22 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 68.36 | 45.03 | | | | | InternVideo2-1B | 2024-03-22 |
| TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | ✓ | 67.1 | 43.0 | | | | | VideoChat-T (FT) | 2024-10-25 |
| UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | ✓ | 63.98 | 44.46 | 91.94 | 67.72 | | | UniMD+Sync. | 2024-04-07 |
| LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | ✓ | 62.58 | 41.56 | 73.92 | 53.44 | | | LD-DETR | 2025-01-18 |
| VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | ✓ | 61.96 | 41.05 | 73.33 | 52.94 | | | VideoLights-B-pt | 2024-12-02 |
| UnLoc: A Unified Framework for Video Localization Tasks | ✓ | 60.8 | 38.4 | 88.2 | 61.1 | | | UnLoc-L | 2023-08-21 |
| BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | ✓ | 59.95 | 39.38 | | | | | BAM-DETR | 2023-11-30 |
| Background-aware Moment Detection for Video Moment Retrieval | ✓ | 59.48 | 38.33 | | | | | BM-DETR | 2023-06-05 |
| Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | ✓ | 59.25 | 36.64 | | | | | UVCOM | 2023-11-28 |
| Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | ✓ | 58.44 | 36.34 | | | | | CG-DETR | 2023-11-15 |
| Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | ✓ | 58.31 | 36.49 | | | | | LLMEPET | 2024-07-21 |
| UnLoc: A Unified Framework for Video Localization Tasks | ✓ | 58.1 | 35.4 | 87.4 | 59.1 | | | UnLoc-B | 2023-08-21 |
| Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | ✓ | 57.31 | 32.55 | | | | | QD-DETR (Only Video) | 2023-03-24 |
| Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | ✓ | 57.18 | 36.05 | | | | | video-mamba-suite | 2024-03-14 |
| QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | ✓ | 55.65 | 34.17 | | | | | Moment-DETR w/ PT (on 10K HowTo100M videos) | 2021-07-20 |
| QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | ✓ | 53.63 | 31.37 | | | | | Moment-DETR | 2021-07-20 |
| UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | ✓ | 49.35 | 26.16 | 89.41 | 54.95 | | | UMT (VO) | 2022-03-23 |
| TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | ✓ | 48.7 | 24.0 | | | | 45.43 | VideoChat-T (ZS) | 2024-10-25 |
| UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | ✓ | 48.31 | 29.25 | 88.79 | 56.08 | | | UMT (VA) | 2022-03-23 |
| SimVTP: Simple Video Text Pre-training with Masked Autoencoders | | 44.7 | 26.3 | 83.7 | 55.1 | | | SimVTP | 2022-12-07 |
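
The leaderboard columns (R@n at an IoU threshold, and mIoU) can be illustrated with a minimal Python sketch. It assumes each query has a single ground-truth moment and a ranked list of predicted (start, end) spans in seconds, that a prediction counts as a hit when its temporal IoU is at or above the threshold, and that scores are reported as percentages. The function names and the toy data are hypothetical, and individual papers may differ in details such as the comparison operator.

```python
from typing import Sequence, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec); hypothetical representation


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_n(ranked_preds: Sequence[Sequence[Span]],
                gts: Sequence[Span], n: int, iou_thr: float) -> float:
    """Percentage of queries where any top-n prediction reaches iou_thr."""
    hits = sum(
        any(temporal_iou(p, gt) >= iou_thr for p in preds[:n])
        for preds, gt in zip(ranked_preds, gts)
    )
    return 100.0 * hits / len(gts)


def mean_iou(ranked_preds: Sequence[Sequence[Span]],
             gts: Sequence[Span]) -> float:
    """Mean IoU of the top-1 prediction per query, as a percentage."""
    total = sum(temporal_iou(preds[0], gt)
                for preds, gt in zip(ranked_preds, gts))
    return 100.0 * total / len(gts)


# Toy data: two queries, two ranked predictions each (illustrative only).
toy_preds = [[(2.0, 8.0), (0.0, 10.0)], [(5.0, 6.0), (0.0, 4.0)]]
toy_gts = [(0.0, 10.0), (0.0, 4.0)]

r1_05 = recall_at_n(toy_preds, toy_gts, n=1, iou_thr=0.5)
r1_07 = recall_at_n(toy_preds, toy_gts, n=1, iou_thr=0.7)
r5_07 = recall_at_n(toy_preds, toy_gts, n=5, iou_thr=0.7)
miou = mean_iou(toy_preds, toy_gts)
```

On the toy data, only the first query's top-1 span overlaps its ground truth enough to pass IoU=0.5, while both queries have a perfect match within their top five, which is why R@5 values in the table sit well above the R@1 values for the same model.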