Moment Retrieval on Charades-STA

Moment Retrieval

Leaderboard

Paper	Code	R@1 IoU=0.5	R@1 IoU=0.7	R@5 IoU=0.5	R@5 IoU=0.7	R@1 IoU=0.3	mIoU	Model Name	Release Date
Saliency-Guided DETR for Moment Retrieval and Highlight Detection	✓ Link	71.10	52.80					SG-DETR (w/ PT)	2024-10-02
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval	✓ Link	70.65	49.58					LLaVA-MR	2024-11-21
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding	✓ Link	70.32	49.87					FlashVTG	2024-12-18
Saliency-Guided DETR for Moment Retrieval and Highlight Detection	✓ Link	70.20	49.50					SG-DETR	2024-10-02
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	70.03	48.95					InternVideo2-6B	2024-03-22
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	68.36	45.03					InternVideo2-1B	2024-03-22
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	✓ Link	67.1	43.0					VideoChat-T (FT)	2024-10-25
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection	✓ Link	63.98	44.46	91.94	67.72			UniMD+Sync.	2024-04-07
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection	✓ Link	62.58	41.56			73.92	53.44	LD-DETR	2025-01-18
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval	✓ Link	61.96	41.05			73.33	52.94	VideoLights-B-pt	2024-12-02
UnLoc: A Unified Framework for Video Localization Tasks	✓ Link	60.8	38.4	88.2	61.1			UnLoc-L	2023-08-21
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos	✓ Link	59.95	39.38					BAM-DETR	2023-11-30
Background-aware Moment Detection for Video Moment Retrieval	✓ Link	59.48	38.33					BM-DETR	2023-06-05
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection	✓ Link	59.25	36.64					UVCOM	2023-11-28
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding	✓ Link	58.44	36.34					CG-DETR	2023-11-15
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval	✓ Link	58.31	36.49					LLMEPET	2024-07-21
UnLoc: A Unified Framework for Video Localization Tasks	✓ Link	58.1	35.4	87.4	59.1			UnLoc-B	2023-08-21
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection	✓ Link	57.31	32.55					QD-DETR (Only Video)	2023-03-24
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding	✓ Link	57.18	36.05					video-mamba-suite	2024-03-14
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries	✓ Link	55.65	34.17					Moment-DETR w/ PT (on 10K HowTo100M videos)	2021-07-20
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries	✓ Link	53.63	31.37					Moment-DETR	2021-07-20
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection	✓ Link	49.35	26.16	89.41	54.95			UMT (VO)	2022-03-23
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	✓ Link	48.7	24.0				45.43	VideoChat-T (ZS)	2024-10-25
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection	✓ Link	48.31	29.25	88.79	56.08			UMT (VA)	2022-03-23
SimVTP: Simple Video Text Pre-training with Masked Autoencoders		44.7	26.3	83.7	55.1			SimVTP	2022-12-07
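
The metric columns above follow the usual moment retrieval convention: R@k IoU=m is the percentage of queries for which at least one of the top-k predicted moments overlaps the ground-truth moment with temporal IoU of at least m, and mIoU is the mean IoU of the top-1 prediction. The sketch below is only an illustration under those assumptions. The function names, the `>=` threshold comparison, and the toy spans are hypothetical and are not taken from any listed paper's evaluation code, which may differ in details such as tie handling.

```python
from typing import Sequence, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec) of a moment


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two 1-D time spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(ranked_preds: Sequence[Sequence[Span]],
                gts: Sequence[Span], k: int, iou_threshold: float) -> float:
    """Percentage of queries with a top-k prediction at IoU >= threshold.

    Assumption: each inner list is sorted by model confidence, best first.
    """
    hits = sum(
        any(temporal_iou(p, gt) >= iou_threshold for p in preds[:k])
        for preds, gt in zip(ranked_preds, gts)
    )
    return 100.0 * hits / len(gts)


def mean_iou(ranked_preds: Sequence[Sequence[Span]],
             gts: Sequence[Span]) -> float:
    """Mean IoU (in percent) of the top-1 prediction; 0 if a query has none."""
    total = sum(
        temporal_iou(preds[0], gt) if preds else 0.0
        for preds, gt in zip(ranked_preds, gts)
    )
    return 100.0 * total / len(gts)


# Toy data (made up): two queries, two ranked predictions each.
toy_preds = [[(2.0, 8.0), (0.0, 4.0)], [(10.0, 12.0), (5.0, 9.0)]]
toy_gts = [(3.0, 9.0), (5.0, 10.0)]

print(recall_at_k(toy_preds, toy_gts, k=1, iou_threshold=0.5))
print(recall_at_k(toy_preds, toy_gts, k=5, iou_threshold=0.5))
print(mean_iou(toy_preds, toy_gts))
```

In the toy data the second query's best-overlapping span is ranked second, so it counts toward R@5 but not R@1. This is why the R@5 columns in the table are consistently higher than the R@1 columns for the same model.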
