Moment Retrieval on QVHighlights

Moment Retrieval

Leaderboard

Paper	Code	mAP	R@1 IoU=0.5	R@1 IoU=0.7	mAP@0.5	mAP@0.75	Model Name	Release Date
Saliency-Guided DETR for Moment Retrieval and Highlight Detection	✓ Link	58.80	74.20	60.40	76.20	60.80	SG-DETR (w/ PT)	2024-10-02
Saliency-Guided DETR for Moment Retrieval and Highlight Detection	✓ Link	54.10	72.20	56.60	73.20	55.80	SG-DETR	2024-10-02
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval	✓ Link	52.73	76.59	61.48	69.41	54.40	LLaVA-MR	2024-11-21
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding	✓ Link	52.00	70.69	53.96	72.33	53.85	FlashVTG	2024-12-18
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	49.24	71.42	56.45			InternVideo2-6B	2024-03-22
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding	✓ Link	47.97	68.48	53.11	69.40	49.12	CG-DETR (w/ PT)	2023-11-15
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval	✓ Link	47.94	70.36	55.25	69.53	49.17	VideoLights-B-pt	2024-12-02
Length-Aware DETR for Robust Moment Retrieval	✓ Link	47.93	63.94	51.10	65.65	49.44	LA-DETR	2024-12-30
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos	✓ Link	46.91	64.07	48.12	65.61	47.51	BAM-DETR (w/ audio)	2023-11-30
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos	✓ Link	46.67	63.88	47.92	66.33	48.22	BAM-DETR (w/ PT ASR Captions)	2023-11-30
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection	✓ Link	46.41	66.80	51.04	67.61	46.99	LD-DETR	2025-01-18
$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding	✓ Link	46.17	68.03	49.35	69.04	47.56	R^2-Tuning	2024-03-31
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos	✓ Link	45.36	62.71	48.64	64.57	46.33	BAM-DETR	2023-11-30
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding	✓ Link	45.18	66.65	52.19	64.37	46.68	video-mamba-suite	2024-03-14
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval	✓ Link	44.05	66.73	49.94	65.76	43.91	LLMEPET	2024-07-21
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection	✓ Link	43.8	64.53	48.31	64.78	43.65	UVCOM (w/ PT ASR Captions)	2023-11-28
UniVTG: Towards Unified Video-Language Temporal Grounding	✓ Link	43.63	65.43	50.06	64.06	45.02	UniVTG (w/ PT)	2023-07-31
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection	✓ Link	43.18	63.55	47.47	63.37	42.67	UVCOM	2023-11-28
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding	✓ Link	42.86	65.43	48.38	64.51	42.77	CG-DETR	2023-11-15
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection	✓ Link	40.62	64.1	46.1	64.3	40.5	QD-DETR (w/ PT)	2023-03-24
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection	✓ Link	40.19	63.06	45.10	63.04	40.10	QD-DETR (w/ audio)	2023-03-24
Background-aware Moment Detection for Video Moment Retrieval	✓ Link	40.08	60.12	43.05	63.08	40.18	BM-DETR	2023-06-05
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection	✓ Link	40.0	63.2	45.2	63.4	40.4	QD-DETR (only Video w/ PT ASR Captions)	2023-03-24
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection	✓ Link	39.86	62.40	44.98	62.52	39.88	QD-DETR (only Video)	2023-03-24
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection	✓ Link	38.08					UMT (w/ audio + PT ASR Captions)	2022-03-23
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries	✓ Link	36.14	59.78	40.33	60.51	35.36	Moment-DETR (w/ PT ASR Captions)	2021-07-20
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection	✓ Link	36.12					UMT	2022-03-23
UniVTG: Towards Unified Video-Language Temporal Grounding	✓ Link	35.47	58.86	40.86	57.60	35.59	UniVTG	2023-07-31
		32.3	54.5	36.5			SeViLA-Localizer
UnLoc: A Unified Framework for Video Localization Tasks	✓ Link		66.1	46.7			UnLoc-L	2023-08-21
UnLoc: A Unified Framework for Video Localization Tasks	✓ Link		64.5	48.8			UnLoc-B	2023-08-21
Boundary-Denoising for Video Activity Localization	✓ Link		59.27	45.07			DenoiseLoc	2023-04-06
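
The R@1 IoU=0.5 and R@1 IoU=0.7 columns report the percentage of queries whose top-ranked predicted moment overlaps the ground-truth moment with a temporal IoU at or above the threshold. A minimal sketch of that computation follows. It assumes one ground-truth span per query and spans given as (start, end) in seconds; the function names `temporal_iou` and `recall_at_1` are hypothetical, and the benchmark's official evaluation script may handle details (such as multiple ground-truth moments per query) differently.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) spans, e.g. in seconds."""
    # Overlap length is zero when the spans do not intersect.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, iou_threshold):
    """Percentage of queries whose top-1 span reaches the IoU threshold.

    Simplifying assumption: exactly one ground-truth span per query.
    """
    hits = sum(
        temporal_iou(p, g) >= iou_threshold
        for p, g in zip(top1_preds, gts)
    )
    return 100.0 * hits / len(gts)


# Illustrative data only (not taken from any model on the leaderboard).
top1_preds = [(0.0, 10.0), (5.0, 15.0)]
gts = [(0.0, 10.0), (10.0, 20.0)]
r1_at_05 = recall_at_1(top1_preds, gts, iou_threshold=0.5)
r1_at_07 = recall_at_1(top1_preds, gts, iou_threshold=0.7)
```

In this toy example the first prediction matches its ground truth exactly and the second overlaps it by only one third, so one of the two queries counts as a hit at both thresholds.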