OpenCodePapers

Video Retrieval on VATEX

Leaderboard

Paper	Code	text-to-video R@1	text-to-video R@5	text-to-video R@10	text-to-video R@50	text-to-video MedianR	text-to-video MeanR	video-to-text R@1	video-to-text R@10	ModelName	ReleaseDate
Gramian Multimodal Representation Learning and Alignment	✓ Link	87.7		100				84.6	100	GRAM	2024-12-16
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset	✓ Link	83.0	98.2	99.2						VAST	2023-05-29
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset	✓ Link	78.5	97.1	98.7						VALOR	2023-04-17
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	75.5						89.3		InternVideo2-6B	2024-03-22
Unmasked Teacher: Towards Training-Efficient Video Foundation Models	✓ Link	72	95.1	97.8				86.0	99.6	Unmasked Teacher	2023-03-28
InternVideo: General Video Foundation Models via Generative and Discriminative Learning	✓ Link	71.1						87.2		InternVideo	2022-12-06
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning	✓ Link	68.8	93.5	97.0		1.0	2.7			Side4Video	2023-11-27
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?	✓ Link	66.6	93.1	97.0		1	2.7	80.9	99.6	Cap4Video	2022-12-31
Holistic Features are almost Sufficient for Text-to-Video Retrieval	✓ Link	63.6	91.9	96.1						TeachCLIP	2024-01-01
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval	✓ Link	59.1		91.7	96.3					LAFF	2021-12-03
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval	✓ Link	59.1		95.2						TS2-Net	2022-07-16
Cross Modal Retrieval with Querybank Normalisation	✓ Link	58.8		93.8						QB-Norm+CLIP2Video	2021-12-23
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP	✓ Link	57.3		90	95.5					CLIP2Video	2021-06-21
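
The leaderboard does not define its metric columns. As a rough guide, the sketch below shows how R@K, MedianR, and MeanR are commonly computed from a query-by-candidate similarity matrix. It assumes one ground-truth video per text query, located at the same index as the query; individual papers may follow a different protocol (for example, multiple captions per video), so this is an illustration and not the evaluation code behind any entry above. The function name `retrieval_metrics` and the toy matrix are hypothetical.

```python
from statistics import mean, median


def retrieval_metrics(sim, ks=(1, 5, 10, 50)):
    """Compute R@K, MedianR and MeanR for text-to-video retrieval.

    sim[i][j] is the similarity between text query i and video j.
    Assumption: the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the ground truth = 1 + number of videos scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > row[i]))

    n = len(ranks)
    # R@K: percentage of queries whose ground truth appears in the top K.
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    metrics["MedianR"] = median(ranks)  # lower is better
    metrics["MeanR"] = mean(ranks)      # lower is better
    return metrics


# Toy example with 4 queries and 4 videos (made-up scores).
toy_sim = [
    [0.9, 0.1, 0.2, 0.3],  # ground truth ranked 1st
    [0.8, 0.7, 0.1, 0.2],  # ground truth ranked 2nd
    [0.1, 0.2, 0.9, 0.3],  # ground truth ranked 1st
    [0.6, 0.5, 0.4, 0.3],  # ground truth ranked 4th
]
toy_metrics = retrieval_metrics(toy_sim, ks=(1, 2))
print(toy_metrics)
```

Under these definitions R@K can only stay the same or increase as K grows, and video-to-text metrics follow by transposing the matrix so that rows index videos and columns index texts.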