OpenCodePapers
Video Retrieval on VATEX
Leaderboard
Paper
Code
text-to-video R@1 (%)
text-to-video R@5 (%)
text-to-video R@10 (%)
text-to-video R@50 (%)
text-to-video MedianR (lower is better)
text-to-video MeanR (lower is better)
video-to-text R@1 (%)
video-to-text R@10 (%)
ModelName
ReleaseDate
Gramian Multimodal Representation Learning and Alignment
✓ Link
87.7
100
84.6
100
GRAM
2024-12-16
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
✓ Link
83.0
98.2
99.2
VAST
2023-05-29
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
✓ Link
78.5
97.1
98.7
VALOR
2023-04-17
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
✓ Link
75.5
89.3
InternVideo2-6B
2024-03-22
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
✓ Link
72
95.1
97.8
86.0
99.6
Unmasked Teacher
2023-03-28
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
✓ Link
71.1
87.2
InternVideo
2022-12-06
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
✓ Link
68.8
93.5
97.0
1.0
2.7
Side4Video
2023-11-27
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
✓ Link
66.6
93.1
97.0
1
2.7
80.9
99.6
Cap4Video
2022-12-31
Holistic Features are almost Sufficient for Text-to-Video Retrieval
✓ Link
63.6
91.9
96.1
TeachCLIP
2024-01-01
TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval
✓ Link
59.1
95.2
TS2-Net
2022-07-16
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval
✓ Link
59.1
91.7
96.3
LAFF
2021-12-03
Cross Modal Retrieval with Querybank Normalisation
✓ Link
58.8
93.8
QB-Norm+CLIP2Video
2021-12-23
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
✓ Link
57.3
90
95.5
CLIP2Video
2021-06-21
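The leaderboard columns are standard ranking metrics. The sketch below shows one common way to compute them from a query-by-gallery similarity matrix. It assumes exactly one correct item per query, located on the diagonal, and breaks ties in favour of the correct item. Evaluation protocols differ on these points, for example in how they handle VATEX's multiple captions per video, so this is an illustration rather than the evaluation code used by any listed paper. The function names are hypothetical.

```python
"""Sketch: R@K, MedianR and MeanR from a similarity matrix.

Assumption (hypothetical setup): sim[i][j] scores query i against gallery
item j, and the single correct match for query i is gallery item i.
"""
from statistics import mean, median


def ranks_of_ground_truth(sim):
    """Return the 1-based rank of the correct gallery item for each query row."""
    ranks = []
    for i, row in enumerate(sim):
        # Count gallery items scored strictly higher than the ground truth;
        # ties are resolved in favour of the ground truth (an assumption).
        higher = sum(1 for score in row if score > row[i])
        ranks.append(higher + 1)
    return ranks


def retrieval_metrics(sim, ks=(1, 5, 10, 50)):
    """R@K as a percentage of queries whose correct item ranks in the top K,
    plus the median and mean rank of the correct item (lower is better)."""
    ranks = ranks_of_ground_truth(sim)
    n = len(ranks)
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    metrics["MedianR"] = median(ranks)
    metrics["MeanR"] = mean(ranks)
    return metrics


def transpose(matrix):
    """Swap query and gallery roles (text-to-video -> video-to-text)."""
    return [list(col) for col in zip(*matrix)]


if __name__ == "__main__":
    # Toy 3x3 matrix: rows are captions (queries), columns are videos.
    sim_t2v = [
        [0.9, 0.1, 0.3],
        [0.2, 0.4, 0.8],
        [0.1, 0.2, 0.7],
    ]
    print("text-to-video:", retrieval_metrics(sim_t2v, ks=(1, 5)))
    # Video-to-text retrieval scores the same pairs with the roles swapped.
    print("video-to-text:", retrieval_metrics(transpose(sim_t2v), ks=(1, 5)))
```

In the toy matrix the second caption ranks its video second, so text-to-video R@1 is two out of three queries (about 66.7%) and MedianR is 1. Because MedianR and MeanR are ranks, a perfect system scores 1 on both, which is the floor that entries such as Side4Video and Cap4Video approach on MedianR.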