OpenCodePapers
Zero-Shot Video Retrieval on MSVD
Leaderboard
| Paper | Code | text-to-video R@1 | text-to-video R@5 | text-to-video R@10 | text-to-video Median Rank | text-to-video Mean Rank | video-to-text R@1 | video-to-text R@5 | video-to-text R@10 | video-to-text Median Rank | ModelName | ReleaseDate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 59.3 | 84.4 | 89.6 | - | - | 83.1 | 94.2 | 97.0 | - | InternVideo2-6B | 2024-03-22 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 58.1 | 83.0 | 88.4 | - | - | 83.3 | 94.3 | 96.9 | - | InternVideo2-1B | 2024-03-22 |
| HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | ✓ | 54.8 | 80.9 | 87.2 | 1 | - | - | - | - | - | VAST, HowToCaption-finetuned | 2023-10-07 |
| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | ✓ | 54.1 | 81.1 | 88.1 | 1.0 | - | 69.7 | 91.8 | 97.9 | 1.0 | LanguageBind(ViT-L/14) | 2023-10-03 |
| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | ✓ | 53.9 | 80.4 | 87.8 | 1 | - | 72.0 | 91.4 | 96.3 | 1 | LanguageBind(ViT-H/14) | 2023-10-03 |
| vid-TLDR: Training Free Token merging for Light-weight Video Transformer | ✓ | 50.0 | 77.6 | 85.5 | - | - | 75.7 | 90.0 | 95.1 | - | vid-TLDR (UMT-L) | 2024-03-20 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | ✓ | 49.0 | 76.9 | 84.7 | - | - | 74.5 | 89.7 | 92.8 | - | UMT-L (ViT-L/16) | 2023-03-28 |
| HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | ✓ | 44.5 | 73.3 | 82.1 | 2 | - | - | - | - | - | HowToCaption | 2023-10-07 |
| MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | ✓ | 44.4 | 76.2 | 87.0 | 2.0 | - | - | - | - | - | MILES | 2022-04-26 |
| Bridging Video-text Retrieval with Multiple Choice Questions | ✓ | 43.6 | 74.9 | 84.9 | 2.0 | - | - | - | - | - | Y. Ge et. al. | 2022-01-13 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ | 43.4 | - | - | - | - | 67.6 | - | - | - | InternVideo | 2022-12-06 |
| CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | ✓ | 38.5 | 66.9 | 76.8 | 2 | 17.8 | - | - | - | - | CLIP4Clip | 2021-04-18 |
| LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval | - | 36.9 | 68.6 | 81.0 | 2 | - | 34.4 | 69.0 | 79.2 | 3 | LaT | 2022-07-11 |
| Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | ✓ | 13.66 | 35.7 | 47.74 | - | - | - | - | - | - | SSML | 2020-03-06 |
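
For readers unfamiliar with the leaderboard columns, the following is a minimal sketch of how R@K, Median Rank, and Mean Rank are commonly computed from a query-by-candidate similarity matrix. It is an illustration, not the evaluation code of any listed paper. The function name `retrieval_metrics`, the toy matrix, and the one-to-one pairing (query i matches candidate i) are assumptions made for the example. MSVD has multiple captions per video, and individual papers may handle that differently.

```python
from statistics import median


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute retrieval metrics from a similarity matrix.

    sim[i][j] is the similarity of query i to candidate j.
    Assumption for this sketch: the correct candidate for query i is candidate i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the true match = 1 + number of candidates scored strictly higher.
        rank = 1 + sum(1 for score in row if score > row[i])
        ranks.append(rank)

    n = len(ranks)
    # R@K: percentage of queries whose true match is ranked within the top K.
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    metrics["Median Rank"] = median(ranks)  # lower is better
    metrics["Mean Rank"] = sum(ranks) / n   # lower is better
    return metrics


# Toy 3x3 example (hypothetical scores, not taken from the leaderboard).
toy_sim = [
    [0.9, 0.1, 0.3],  # true match ranked 1st
    [0.2, 0.4, 0.8],  # true match ranked 2nd
    [0.5, 0.6, 0.7],  # true match ranked 1st
]
toy_metrics = retrieval_metrics(toy_sim, ks=(1, 2))
print(toy_metrics)
```

For text-to-video, rows are text queries and columns are videos; for video-to-text, the matrix is transposed. Higher R@K is better, while lower Median Rank and Mean Rank are better.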