OpenCodePapers
Zero-Shot Video Retrieval on ActivityNet
Results over time (interactive chart of leaderboard metrics per model; not reproduced here)
Leaderboard
| Paper | Code | text-to-video R@1 | text-to-video R@5 | text-to-video R@10 | video-to-text R@1 | video-to-text R@5 | video-to-text R@10 | Model | Release date |
|---|---|---|---|---|---|---|---|---|---|
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ Link | 63.2 | 85.6 | 92.5 | 56.5 | 82.8 | 90.3 | InternVideo2-6B | 2024-03-22 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ Link | 60.4 | 83.9 | 90.8 | 54.8 | 81.5 | 89.5 | InternVideo2-1B | 2024-03-22 |
| Gramian Multimodal Representation Learning and Alignment | ✓ Link | 59.0 | – | 91.2 | 50.9 | – | 85.8 | GRAM | 2024-12-16 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | ✓ Link | 42.8 | 69.6 | 79.8 | 40.7 | 67.6 | 78.6 | UMT-L (ViT-L/16) | 2023-03-28 |
| vid-TLDR: Training Free Token merging for Light-weight Video Transformer | ✓ Link | 42.8 | 69.4 | 79.6 | 41.2 | 68.2 | 79.1 | vid-TLDR (UMT-L) | 2024-03-20 |
| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | ✓ Link | 41.0 | 68.4 | 80.0 | 39.1 | 69.8 | 81.1 | LanguageBind (ViT-H/14) | 2023-10-03 |
| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | ✓ Link | 38.4 | 66.6 | 77.9 | 35.7 | 65.8 | 77.8 | LanguageBind (ViT-L/14) | 2023-10-03 |
| BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | ✓ Link | 37.0 | 66.7 | 78.9 | – | – | – | BT-Adapter | 2023-09-27 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | – | 34.5 | 63.2 | 76.6 | 33.0 | 61.6 | 75.3 | VideoCoCa | 2022-12-09 |
| Revealing Single Frame Bias for Video-and-Language Learning | ✓ Link | 30.8 | 55.9 | 66.3 | – | – | – | Singularity-temporal-5M | 2022-06-07 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ Link | 30.7 | – | – | 31.4 | – | – | InternVideo | 2022-12-06 |
| Revealing Single Frame Bias for Video-and-Language Learning | ✓ Link | 30.6 | 55.6 | 66.9 | – | – | – | Singularity-temporal-17M | 2022-06-07 |