OpenCodePapers
Video Retrieval on YouCook2
Leaderboard
| Paper | Code | text-to-video R@1 | text-to-video R@5 | text-to-video R@10 | text-to-video Median Rank | text-to-video Mean Rank | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ | 50.4 | 74.3 | 80.8 | - | - | VAST | 2023-05-29 |
| MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | ✓ | 33.7 | 63.1 | 74.8 | 3 | - | UniVL + MELTR | 2023-03-23 |
| VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | ✓ | 32.2 | 62.6 | 75.0 | - | - | VideoCLIP | 2021-09-28 |
| MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization | - | 32.0 | 64.0 | 74.8 | 3.0 | 12.7 | MDMMT-2 | 2022-03-14 |
| TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | - | 29.6 | 59.7 | 72.7 | 4 | - | TACo | 2021-08-23 |
| UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | ✓ | 28.9 | 57.6 | 70.0 | 4 | - | UniVL | 2020-02-15 |
| VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | ✓ | 27.05 | 56.88 | 69.38 | 4 | - | VLM | 2021-05-20 |
| VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | ✓ | 22.7 | 50.4 | 63.1 | - | - | VideoCLIP (zero-shot) | 2021-09-28 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - | 21.7 | 43.9 | 55.2 | - | - | VideoCoCa (zero-shot) | 2022-12-09 |
| COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | ✓ | 16.7 | - | 52.3 | 9 | - | COOT | 2020-11-01 |
| HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | ✓ | 8.2 | 24.5 | 35.3 | 24 | - | Text-Video Embedding | 2019-06-07 |
| RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | ✓ | 6.3 | 16.9 | 25.2 | 53 | - | RoME | 2022-06-26 |
| Semantic Role Aware Correlation Transformer for Text to Video Retrieval | ✓ | 5.3 | 14.5 | 20.8 | 77 | - | Satar et al. | 2022-06-26 |
| Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors | - | 4.6 | 14.3 | 21.6 | 75 | - | HGLMM FV CCA | 2015-06-01 |
| OmniVec: Learning robust representations with cross modal sharing | - | - | - | 70.8 | - | - | OmniVec | 2023-11-07 |
| OmniVec: Learning robust representations with cross modal sharing | - | - | - | 64.2 | - | - | OmniVec (pretrained) | 2023-11-07 |
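
For readers unfamiliar with the leaderboard columns, the sketch below shows how R@K, Median Rank, and Mean Rank are conventionally derived from a text-to-video similarity matrix. It is a minimal illustration, not the evaluation code of any listed paper: the function names are hypothetical, it assumes one ground-truth video per text query (video i matches query i), and it resolves ties in favor of the correct video. Individual papers may use different candidate pools or protocols, so their reported numbers are not guaranteed to be directly comparable.

```python
from statistics import mean, median


def rank_of_correct_video(scores, correct_index):
    """Return the 1-based rank of the correct video for one text query."""
    correct_score = scores[correct_index]
    # Count candidates that score strictly higher than the correct video
    # (assumption: ties are resolved in favor of the correct video).
    return 1 + sum(1 for s in scores if s > correct_score)


def retrieval_metrics(similarity, ks=(1, 5, 10)):
    """Compute R@K (in percent), Median Rank, and Mean Rank.

    similarity[i][j] is the score of text query i against video j.
    Assumption: video i is the single correct match for query i.
    """
    ranks = [rank_of_correct_video(row, i) for i, row in enumerate(similarity)]
    metrics = {
        f"R@{k}": 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks
    }
    metrics["Median Rank"] = median(ranks)
    metrics["Mean Rank"] = mean(ranks)
    return metrics


# Toy 3-query, 3-video example (scores are made up for illustration).
toy_similarity = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.2, 0.4, 0.8],  # query 1: correct video ranked 2nd
    [0.5, 0.6, 0.7],  # query 2: correct video ranked 1st
]
toy_metrics = retrieval_metrics(toy_similarity, ks=(1, 2))
```

Higher is better for R@K, while lower is better for Median Rank and Mean Rank, which is why the two kinds of columns sort in opposite directions.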