OpenCodePapers

video-retrieval-on-youcook2

Video Retrieval
Leaderboard
| Paper | Code | Text-to-video R@1 | Text-to-video R@5 | Text-to-video R@10 | Text-to-video Median Rank | Text-to-video Mean Rank | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ | 50.4 | 74.3 | 80.8 | | | VAST | 2023-05-29 |
| MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | ✓ | 33.7 | 63.1 | 74.8 | 3 | | UniVL + MELTR | 2023-03-23 |
| VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | ✓ | 32.2 | 62.6 | 75.0 | | | VideoCLIP | 2021-09-28 |
| MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization | | 32.0 | 64.0 | 74.8 | 3.0 | 12.7 | MDMMT-2 | 2022-03-14 |
| TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | | 29.6 | 59.7 | 72.7 | 4 | | TACo | 2021-08-23 |
| UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | ✓ | 28.9 | 57.6 | 70.0 | 4 | | UniVL | 2020-02-15 |
| VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | ✓ | 27.05 | 56.88 | 69.38 | 4 | | VLM | 2021-05-20 |
| VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | ✓ | 22.7 | 50.4 | 63.1 | | | VideoCLIP (zero-shot) | 2021-09-28 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 21.7 | 43.9 | 55.2 | | | VideoCoCa (zero-shot) | 2022-12-09 |
| COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | ✓ | 16.7 | | 52.3 | 9 | | COOT | 2020-11-01 |
| HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | ✓ | 8.2 | 24.5 | 35.3 | 24 | | Text-Video Embedding | 2019-06-07 |
| RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | ✓ | 6.3 | 16.9 | 25.2 | 53 | | RoME | 2022-06-26 |
| Semantic Role Aware Correlation Transformer for Text to Video Retrieval | ✓ | 5.3 | 14.5 | 20.8 | 77 | | Satar et al. | 2022-06-26 |
| Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors | | 4.6 | 14.3 | 21.6 | 75 | | HGLMM FV CCA | 2015-06-01 |
| OmniVec: Learning robust representations with cross modal sharing | | | | 70.8 | | | OmniVec | 2023-11-07 |
| OmniVec: Learning robust representations with cross modal sharing | | | | 64.2 | | | OmniVec (pretrained) | 2023-11-07 |
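
For readers unfamiliar with the leaderboard columns, the following is a minimal sketch of how text-to-video R@K, Median Rank, and Mean Rank are commonly computed from a query-by-video similarity matrix. It is an illustration only, not the evaluation code of any listed paper. The function name `retrieval_metrics`, the assumption that the correct video for query i sits at index i, and the tie handling (only strictly higher scores count against the correct video) are all assumptions made for this example; individual papers may use different protocols.

```python
from statistics import mean, median


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute text-to-video retrieval metrics.

    sim[i][j] is the similarity between text query i and video j.
    Assumption for this sketch: the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the correct video: 1 + number of videos scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > row[i]))

    n = len(ranks)
    # R@K: percentage of queries whose correct video is ranked within the top K.
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    # Lower is better for both rank statistics.
    metrics["MedianRank"] = median(ranks)
    metrics["MeanRank"] = mean(ranks)
    return metrics


# Toy 3x3 example (made-up scores, unrelated to the leaderboard numbers).
sim = [
    [0.9, 0.1, 0.3],  # correct video ranked 1st
    [0.8, 0.5, 0.2],  # correct video ranked 2nd
    [0.1, 0.7, 0.6],  # correct video ranked 2nd
]
toy = retrieval_metrics(sim, ks=(1, 2))
print(toy)
```

Higher is better for R@K, and lower is better for Median Rank and Mean Rank, which is why a model can lead on recall while reporting no rank statistics at all.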