OpenCodePapers

Video Retrieval on YouCook2

Video Retrieval

Leaderboard

Paper	Code	text-to-video R@1	text-to-video R@5	text-to-video R@10	text-to-video Median Rank	text-to-video Mean Rank	ModelName	ReleaseDate
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset	✓ Link	50.4	74.3	80.8			VAST	2023-05-29
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models	✓ Link	33.7	63.1	74.8	3		UniVL + MELTR	2023-03-23
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding	✓ Link	32.2	62.6	75.0			VideoCLIP	2021-09-28
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization		32.0	64.0	74.8	3.0	12.7	MDMMT-2	2022-03-14
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment		29.6	59.7	72.7	4		TACo	2021-08-23
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation	✓ Link	28.9	57.6	70.0	4		UniVL	2020-02-15
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding	✓ Link	27.05	56.88	69.38	4		VLM	2021-05-20
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding	✓ Link	22.7	50.4	63.1			VideoCLIP (zero-shot)	2021-09-28
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners		21.7	43.9	55.2			VideoCoCa (zero-shot)	2022-12-09
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning	✓ Link	16.7		52.3	9		COOT	2020-11-01
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips	✓ Link	8.2	24.5	35.3	24		Text-Video Embedding	2019-06-07
RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval	✓ Link	6.3	16.9	25.2	53		RoME	2022-06-26
Semantic Role Aware Correlation Transformer for Text to Video Retrieval	✓ Link	5.3	14.5	20.8	77		Satar et al.	2022-06-26
Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors		4.6	14.3	21.6	75		HGLMM FV CCA	2015-06-01
OmniVec: Learning robust representations with cross modal sharing				70.8			OmniVec	2023-11-07
OmniVec: Learning robust representations with cross modal sharing				64.2			OmniVec (pretrained)	2023-11-07
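
For readers unfamiliar with the column names, the following is a minimal sketch of how text-to-video R@K, Median Rank, and Mean Rank are commonly computed from a query-by-video similarity matrix. It is illustrative only: the function name `retrieval_metrics`, the assumption that query i's correct video is video i, and the optimistic handling of ties are assumptions made here, and the evaluation protocols of the papers listed above may differ.

```python
from statistics import mean, median


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute R@K (in percent), median rank, and mean rank.

    sim[i][j] is the similarity of text query i to video j.
    Assumption: the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank is 1 plus the number of videos scored strictly higher
        # than the correct one (ties are resolved optimistically).
        ranks.append(1 + sum(1 for s in row if s > target))

    n = len(ranks)
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    metrics["Median Rank"] = median(ranks)
    metrics["Mean Rank"] = mean(ranks)
    return metrics


# Toy example with three queries and three videos (hypothetical scores).
toy_sim = [
    [0.9, 0.1, 0.2],  # correct video ranked 1st
    [0.8, 0.3, 0.5],  # correct video ranked 3rd
    [0.1, 0.2, 0.7],  # correct video ranked 1st
]
print(retrieval_metrics(toy_sim, ks=(1, 2)))
```

Higher is better for R@K, and lower is better for Median Rank and Mean Rank.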