OpenCodePapers

Zero-Shot Video Retrieval on YouCook2

Leaderboard

Paper	Code	text-to-video R@1	text-to-video R@5	text-to-video R@10	text-to-video Mean Rank	text-to-video Median Rank	ModelName	ReleaseDate
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning		26.1	54.1	70.8			OmniVec2	2024-01-01
Multi-granularity Correspondence Learning from Long-term Noisy Videos	✓ Link	24.2	51.9	64.1			Norton	2024-01-30
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding	✓ Link	22.7	50.4	63.1			VideoCLIP	2021-09-28
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners		20.3	43.0	53.3			VideoCoCa	2022-12-09
TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment		19.9	43.2	55.7	8		TACo	2021-08-23
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale	✓ Link	19.7	43.6	53.9		8	VAST, HowToCaption-finetuned	2023-10-07
End-to-End Learning of Visual Representations from Uncurated Instructional Videos	✓ Link	15.1	38.0	51.2	10		MIL-NCE	2019-12-13
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale	✓ Link	13.4	33.1	44.1		15	HowToCaption	2023-10-07
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text	✓ Link			45.5	13		VATT-MBS	2021-04-22
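
How the metric columns are computed

The R@K, Mean Rank and Median Rank columns are standard retrieval metrics derived from a text-to-video similarity matrix. The sketch below is a minimal illustration, not the evaluation script of any paper listed above. It assumes that query i has exactly one correct video, stored at index i, and that ties are resolved in favour of the correct video. The function name `retrieval_metrics` and the toy similarity values are hypothetical.

```python
from statistics import mean, median


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute text-to-video retrieval metrics.

    sim[i][j] is the similarity of text query i to video j.
    Assumption: the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank of the correct video: 1 + number of videos scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > row[i]))

    # R@K: percentage of queries whose correct video is ranked in the top K.
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks}
    metrics["Mean Rank"] = mean(ranks)
    metrics["Median Rank"] = median(ranks)
    return metrics


# Toy example with 3 queries and 3 videos (made-up scores).
sim = [
    [0.9, 0.1, 0.2],  # correct video ranked 1st
    [0.8, 0.3, 0.5],  # correct video ranked 3rd
    [0.1, 0.2, 0.7],  # correct video ranked 1st
]
metrics = retrieval_metrics(sim, ks=(1, 2))
```

Higher is better for R@K, and lower is better for Mean Rank and Median Rank. Median Rank is less sensitive than Mean Rank to a few queries with very poor ranks.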