OpenCodePapers

Zero-Shot Video Retrieval on YouCook2
Leaderboard
| Paper | Code | text-to-video R@1 | text-to-video R@5 | text-to-video R@10 | text-to-video Mean Rank | text-to-video Median Rank | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|
| OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | - | 26.1 | 54.1 | 70.8 | - | - | OmniVec2 | 2024-01-01 |
| Multi-granularity Correspondence Learning from Long-term Noisy Videos | ✓ Link | 24.2 | 51.9 | 64.1 | - | - | Norton | 2024-01-30 |
| VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | ✓ Link | 22.7 | 50.4 | 63.1 | - | - | VideoCLIP | 2021-09-28 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - | 20.3 | 43.0 | 53.3 | - | - | VideoCoCa | 2022-12-09 |
| TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | - | 19.9 | 43.2 | 55.7 | - | 8 | TACo | 2021-08-23 |
| HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | ✓ Link | 19.7 | 43.6 | 53.9 | - | 8 | VAST, HowToCaption-finetuned | 2023-10-07 |
| End-to-End Learning of Visual Representations from Uncurated Instructional Videos | ✓ Link | 15.1 | 38.0 | 51.2 | - | 10 | MIL-NCE | 2019-12-13 |
| HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | ✓ Link | 13.4 | 33.1 | 44.1 | - | 15 | HowToCaption | 2023-10-07 |
| VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | ✓ Link | - | - | 45.5 | - | 13 | VATT-MBS | 2021-04-22 |
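
For readers unfamiliar with the leaderboard's metric columns, the following is a minimal sketch of how R@K, mean rank, and median rank are commonly computed for text-to-video retrieval. It assumes a square similarity matrix in which query i matches video i. The function names and the toy numbers are illustrative only and are not taken from any of the listed papers, whose evaluation protocols may differ in details such as tie handling.

```python
from statistics import mean, median


def retrieval_ranks(sim):
    """Return the 1-based rank of the correct video for each text query.

    Assumption: sim[i][j] is the similarity between query i and video j,
    and video i is the ground-truth match for query i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank = 1 + number of videos scored strictly higher than the match.
        ranks.append(1 + sum(1 for s in row if s > row[i]))
    return ranks


def recall_at_k(ranks, k):
    """Percentage of queries whose correct video appears in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


# Toy 4x4 similarity matrix (made-up values, not model output).
sim = [
    [0.9, 0.1, 0.2, 0.3],  # match is ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # match is ranked 2nd
    [0.1, 0.2, 0.7, 0.3],  # match is ranked 1st
    [0.9, 0.8, 0.7, 0.1],  # match is ranked 4th
]

ranks = retrieval_ranks(sim)
summary = {
    "R@1": recall_at_k(ranks, 1),
    "R@2": recall_at_k(ranks, 2),
    "mean_rank": mean(ranks),
    "median_rank": median(ranks),
}
print(ranks, summary)
```

Higher R@K is better, while lower mean and median rank are better, which is why the table is sorted by R@1 in descending order.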