OpenCodePapers

Zero-Shot Video Retrieval on ActivityNet

Leaderboard

Paper	Code	text-to-video R@1	text-to-video R@5	text-to-video R@10	video-to-text R@1	video-to-text R@5	video-to-text R@10	ModelName	ReleaseDate
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	63.2	85.6	92.5	56.5	82.8	90.3	InternVideo2-6B	2024-03-22
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	60.4	83.9	90.8	54.8	81.5	89.5	InternVideo2-1B	2024-03-22
Gramian Multimodal Representation Learning and Alignment	✓ Link	59.0		91.2	50.9		85.8	GRAM	2024-12-16
Unmasked Teacher: Towards Training-Efficient Video Foundation Models	✓ Link	42.8	69.6	79.8	40.7	67.6	78.6	UMT-L (ViT-L/16)	2023-03-28
vid-TLDR: Training Free Token merging for Light-weight Video Transformer	✓ Link	42.8	69.4	79.6	41.2	68.2	79.1	vid-TLDR (UMT-L)	2024-03-20
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment	✓ Link	41.0	68.4	80.0	39.1	69.8	81.1	LanguageBind(ViT-H/14)	2023-10-03
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment	✓ Link	38.4	66.6	77.9	35.7	65.8	77.8	LanguageBind(ViT-L/14)	2023-10-03
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	✓ Link	37.0	66.7	78.9				BT-Adapter	2023-09-27
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners		34.5	63.2	76.6	33.0	61.6	75.3	VideoCoCa	2022-12-09
Revealing Single Frame Bias for Video-and-Language Learning	✓ Link	30.8	55.9	66.3				Singularity-temporal-5M	2022-06-07
InternVideo: General Video Foundation Models via Generative and Discriminative Learning	✓ Link	30.7			31.4			InternVideo	2022-12-06
Revealing Single Frame Bias for Video-and-Language Learning	✓ Link	30.6	55.6	66.9				Singularity-temporal-17M	2022-06-07
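
The R@1, R@5, and R@10 columns above are recall-at-K scores, reported as percentages. As a rough illustration of what such a number measures, here is a minimal sketch that computes R@K from a query-by-candidate similarity matrix. It assumes a one-to-one pairing in which query i matches candidate i, and it counts ties in the query's favor. The function names `recall_at_k` and `transpose` and the toy matrix are hypothetical, and the individual papers in the table may use evaluation protocols that differ in detail.

```python
def recall_at_k(sim, k):
    """Percentage of queries whose matching item (same index) ranks in the top k.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the correct item = number of candidates scoring strictly higher.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(sim):
    """Swap queries and candidates, e.g. text-to-video becomes video-to-text."""
    return [list(col) for col in zip(*sim)]


# Toy 3x3 similarity matrix: rows are text queries, columns are videos.
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.6],
]

for k in (1, 2, 3):
    t2v = recall_at_k(sim, k)
    v2t = recall_at_k(transpose(sim), k)
    print(f"R@{k}: text-to-video {t2v:.1f}, video-to-text {v2t:.1f}")
```

Transposing the matrix swaps the roles of queries and candidates, which is why the same similarity scores can give different text-to-video and video-to-text numbers for a single model.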