OpenCodePapers

Zero-Shot Video Retrieval on ActivityNet
Leaderboard
| Paper | Code | text-to-video R@1 | text-to-video R@5 | text-to-video R@10 | video-to-text R@1 | video-to-text R@5 | video-to-text R@10 | Model name | Release date |
|---|---|---|---|---|---|---|---|---|---|
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 63.2 | 85.6 | 92.5 | 56.5 | 82.8 | 90.3 | InternVideo2-6B | 2024-03-22 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 60.4 | 83.9 | 90.8 | 54.8 | 81.5 | 89.5 | InternVideo2-1B | 2024-03-22 |
| Gramian Multimodal Representation Learning and Alignment | ✓ | 59.0 | | 91.2 | 50.9 | | 85.8 | GRAM | 2024-12-16 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | ✓ | 42.8 | 69.6 | 79.8 | 40.7 | 67.6 | 78.6 | UMT-L (ViT-L/16) | 2023-03-28 |
| vid-TLDR: Training Free Token merging for Light-weight Video Transformer | ✓ | 42.8 | 69.4 | 79.6 | 41.2 | 68.2 | 79.1 | vid-TLDR (UMT-L) | 2024-03-20 |
| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | ✓ | 41.0 | 68.4 | 80.0 | 39.1 | 69.8 | 81.1 | LanguageBind(ViT-H/14) | 2023-10-03 |
| LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | ✓ | 38.4 | 66.6 | 77.9 | 35.7 | 65.8 | 77.8 | LanguageBind(ViT-L/14) | 2023-10-03 |
| BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | ✓ | 37.0 | 66.7 | 78.9 | | | | BT-Adapter | 2023-09-27 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 34.5 | 63.2 | 76.6 | 33.0 | 61.6 | 75.3 | VideoCoCa | 2022-12-09 |
| Revealing Single Frame Bias for Video-and-Language Learning | ✓ | 30.8 | 55.9 | 66.3 | | | | Singularity-temporal-5M | 2022-06-07 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ | 30.7 | | | 31.4 | | | InternVideo | 2022-12-06 |
| Revealing Single Frame Bias for Video-and-Language Learning | ✓ | 30.6 | 55.6 | 66.9 | | | | Singularity-temporal-17M | 2022-06-07 |
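
The R@1, R@5, and R@10 columns are Recall@K scores: the percentage of queries whose correct match appears among the top K retrieved items. The sketch below shows one common way to compute them from a similarity matrix. It is an illustration only, not the evaluation code of any listed paper: the function names `recall_at_k` and `transpose`, the toy scores, and the assumption that query i matches exactly candidate i are all hypothetical, and individual papers may use different protocols.

```python
from typing import Dict, List, Sequence


def recall_at_k(sim: Sequence[Sequence[float]],
                ks: Sequence[int] = (1, 5, 10)) -> Dict[int, float]:
    """Recall@K in percent for a square query-by-candidate similarity matrix.

    Assumption (for illustration): the correct match for query i is candidate i.
    """
    n = len(sim)
    hits = {k: 0 for k in ks}
    for i, row in enumerate(sim):
        # Zero-based rank of the correct candidate: how many candidates
        # score strictly higher. Ties are resolved in the query's favour.
        rank = sum(1 for score in row if score > row[i])
        for k in ks:
            if rank < k:
                hits[k] += 1
    return {k: 100.0 * hits[k] / n for k in ks}


def transpose(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap query and candidate roles (text-to-video becomes video-to-text)."""
    return [list(col) for col in zip(*sim)]


# Toy text-by-video similarity matrix with made-up scores.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]

t2v = recall_at_k(sim, ks=(1, 2))             # rows are text queries
v2t = recall_at_k(transpose(sim), ks=(1, 2))  # rows are video queries
print("text-to-video:", t2v)
print("video-to-text:", v2t)
```

Text-to-video and video-to-text recall come from the same matrix read along different axes, which is why the two directions can differ for the same model.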