OpenCodePapers

Video Retrieval on VATEX

Video Retrieval
Leaderboard
| Paper | Code | text-to-video R@1 | text-to-video R@5 | text-to-video R@10 | text-to-video R@50 | text-to-video MedianR | text-to-video MeanR | video-to-text R@1 | video-to-text R@10 | Model Name | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gramian Multimodal Representation Learning and Alignment | ✓ Link | 87.7 | | 100 | | | | 84.6 | 100 | GRAM | 2024-12-16 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ Link | 83.0 | 98.2 | 99.2 | | | | | | VAST | 2023-05-29 |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ Link | 78.5 | 97.1 | 98.7 | | | | | | VALOR | 2023-04-17 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ Link | 75.5 | | | | | | 89.3 | | InternVideo2-6B | 2024-03-22 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | ✓ Link | 72 | 95.1 | 97.8 | | | | 86.0 | 99.6 | Unmasked Teacher | 2023-03-28 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ Link | 71.1 | | | | | | 87.2 | | InternVideo | 2022-12-06 |
| Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | ✓ Link | 68.8 | 93.5 | 97.0 | | 1.0 | 2.7 | | | Side4Video | 2023-11-27 |
| Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? | ✓ Link | 66.6 | 93.1 | 97.0 | | 1 | 2.7 | 80.9 | 99.6 | Cap4Video | 2022-12-31 |
| Holistic Features are almost Sufficient for Text-to-Video Retrieval | ✓ Link | 63.6 | 91.9 | 96.1 | | | | | | TeachCLIP | 2024-01-01 |
| TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval | ✓ Link | 59.1 | | 95.2 | | | | | | TS2-Net | 2022-07-16 |
| Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval | ✓ Link | 59.1 | 91.7 | 96.3 | | | | | | LAFF | 2021-12-03 |
| Cross Modal Retrieval with Querybank Normalisation | ✓ Link | 58.8 | | 93.8 | | | | | | QB-Norm+CLIP2Video | 2021-12-23 |
| CLIP2Video: Mastering Video-Text Retrieval via Image CLIP | ✓ Link | 57.3 | 90 | 95.5 | | | | | | CLIP2Video | 2021-06-21 |