OpenCodePapers

Video Retrieval on MSVD

Video Retrieval
[Figure: Results over time]
Leaderboard
Paper | Code | Metrics (text-to-video R@1, R@5, R@10, Median Rank, Mean Rank, R@50; video-to-text R@1, R@5, R@10, Median Rank, Mean Rank; blank cells omitted) | Model Name | Release Date
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 61.4 85.2 | InternVideo2-6B | 2024-03-22
Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations | | 59.0 84.0 90.3 1.0 7.6 73.0 94.5 96.6 1.0 7.6 | HunYuan_tvr (huge) | 2022-04-07
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ | 58.4 76.3 | InternVideo | 2022-12-06
Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations | | 58.2 83.5 90.1 17.8 69.1 91.5 95.0 1.0 3.8 | HunYuan_tvr | 2022-04-07
vid-TLDR: Training Free Token merging for Light-weight Video Transformer | ✓ | 57.9 83.8 89.4 82.7 94.5 96.3 | vid-TLDR (UMT-L) | 2024-03-20
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 57.5 83.6 89.9 | VLAB | 2023-05-22
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization | | 56.8 83.1 89.2 1.0 8.8 | MDMMT-2 | 2022-03-14
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | ✓ | 56.1 81.7 88.8 1.0 8.4 | Side4Video | 2023-11-27
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | ✓ | 51.8 87.6 87.6 18.9 69.3 90.6 94.6 13.1 | CAMoE | 2021-09-09
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? | ✓ | 51.8 80.8 88.3 18.3 70.0 93.2 96.2 12.4 | Cap4Video | 2022-12-31
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval | ✓ | 50.6 80.3 88.4 18.4 68.4 90.1 95.0 13.0 | CenterCLIP (ViT-B/16) | 2022-05-02
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | ✓ | 50.4 80.6 8.4 66.8 90.4 4.2 | X-CLIP | 2022-07-15
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning | ✓ | 48.7 78.4 86.3 2.0 9.8 | DMAE (ViT-B/32) | 2023-09-20
Cross Modal Retrieval with Querybank Normalisation | ✓ | 48.0 77.9 86.2 2.0 | QB-Norm+CLIP2Video | 2021-12-23
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model | ✓ | 47.9 77.2 84.8 15.6 60.3 86.4 92 1.0 4.5 | DiffusionRet+QB-Norm | 2023-03-17
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval | ✓ | 47.3 77.4 85.5 2.0 9.6 68.9 93.1 97.1 1.0 2.4 | PAU | 2023-09-29
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval | ✓ | 47.2 77.4 86.0 2.0 9.3 66.4 90.0 94.2 1.0 3.3 | X-Pool | 2022-03-28
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model | ✓ | 46.6 75.9 84.1 2.0 15.7 61.9 88.3 92.9 1.0 4.5 | DiffusionRet | 2023-03-17
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | ✓ | 46.2 76.1 84.6 2 10.0 62.0 87.3 92.6 1 | CLIP4Clip | 2021-04-18
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval | ✓ | 45.4 76.0 84.6 | LAFF | 2021-12-03
A Straightforward Framework For Video Retrieval Using CLIP | ✓ | 37 64.1 73.8 3.0 59.9 85.2 90.7 1 | CLIP | 2021-02-24
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | ✓ | 33.7 64.7 76.3 3 | FROZEN | 2021-04-01
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | ✓ | 20.3 49.0 63.3 6.0 - - - - | SSML | 2020-03-06
Use What You Have: Video Retrieval Using Representations From Collaborative Experts | ✓ | 19.8 49.0 63.8 6.0 23.1 89.0 | Collaborative Experts | 2019-07-31
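
Metric definitions
R@K is the percentage of queries whose correct item ranks in the top K (higher is better). Median Rank and Mean Rank summarize the rank of the correct item across queries (lower is better). The sketch below is a minimal pure-Python illustration of these definitions, not the evaluation code of any listed paper. It assumes a hypothetical square similarity matrix `sim` in which candidate i is the single correct match for query i; MSVD evaluation protocols that handle multiple captions per video may differ. The function names are illustrative.

```python
from statistics import mean, median


def retrieval_metrics(sim, ks=(1, 5, 10, 50)):
    """Compute R@K, Median Rank and Mean Rank from a similarity matrix.

    Assumption (hypothetical setup): sim[i][j] is the score of candidate j
    for query i, and candidate i is the only correct match for query i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank is 1 plus the number of candidates scored strictly higher
        # than the correct one.
        ranks.append(1 + sum(1 for score in row if score > target))

    metrics = {
        f"R@{k}": 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks
    }
    metrics["Median Rank"] = median(ranks)
    metrics["Mean Rank"] = mean(ranks)
    return metrics


def transpose(sim):
    """Swap queries and candidates (rows become columns)."""
    return [list(col) for col in zip(*sim)]


if __name__ == "__main__":
    # Toy scores: rows are text queries, columns are videos.
    toy = [
        [0.9, 0.1, 0.3],
        [0.2, 0.4, 0.8],
        [0.5, 0.6, 0.7],
    ]
    print("text-to-video:", retrieval_metrics(toy, ks=(1, 5)))
    print("video-to-text:", retrieval_metrics(transpose(toy), ks=(1, 5)))
```

Transposing the matrix swaps the roles of queries and candidates, which is how the same function covers the video-to-text direction under the one-to-one assumption.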