OpenCodePapers

video-retrieval-on-activitynet

Video Retrieval
Leaderboard
| Paper | Code | text-to-video R@1 | text-to-video R@5 | text-to-video R@10 | text-to-video R@50 | text-to-video Mean Rank | text-to-video Median Rank | video-to-text R@1 | video-to-text R@5 | video-to-text Mean Rank | video-to-text Median Rank | video-to-text R@10 | video-to-text R@50 | Model Name | Release Date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 74.1 | | | | | | 69.7 | | | | | | InternVideo2-6B | 2024-03-22 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ | 70.5 | 90.9 | 95.5 | | | | | | | | | | VAST | 2023-05-29 |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ | 70.1 | 90.8 | 95.3 | | | | | | | | | | VALOR | 2023-04-17 |
| Gramian Multimodal Representation Learning and Alignment | ✓ | 69.9 | | 96.1 | | | | 66.9 | | | | 95.4 | | GRAM | 2024-12-16 |
| COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | ✓ | 67.3 | | | | | | | | | | | | COSA | 2023-06-15 |
| Unmasked Teacher: Towards Training-Efficient Video Foundation Models | ✓ | 66.8 | 89.1 | 94.9 | | | | 64.4 | 89.1 | | | 94.8 | | UMT-L (ViT-L/16) | 2023-03-28 |
| vid-TLDR: Training Free Token merging for Light-weight Video Transformer | ✓ | 66.7 | 88.6 | 94.4 | | | | 63.9 | 88.7 | | | 94.5 | | vid-TLDR (UMT-L) | 2024-03-20 |
| InternVideo: General Video Foundation Models via Generative and Discriminative Learning | ✓ | 62.2 | | | | | | 62.8 | | | | | | InternVideo | 2022-12-06 |
| CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment | ✓ | 61.4 | 85.7 | 92.6 | | | 1 | | | | | | | CLIP-ViP | 2022-09-14 |
| Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations | | 57.3 | 84.8 | 93.1 | | 4.0 | 1 | 57.7 | 85.7 | 3.4 | 1 | 93.9 | | HunYuan_tvr | 2022-04-07 |
| VindLU: A Recipe for Effective Video-and-Language Pretraining | ✓ | 55.0 | 81.4 | 89.7 | | | | | | | | | | VindLU | 2022-12-09 |
| TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | ✓ | 54.8 | 80.8 | 89.6 | | | | | | | | | | TESTA (ViT-B/16) | 2023-10-29 |
| RTQ: Rethinking Video-language Understanding Based on Image-text Model | ✓ | 53.5 | 81.4 | 91.9 | | | | | | | | | | RTQ | 2023-12-01 |
| Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning | ✓ | 53.4 | 80.7 | 89.2 | | 5.3 | 1.0 | | | | | | | DMAE (ViT-B/32) | 2023-09-20 |
| Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | ✓ | 51.0 | 77.7 | 87.6 | | 6.3 | 1 | | | | | | | CAMoE | 2021-09-09 |
| Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | ✓ | 50.6 | 78.7 | | 98.1 | | 1 | 50.6 | 78.9 | | 1 | | 98.4 | EMCL-Net++ | 2022-11-21 |
| HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | | 49.7 | 77.1 | 86.7 | | | | | | | | | | HiTeA | 2022-12-30 |
| DiffusionRet: Generative Text-Video Retrieval with Diffusion Model | ✓ | 48.1 | | 85.7 | | 6.8 | 2.0 | 47.4 | 76.3 | 6.7 | 2.0 | 86.7 | | DiffusionRet+QB-Norm | 2023-03-17 |
| Revealing Single Frame Bias for Video-and-Language Learning | ✓ | 47.1 | 75.5 | 85.5 | | | | | | | | | | Singularity | 2022-06-07 |
| CenterCLIP: Token Clustering for Efficient Text-Video Retrieval | ✓ | 46.2 | 77.0 | 87.6 | | 5.7 | 2 | 46.7 | 77.1 | 5.5 | 2 | 88.0 | | CenterCLIP (ViT-B/16) | 2022-05-02 |
| X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | ✓ | 46.2 | 75.5 | | | 6.8 | | 46.4 | 75.9 | 6.4 | | | | X-CLIP | 2022-07-15 |
| DiffusionRet: Generative Text-Video Retrieval with Diffusion Model | ✓ | 45.8 | 75.6 | 86.3 | | 6.5 | 2.0 | 43.8 | 75.3 | 6.3 | 2.0 | 86.7 | | DiffusionRet | 2023-03-17 |
| Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning | ✓ | 42.2 | 73.0 | 84.6 | | 6.6 | 2.0 | 42.4 | 73.0 | 6.5 | 2.0 | 86.0 | | HBI | 2023-03-25 |
| Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | ✓ | 41.2 | 72.7 | | | | 2 | 42.7 | 74 | | 2 | | 98.3 | EMCL-Net | 2022-11-21 |
| CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | ✓ | 40.5 | 73.4 | | 98.2 | 7.5 | 2 | | | | | | | CLIP4Clip | 2021-04-18 |
| TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | | 30.4 | 61.2 | | 93.4 | | 3.0 | | | | | | | TACo | 2021-08-23 |
| Multi-modal Transformer for Video Retrieval | ✓ | 28.7 | 61.4 | | 94.5 | 16 | 3.3 | | | | | | | MMT-Pretrained | 2020-07-21 |
| Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions | ✓ | 28.5 | 57.4 | | 94 | | 4 | | | | | | | HD-VILA | 2021-11-19 |
| Video and Text Matching with Conditioned Embeddings | ✓ | 25.4 | 59.1 | | | | | 26.1 | 60 | | | | | Ours | 2021-10-21 |
| Multi-modal Transformer for Video Retrieval | ✓ | 22.7 | 54.2 | | 93.2 | 20.8 | 5 | | | | | | | MMT | 2020-07-21 |
| Use What You Have: Video Retrieval Using Representations From Collaborative Experts | ✓ | 20.5 | 47.7 | 63.9 | 91.4 | 23.1 | 6 | | | | | | | Collaborative Experts | 2019-07-31 |
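
For readers unfamiliar with the leaderboard columns, the sketch below shows one common way to compute R@K, Mean Rank, and Median Rank from a query-by-video similarity matrix. It is an illustration only, not the evaluation code of any listed paper. The function names `ranks_from_similarity` and `retrieval_metrics` are hypothetical, and the sketch assumes that query i has exactly one correct video (video i) and that ties are resolved in favor of the correct match. Individual papers may use different protocols.

```python
from statistics import mean, median


def ranks_from_similarity(sim):
    """Return the 1-based rank of the correct video for each text query.

    Assumption: sim[i][j] is the similarity of query i to video j,
    and video i is the single correct match for query i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank = 1 + number of videos scored strictly higher than the match.
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def retrieval_metrics(sim, ks=(1, 5, 10, 50)):
    """Compute R@K (in percent), Mean Rank, and Median Rank."""
    ranks = ranks_from_similarity(sim)
    metrics = {
        f"R@{k}": 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)
        for k in ks
    }
    metrics["Mean Rank"] = mean(ranks)
    metrics["Median Rank"] = median(ranks)
    return metrics


# Toy 4x4 similarity matrix (made-up numbers, not leaderboard data).
sim = [
    [0.9, 0.1, 0.2, 0.3],  # correct video ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # correct video ranked 2nd
    [0.1, 0.2, 0.7, 0.3],  # correct video ranked 1st
    [0.6, 0.7, 0.8, 0.4],  # correct video ranked 4th
]

if __name__ == "__main__":
    print(retrieval_metrics(sim, ks=(1, 2)))
```

Under the same assumptions, the video-to-text direction can be evaluated by transposing the matrix, for example `retrieval_metrics([list(col) for col in zip(*sim)])`. Higher is better for R@K; lower is better for Mean Rank and Median Rank.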