OpenCodePapers

text-to-audio-retrieval-on-clotho

Text to Audio Retrieval
Leaderboard
| Paper | Code | R@1 | R@5 | R@10 | mAP@10 | Model Name | Release Date |
|---|---|---|---|---|---|---|---|
| Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | ✓ | 27.69 | 57.03 | 70.39 | 40.14 | PaSST-RoBERTa & Estimated Audio–Caption Correspondences | 2024-08-21 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 27.2 | | | | InternVideo2-6B | 2024-03-22 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ | 26.9 | 53.2 | 66.1 | | VAST | 2023-05-29 |
| Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets | ✓ | 26.07 | 55.27 | 69.30 | 38.56 | PaSST–RoBERTa & GPT-augment | 2023-08-08 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | ✓ | 22.4 | 49.0 | 62.7 | | ONE-PEACE | 2023-05-18 |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ | 17.5 | 42.7 | 55.3 | | VALOR | 2023-04-17 |
| Audio Retrieval with Natural Language Queries | ✓ | 9.6±0.3 | | 40.1±0.7 | | CE (pretraining:AudioCaps) | 2021-05-05 |
| Audio Retrieval with Natural Language Queries | ✓ | 8.6±0.4 | | 39.3±0.7 | | MoEE (pretraining:AudioCaps) | 2021-05-05 |
| Audio Retrieval with Natural Language Queries | ✓ | 6.7±0.4 | | 33.2±0.3 | | CE | 2021-05-05 |
| Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 6.5±0.6 | | 32.8±2.1 | | MMT | 2021-12-17 |
| Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 6.4±0.5 | | 32.5±1.7 | | CE (pretraining:SoundDescs) | 2021-12-17 |
| Audio Retrieval with Natural Language Queries | ✓ | 6.0±0.1 | | 32.3±0.3 | | MoEE | 2021-05-05 |