OpenCodePapers
Text to Audio Retrieval on Clotho
Leaderboard
Paper | Code | R@1 | R@5 | R@10 | mAP@10 | Model | Release date
--- | --- | --- | --- | --- | --- | --- | ---
Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | ✓ | 27.69 | 57.03 | 70.39 | 40.14 | PaSST-RoBERTa & Estimated Audio–Caption Correspondences | 2024-08-21
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 27.2 | - | - | - | InternVideo2-6B | 2024-03-22
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ | 26.9 | 53.2 | 66.1 | - | VAST | 2023-05-29
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets | ✓ | 26.07 | 55.27 | 69.30 | 38.56 | PaSST–RoBERTa & GPT-augment | 2023-08-08
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | ✓ | 22.4 | 49.0 | 62.7 | - | ONE-PEACE | 2023-05-18
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ | 17.5 | 42.7 | 55.3 | - | VALOR | 2023-04-17
Audio Retrieval with Natural Language Queries | ✓ | 9.6±0.3 | - | 40.1±0.7 | - | CE (pretraining: AudioCaps) | 2021-05-05
Audio Retrieval with Natural Language Queries | ✓ | 8.6±0.4 | - | 39.3±0.7 | - | MoEE (pretraining: AudioCaps) | 2021-05-05
Audio Retrieval with Natural Language Queries | ✓ | 6.7±0.4 | - | 33.2±0.3 | - | CE | 2021-05-05
Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 6.5±0.6 | - | 32.8±2.1 | - | MMT | 2021-12-17
Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 6.4±0.5 | - | 32.5±1.7 | - | CE (pretraining: SoundDescs) | 2021-12-17
Audio Retrieval with Natural Language Queries | ✓ | 6.0±0.1 | - | 32.3±0.3 | - | MoEE | 2021-05-05
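The leaderboard columns are recall at K (R@1, R@5, R@10) and mAP@10, all reported as percentages. The snippet below is a minimal sketch of how these metrics can be computed from a caption-by-clip similarity matrix. It is not the evaluation code of any listed paper.

It rests on these assumptions:
- Each text query (caption) has exactly one matching audio clip.
- mAP@10 follows the DCASE-style definition. With a single relevant clip, AP@10 reduces to 1/rank when the true clip is ranked in the top 10, and 0 otherwise.
- The function name `retrieval_metrics` is hypothetical.

```python
import numpy as np


def retrieval_metrics(sim: np.ndarray, gt: np.ndarray, ks=(1, 5, 10)) -> dict:
    """Hypothetical helper: text-to-audio retrieval metrics in percent.

    sim[i, j] is the similarity between caption i and audio clip j.
    gt[i] is the index of the single clip that caption i describes
    (assumption: exactly one relevant clip per caption).
    """
    # Similarity of each caption to its ground-truth clip
    gt_scores = sim[np.arange(sim.shape[0]), gt]
    # 1-based rank of the true clip; ties are resolved optimistically
    # because only strictly higher-scoring clips are counted
    ranks = 1 + (sim > gt_scores[:, None]).sum(axis=1)
    # Recall@K: share of captions whose true clip is ranked in the top K
    metrics = {f"R@{k}": 100.0 * np.mean(ranks <= k) for k in ks}
    # Assumed DCASE-style mAP@10: 1/rank if rank <= 10, else 0, averaged
    metrics["mAP@10"] = 100.0 * np.mean(np.where(ranks <= 10, 1.0 / ranks, 0.0))
    return metrics


if __name__ == "__main__":
    # Toy example: 3 captions, 4 clips; the true clips are 0, 1 and 3
    sim = np.array([[0.9, 0.1, 0.2, 0.3],
                    [0.5, 0.4, 0.8, 0.1],
                    [0.2, 0.7, 0.6, 0.9]])
    gt = np.array([0, 1, 3])
    print(retrieval_metrics(sim, gt))
```

In the toy example, the true clips are ranked 1, 3 and 1. That gives R@1 of about 66.67, R@5 and R@10 of 100, and mAP@10 of about 77.78.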