OpenCodePapers
Text to Audio Retrieval on AudioCaps
Leaderboard
| Paper | Code | R@1 | R@5 | R@10 | Model Name | Release Date |
|---|---|---|---|---|---|---|
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 55.2 | - | - | InternVideo2-6B | 2024-03-22 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ | 52.0 | 76.8 | 82.9 | VAST | 2023-05-29 |
| ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | ✓ | 42.5 | 77.5 | 88.4 | ONE-PEACE | 2023-05-18 |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ | 40.1 | 73.9 | 83.1 | VALOR | 2023-04-17 |
| Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 36.1±3.3 | - | 84.5±2.0 | MMT | 2021-12-17 |
| Exploring Train and Test-Time Augmentations for Audio-Language Learning | - | 34.7 | - | 83.3 | AL-MixGen + Multi-TTA | 2022-10-31 |
| Cross Modal Retrieval with Querybank Normalisation | ✓ | 23.9 | - | 71.6±0.4 | QB-Norm+CE | 2021-12-23 |
| Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 23.6±0.6 | - | 71.4±0.5 | CE | 2021-12-17 |
| Audio Retrieval with Natural Language Queries | ✓ | 23.1±0.8 | - | 70.7±0.7 | CE | 2021-05-05 |
| Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 23.0±0.7 | - | 71.0±1.2 | MoEE | 2021-12-17 |
| Audio Retrieval with Natural Language Queries | ✓ | 22.5±0.3 | - | 69.5±0.9 | MoEE | 2021-05-05 |
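
The R@1, R@5, and R@10 columns report recall at rank k: the percentage of text queries for which the correct audio clip appears among the top k retrieved results. The following is a minimal sketch of that computation, not the evaluation code of any listed paper. It assumes a square similarity matrix in which query i matches clip i exactly; the function name `recall_at_k` and the toy scores are hypothetical and chosen only for illustration.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose matching clip is ranked in the top k.

    Assumption (not from the leaderboard): similarity[i][j] scores text
    query i against audio clip j, and the correct clip for query i is clip i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct clip: how many clips score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy similarity matrix (made-up numbers): 3 captions x 3 audio clips.
toy = [
    [0.9, 0.1, 0.3],  # correct clip is ranked first
    [0.8, 0.5, 0.2],  # correct clip is ranked second
    [0.7, 0.6, 0.1],  # correct clip is ranked third
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy, k):.1f}")
```

Real evaluations may differ in detail, for example when several captions describe the same clip, so scores produced by a sketch like this are not directly comparable to the table.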