OpenCodePapers

text-to-audio-retrieval-on-audiocaps

Text to Audio Retrieval
Leaderboard
Paper | Code | R@1 | R@5 | R@10 | Model Name | Release Date
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 55.2 | - | - | InternVideo2-6B | 2024-03-22
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ | 52.0 | 76.8 | 82.9 | VAST | 2023-05-29
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | ✓ | 42.5 | 77.5 | 88.4 | ONE-PEACE | 2023-05-18
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ | 40.1 | 73.9 | 83.1 | VALOR | 2023-04-17
Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 36.1±3.3 | - | 84.5±2.0 | MMT | 2021-12-17
Exploring Train and Test-Time Augmentations for Audio-Language Learning | - | 34.7 | - | 83.3 | AL-MixGen + Multi-TTA | 2022-10-31
Cross Modal Retrieval with Querybank Normalisation | ✓ | 23.9 | - | 71.6±0.4 | QB-Norm+CE | 2021-12-23
Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 23.6±0.6 | - | 71.4±0.5 | CE | 2021-12-17
Audio Retrieval with Natural Language Queries | ✓ | 23.1±0.8 | - | 70.7±0.7 | CE | 2021-05-05
Audio Retrieval with Natural Language Queries: A Benchmark Study | ✓ | 23.0±0.7 | - | 71.0±1.2 | MoEE | 2021-12-17
Audio Retrieval with Natural Language Queries | ✓ | 22.5±0.3 | - | 69.5±0.9 | MoEE | 2021-05-05
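
The R@1, R@5, and R@10 columns report Recall@K: the percentage of text queries for which the correct audio clip appears among the top K retrieved results. The sketch below shows one common way to compute it and is not taken from any of the listed papers. The function name `recall_at_k`, the similarity matrix `sim` (rows are text queries, columns are audio clips), and the ground-truth index array `gt` are illustrative assumptions, and ties are resolved in favor of the correct clip.

```python
import numpy as np

def recall_at_k(sim, gt, ks=(1, 5, 10)):
    """Recall@K in percent for text-to-audio retrieval (illustrative sketch).

    sim: (num_queries, num_audios) array, higher score = better match.
    gt:  (num_queries,) array, index of the correct audio for each query.
    """
    sim = np.asarray(sim, dtype=float)
    gt = np.asarray(gt)
    # Score assigned to the correct audio clip for each query.
    correct = sim[np.arange(sim.shape[0]), gt]
    # Zero-based rank: how many clips score strictly higher than the correct one.
    # Ties therefore count in favor of the correct clip (an assumption).
    ranks = (sim > correct[:, None]).sum(axis=1)
    # A query is a hit at K when its correct clip is within the top K.
    return {k: 100.0 * float((ranks < k).mean()) for k in ks}

# Toy example with 3 queries and 3 audio clips (made-up scores).
toy_sim = [[0.9, 0.1, 0.2],
           [0.3, 0.2, 0.8],
           [0.1, 0.5, 0.4]]
toy_gt = [0, 1, 2]
toy_result = recall_at_k(toy_sim, toy_gt, ks=(1, 2, 3))
```

Entries written as mean±deviation in the leaderboard would correspond to repeating such an evaluation over several training runs and aggregating the per-run values.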