OpenCodePapers

Text-to-Audio Retrieval on AudioCaps

Leaderboard

Paper	Code	R@1	R@5	R@10	Model Name	Release Date
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	55.2			InternVideo2-6B	2024-03-22
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset	✓ Link	52.0	76.8	82.9	VAST	2023-05-29
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	✓ Link	42.5	77.5	88.4	ONE-PEACE	2023-05-18
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset	✓ Link	40.1	73.9	83.1	VALOR	2023-04-17
Audio Retrieval with Natural Language Queries: A Benchmark Study	✓ Link	36.1±3.3		84.5±2.0	MMT	2021-12-17
Exploring Train and Test-Time Augmentations for Audio-Language Learning		34.7		83.3	AL-MixGen + Multi-TTA	2022-10-31
Cross Modal Retrieval with Querybank Normalisation	✓ Link	23.9		71.6±0.4	QB-Norm+CE	2021-12-23
Audio Retrieval with Natural Language Queries: A Benchmark Study	✓ Link	23.6±0.6		71.4±0.5	CE	2021-12-17
Audio Retrieval with Natural Language Queries	✓ Link	23.1±0.8		70.7±0.7	CE	2021-05-05
Audio Retrieval with Natural Language Queries: A Benchmark Study	✓ Link	23.0±0.7		71.0±1.2	MoEE	2021-12-17
Audio Retrieval with Natural Language Queries	✓ Link	22.5±0.3		69.5±0.9	MoEE	2021-05-05
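
The R@1, R@5, and R@10 columns are recall-at-k scores: the percentage of text queries for which the matching audio clip appears among the top k retrieved results. The following is a minimal sketch of that computation, not the evaluation code of any listed paper. It assumes a square similarity matrix in which row i is a text query, column j is an audio clip, and the correct clip for query i is clip i. The names `recall_at_k` and `toy_similarity` are hypothetical and used only for illustration.

```python
def recall_at_k(similarity, k):
    """Return the percentage of queries whose correct item ranks in the top k.

    Assumption: similarity[i][j] scores text query i against audio clip j,
    and the correct clip for query i is clip i (one caption per clip).
    """
    hits = 0
    for i, row in enumerate(similarity):
        correct_score = row[i]
        # Zero-based rank: how many clips score strictly higher than the correct one.
        rank = sum(1 for score in row if score > correct_score)
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example with three queries and three clips (made-up scores).
toy_similarity = [
    [0.9, 0.1, 0.2],  # correct clip ranked first
    [0.3, 0.2, 0.8],  # correct clip ranked third
    [0.1, 0.7, 0.6],  # correct clip ranked second
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy_similarity, k):.1f}")
```

Benchmarks with several captions per clip need a mapping from each caption to its clip instead of the diagonal assumption used here, and tie handling may differ between implementations.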