OpenCodePapers

Text-to-Audio Retrieval on Clotho


Leaderboard

Paper	Code	R@1	R@5	R@10	mAP@10	ModelName	ReleaseDate
Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval	✓ Link	27.69	57.03	70.39	40.14	PaSST-RoBERTa & Estimated Audio–Caption Correspondences	2024-08-21
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding	✓ Link	27.2				InternVideo2-6B	2024-03-22
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset	✓ Link	26.9	53.2	66.1		VAST	2023-05-29
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets	✓ Link	26.07	55.27	69.30	38.56	PaSST-RoBERTa & GPT-augment	2023-08-08
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	✓ Link	22.4	49.0	62.7		ONE-PEACE	2023-05-18
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset	✓ Link	17.5	42.7	55.3		VALOR	2023-04-17
Audio Retrieval with Natural Language Queries	✓ Link	9.6±0.3		40.1±0.7		CE (pretraining:AudioCaps)	2021-05-05
Audio Retrieval with Natural Language Queries	✓ Link	8.6±0.4		39.3±0.7		MoEE (pretraining:AudioCaps)	2021-05-05
Audio Retrieval with Natural Language Queries	✓ Link	6.7±0.4		33.2±0.3		CE	2021-05-05
Audio Retrieval with Natural Language Queries: A Benchmark Study	✓ Link	6.5±0.6		32.8±2.1		MMT	2021-12-17
Audio Retrieval with Natural Language Queries: A Benchmark Study	✓ Link	6.4±0.5		32.5±1.7		CE (pretraining:SoundDescs)	2021-12-17
Audio Retrieval with Natural Language Queries	✓ Link	6.0±0.1		32.3±0.3		MoEE	2021-05-05
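
The metric columns above (R@1, R@5, R@10, mAP@10) can be illustrated with a minimal sketch. It assumes each caption query has exactly one relevant audio clip and that scores are reported as percentages. The listed papers may use different evaluation protocols, and the function names and toy similarity values below are hypothetical.

```python
def rank_of_target(scores, target_index):
    """Return the 1-based rank of the target clip under descending score order."""
    target_score = scores[target_index]
    # Count clips scored strictly higher than the target (ties favor the target).
    return 1 + sum(1 for s in scores if s > target_score)


def recall_at_k(ranks, k):
    """Percentage of queries whose target clip is ranked in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def map_at_10(ranks):
    """mAP@10 assuming one relevant clip per query: 1/rank if rank <= 10, else 0."""
    return 100.0 * sum(1.0 / r if r <= 10 else 0.0 for r in ranks) / len(ranks)


# Hypothetical toy data: 3 caption queries scored against 4 audio clips.
similarity = [
    [0.9, 0.1, 0.3, 0.2],  # target is clip 0
    [0.4, 0.2, 0.8, 0.1],  # target is clip 1
    [0.5, 0.6, 0.7, 0.1],  # target is clip 2
]
targets = [0, 1, 2]

ranks = [rank_of_target(row, t) for row, t in zip(similarity, targets)]
print("ranks:", ranks)
print("R@1:", round(recall_at_k(ranks, 1), 2))
print("R@5:", round(recall_at_k(ranks, 5), 2))
print("mAP@10:", round(map_at_10(ranks), 2))
```

Under the single-relevant-item assumption, mAP@10 reduces to the mean reciprocal rank truncated at 10, so it rewards placing the correct clip near the top even when it is not ranked first.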