OpenCodePapers

Audio Captioning on Clotho

Audio captioning

Leaderboard

Paper	Code	SPIDEr	CIDEr	SPICE	BLEU-4 (%)	METEOR (%)	ROUGE-L (%)	FENSE	SPIDEr-FL	Sentence-BERT	Model	Release date
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs	✓ Link	0.332	0.515	0.148		19.7		0.540	0.330		SLAM-AAC	2024-10-12
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding	✓ Link	0.330	0.513	0.147		19.7		0.538	0.330	0.538	LOAE	2024-06-19
Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning		0.319	0.496	0.143	18.1	19.2					MQ-Cap	2024-10-14
The DCASE 2021 Challenge Task 6 System: Automated Audio Captioning with Weakly Supervised Pre-Training and Word Selection Methods		0.318	0.400	0.137							Ensemble	2021-07-06
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities	✓ Link	0.312	0.489	0.134	17.4	18.7	39.4				Audio Flamingo (Pengi trainset)	2024-02-02
The SJTU System for DCASE2021 Challenge Task 6: Audio Captioning Based on Encoder Pre-Training and Reinforcement Learning	✓ Link	0.295	0.468	0.123							Ensemble-RL	2021-07-06
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models	✓ Link	0.288	0.441	0.136							Qwen-Audio	2023-11-14
The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation		0.207	0.319	0.094							Ensemble	2020-07-01
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset	✓ Link		0.519		19	19.3	40.8				VAST	2023-05-29
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset	✓ Link		0.423		16.2	17.4	38.2				VALOR	2023-04-17
Audio Captioning using Gated Recurrent Units			0.18								RNN-GRU-EncDec + VGGish + Word2Vec	2020-06-05
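SPIDEr, the primary ranking metric above, is the arithmetic mean of CIDEr (CIDEr-D) and SPICE, so any row that reports all three values can be checked for internal consistency. The sketch below assumes that definition and copies CIDEr, SPICE, and reported SPIDEr values verbatim from the table. The helper names `spider` and `check_rows` and the rounding tolerance `tol` are illustrative choices, not part of any leaderboard tooling.

```python
# Minimal sketch, assuming SPIDEr = (CIDEr + SPICE) / 2 on a 0-1 scale.
# Values are (CIDEr, SPICE, reported SPIDEr), copied from the leaderboard table.

def spider(cider: float, spice: float) -> float:
    """Return SPIDEr as the mean of CIDEr and SPICE."""
    return (cider + spice) / 2


ROWS = {
    "SLAM-AAC": (0.515, 0.148, 0.332),
    "LOAE": (0.513, 0.147, 0.330),
    "MQ-Cap": (0.496, 0.143, 0.319),
    "Ensemble (DCASE 2021)": (0.400, 0.137, 0.318),
    "Audio Flamingo (Pengi trainset)": (0.489, 0.134, 0.312),
    "Ensemble-RL": (0.468, 0.123, 0.295),
    "Qwen-Audio": (0.441, 0.136, 0.288),
    "Ensemble (NTT, DCASE 2020)": (0.319, 0.094, 0.207),
}


def check_rows(rows, tol=0.001):
    """Return names whose reported SPIDEr differs from (CIDEr + SPICE) / 2 by more than tol.

    tol absorbs rounding: the table reports three decimal places.
    """
    mismatches = []
    for name, (cider, spice, reported) in rows.items():
        if abs(spider(cider, spice) - reported) > tol:
            mismatches.append(name)
    return mismatches


if __name__ == "__main__":
    for name, (cider, spice, reported) in ROWS.items():
        print(f"{name}: computed {spider(cider, spice):.4f}, reported {reported:.3f}")
    print("mismatches:", check_rows(ROWS))
```

Every listed row agrees to within rounding except the 2021 Ensemble entry, where (0.400 + 0.137) / 2 = 0.2685 does not match the reported 0.318. One of its values may therefore have been transcribed differently from the source paper.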