OpenCodePapers

Audio captioning on Clotho
Leaderboard
Metric columns, in order: SPIDEr, CIDEr, SPICE, BLEU-4, METEOR, ROUGE-L, FENSE, SPIDEr-FL, Sentence-BERT. Each row lists only the values reported for that model, in column order. A ✓ in the Code column marks entries with linked code.

| Paper | Code | Reported metric values | Model name | Release date |
| --- | --- | --- | --- | --- |
| SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs | ✓ | 0.332, 0.515, 0.148, 0.197, 0.540, 0.330 | SLAM-AAC | 2024-10-12 |
| Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding | ✓ | 0.330, 0.513, 0.147, 0.197, 0.538, 0.330, 0.538 | LOAE | 2024-06-19 |
| Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning | | 0.319, 0.496, 0.143, 18.1, 0.192 | MQ-Cap | 2024-10-14 |
| THE DCASE 2021 CHALLENGE TASK 6 SYSTEM: AUTOMATED AUDIO CAPTIONING WITH WEAKLY SUPERVISED PRE-TRAINING AND WORD SELECTION METHODS | | 0.318, 0.400, 0.137 | Ensemble | 2021-07-06 |
| Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | ✓ | 0.312, 0.489, 0.134, 17.4, 18.7, 39.4 | Audio Flamingo (Pengi trainset) | 2024-02-02 |
| THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING | ✓ | 0.295, 0.468, 0.123 | Ensemble-RL | 2021-07-06 |
| Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | ✓ | 0.288, 0.441, 0.136 | Qwen-Audio | 2023-11-14 |
| The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation | | 0.207, 0.319, 0.094 | Ensemble | 2020-07-01 |
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ | 0.519, 19, 19.3, 40.8 | VAST | 2023-05-29 |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ | 0.423, 16.2, 17.4, 38.2 | VALOR | 2023-04-17 |
| Audio Captioning using Gated Recurrent Units | | 0.18 | RNN-GRU-EncDec + VGGish + Word2Vec | 2020-06-05 |
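
SPIDEr is commonly computed as the arithmetic mean of CIDEr and SPICE, so the leading values in a row can be sanity-checked against each other. The sketch below assumes that the first three values of the SLAM-AAC, LOAE, and Qwen-Audio rows are SPIDEr, CIDEr, and SPICE, in that order. That column mapping, the helper names `spider` and `is_consistent`, and the rounding tolerance are illustrative assumptions, not part of the leaderboard.

```python
def spider(cider: float, spice: float) -> float:
    """Return SPIDEr as the arithmetic mean of CIDEr and SPICE."""
    return (cider + spice) / 2


def is_consistent(reported_spider: float, cider: float, spice: float,
                  tol: float = 1e-3) -> bool:
    """True if the reported SPIDEr matches the recomputed mean within tol.

    The tolerance allows for scores that were rounded to three decimals.
    """
    return abs(reported_spider - spider(cider, spice)) <= tol


# (model, SPIDEr, CIDEr, SPICE): assumed reading of the first three
# values in each leaderboard row.
rows = [
    ("SLAM-AAC", 0.332, 0.515, 0.148),
    ("LOAE", 0.330, 0.513, 0.147),
    ("Qwen-Audio", 0.288, 0.441, 0.136),
]

for model, reported, cider, spice in rows:
    recomputed = spider(cider, spice)
    ok = is_consistent(reported, cider, spice)
    print(f"{model}: reported={reported:.3f} "
          f"recomputed={recomputed:.4f} consistent={ok}")
```

A row that fails this check may use a different column layout, a different evaluation split, or a differently rounded score, so a mismatch is a prompt to look at the paper rather than proof of an error.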