OpenCodePapers
Audio captioning on Clotho
Audio captioning
Leaderboard
Paper
Code
SPIDEr
CIDEr
SPICE
BLEU-4
METEOR
ROUGE-L
FENSE
SPIDEr-FL
Sentence-BERT
ModelName
ReleaseDate
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
✓ Link
0.332
0.515
0.148
0.197
0.540
0.330
SLAM-AAC
2024-10-12
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
✓ Link
0.330
0.513
0.147
0.197
0.538
0.330
0.538
LOAE
2024-06-19
Enhancing Retrieval-Augmented Audio Captioning with Generation-Assisted Multimodal Querying and Progressive Learning
0.319
0.496
0.143
18.1
0.192
MQ-Cap
2024-10-14
THE DCASE 2021 CHALLENGE TASK 6 SYSTEM: AUTOMATED AUDIO CAPTIONING WITH WEAKLY SUPERVISED PRE-TRAING AND WORD SELECTION METHODS
0.318
0.400
0.137
Ensemble
2021-07-06
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
✓ Link
0.312
0.489
0.134
17.4
18.7
39.4
Audio Flamingo (Pengi trainset)
2024-02-02
THE SJTU SYSTEM FOR DCASE2021 CHALLENGE TASK 6: AUDIO CAPTIONING BASED ON ENCODER PRE-TRAINING AND REINFORCEMENT LEARNING
✓ Link
0.295
0.468
0.123
Ensemble-RL
2021-07-06
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
✓ Link
0.288
0.441
0.136
Qwen-Audio
2023-11-14
The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation
0.207
0.319
0.094
Ensemble
2020-07-01
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
✓ Link
0.519
19
19.3
40.8
VAST
2023-05-29
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
✓ Link
0.423
16.2
17.4
38.2
VALOR
2023-04-17
Audio Captioning using Gated Recurrent Units
0.18
RNN-GRU-EncDec + VGGish + Word2Vec
2020-06-05