OpenCodePapers

audio-visual-question-answering-on-music-avqa

Audio-visual Question Answering
Dataset Link
Leaderboard
| Paper | Code | Accuracy (%) | Model name | Release date |
|---|---|---|---|---|
| VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | ✓ Link | 80.7 | VAST | 2023-05-29 |
| | | 79.6 | CoQo(Internvideo2) | |
| VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | ✓ Link | 78.9 | VALOR | 2023-04-17 |
| CAD -- Contextual Multi-modal Alignment for Dynamic AVQA | | 78.26 | CAD | 2023-10-25 |
| Vision Transformers are Parameter-Efficient Audio-Visual Learners | ✓ Link | 77.08 | LAVISH | 2022-12-15 |
| Learning to Answer Questions in Dynamic Audio-Visual Scenarios | ✓ Link | 71.52 | ST-AVQA | 2022-03-26 |