Paper | Code | Accuracy (%) | Model Name | Release Date |
---|---|---|---|---|
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time | ✓ Link | 79.15 | Meerkat | 2024-07-01 |
Question-Aware Gaussian Experts for Audio-Visual Question Answering | ✓ Link | 76.43 | QA-TIGER | 2025-03-06 |
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering | ✓ Link | 75.44 | LAST-Att | 2023-10-10 |
Vision Transformers are Parameter-Efficient Audio-Visual Learners | ✓ Link | 73.18 | LAVISH | 2022-12-15 |
Learning to Answer Questions in Dynamic Audio-Visual Scenarios | ✓ Link | 71.02 | AVST | 2022-03-26 |