OpenCodePapers

Multi-modal Classification on VGG-Sound
Leaderboard
| Paper | Code | Top-1 Accuracy | Top-5 Accuracy | Model Name | Release Date |
|---|---|---|---|---|---|
| Multiscale Multimodal Transformer for Multimodal Action Recognition | | 66.2 | 85.7 | MMT | 2022-09-22 |
| Contrastive Audio-Visual Masked Autoencoder | ✓ | 65.9 | | CAV-MAE (Audio-Visual) | 2022-10-02 |
| UAVM: Towards Unifying Audio and Visual Models | ✓ | 65.8 | | UAVM | 2022-07-29 |
| AVT: Audio-Video Transformer for Multimodal Action Recognition | | 63.9 | 85.0 | AVT | 2022-09-22 |