Paper | Code | Top-1 Accuracy | Top-5 Accuracy | Model Name | Release Date |
---|---|---|---|---|---|
Multiscale Multimodal Transformer for Multimodal Action Recognition |  | 66.2 | 85.7 | MMT | 2022-09-22 |
Contrastive Audio-Visual Masked Autoencoder | ✓ | 65.9 |  | CAV-MAE (Audio-Visual) | 2022-10-02 |
UAVM: Towards Unifying Audio and Visual Models | ✓ | 65.8 |  | UAVM | 2022-07-29 |
AVT: Audio-Video Transformer for Multimodal Action Recognition |  | 63.9 | 85.0 | AVT | 2022-09-22 |