OpenCodePapers

Action Classification on Kinetics-Sounds

Video, Action Classification
Leaderboard
| Paper | Code | Top 1 Accuracy (%) | Top 5 Accuracy (%) | Model Name | Release Date |
|---|---|---|---|---|---|
| CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | - | 93.3 | - | CA2ST(B/16) | 2025-03-30 |
| CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | - | 92.9 | - | CAVA(B/16) | 2025-03-30 |
| Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | - | 90.1 | - | Mirasol3B | 2023-11-09 |
| Attention Bottlenecks for Multimodal Fusion | ✓ | 85 | 96.8 | MBT (AV) | 2021-06-30 |