Paper | Code | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Model Name | Release Date |
---|---|---|---|---|---|
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | | 93.3 | | CA2ST(B/16) | 2025-03-30 |
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | | 92.9 | | CAVA(B/16) | 2025-03-30 |
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | | 90.1 | | Mirasol3B | 2023-11-09 |
Attention Bottlenecks for Multimodal Fusion | ✓ Link | 85 | 96.8 | MBT (AV) | 2021-06-30 |