Paper | Code | Accuracy | ModelName | ReleaseDate |
---|---|---|---|---|
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | | 78.2 | Mirasol3B | 2023-11-09 |
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | | 61 | CA2ST(B/16) | 2025-03-30 |
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | | 60.3 | CAVA(B/16) | 2025-03-30 |