Paper | Code | Mean AP | Model Name | Release Date |
---|---|---|---|---|
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning | ✓ Link | 42.4 | EquiAV | 2024-03-14 |
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes | ✓ Link | 40.9 | SSLAM | 2025-06-13 |
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | ✓ Link | 40.3 | EAT | 2024-01-07 |
BEATs: Audio Pre-Training with Acoustic Tokenizers | ✓ Link | 38.9 | BEATs | 2022-12-18 |
ATST: Audio Representation Learning with Teacher-Student Transformer | ✓ Link | 37.4 | ATST Base | 2022-04-26 |
SSAST: Self-Supervised Audio Spectrogram Transformer | ✓ Link | 31.0 | SSAST-PATCH | 2021-10-19 |
SSAST: Self-Supervised Audio Spectrogram Transformer | ✓ Link | 29.2 | SSAST-FRAME | 2021-10-19 |
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks | | 27.6 | Conformer | 2021-10-14 |