OpenCodePapers

Audio Classification on ESC-50

Leaderboard
| Paper | Code | Top-1 Accuracy | Pre-training Dataset | Accuracy (5-fold) | Model Name | Release Date |
| --- | --- | --- | --- | --- | --- | --- |
| OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning | | 99.1 | Multiple | 99.1 | OmniVec2 | 2024-01-01 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | ✓ | 98.6 | Multiple | 98.6 | InternVideo2 | 2024-03-22 |
| M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | ✓ | 98.5 | AudioSet, WavCaps | 98.5 | M2D2 AS+ | 2025-03-28 |
| OmniVec: Learning robust representations with cross modal sharing | | 98.4 | Multiple | 98.4 | OmniVec | 2023-11-07 |
| BEATs: Audio Pre-Training with Acoustic Tokenizers | ✓ | 98.1 | AudioSet | 98.1 | BEATs | 2022-12-18 |
| Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | ✓ | 97.45 | AudioSet | 97.45 | mn40_as | 2022-11-09 |
| Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models | ✓ | 97.4 | AudioSet | 97.4 | DyMN-L | 2023-10-24 |
| M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation | ✓ | 97.4 | AudioSet | 97.4 | M2D-CLAP/0.7 | 2024-06-04 |
| Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | ✓ | 97.2 | AudioSet | 97.2 | M2D-AS/0.7 | 2024-04-09 |
| HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | ✓ | 97.0 | AudioSet | 97.0 | HTS-AT | 2022-02-02 |
| End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | ✓ | 96.3 | AudioSet | 96.3 | EAT-M | 2022-04-25 |
| LHGNN: Local-Higher Order Graph Neural Networks For Audio Classification and Tagging | | 96.2 | | | LHGNN | 2025-01-07 |
| ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition | | 96.1 | AudioSet | 96.1 | ERANN-2-5 | 2021-06-03 |
| Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | ✓ | 96.0 | | 96.0 | M2D/0.7 | 2024-04-09 |
| EAT: Self-Supervised Pre-Training with Efficient Audio Transformer | ✓ | 96.0 | AudioSet | 96.0 | EAT | 2024-01-07 |
| AST: Audio Spectrogram Transformer | ✓ | 95.7 | AudioSet, ImageNet | 95.7 | Audio Spectrogram Transformer | 2021-04-05 |
| End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | ✓ | 95.25 | AudioSet | 95.25 | EAT-S | 2022-04-25 |
| Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning | ✓ | 93.5 | AudioSet | 93.5 | MATPAC (SSL model, linear eval) | 2025-02-17 |
| End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network | ✓ | 92.15 | | 92.15 | EAT-S (scratch) | 2022-04-25 |
| Learning Rate Curriculum | ✓ | 91.58 | | 91.58 | SepTr + LeRaC | 2022-05-18 |
| SepTr: Separable Transformer for Audio Spectrogram Processing | ✓ | 91.13 | - | | SepTr | 2022-03-17 |
| Multi-Format Contrastive Learning of Audio Representations | | 90.5 | | | Multi-Format Contrastive | 2021-03-11 |
| | | 89.5 | EfficientNet | 89.5 | Multi-Channel Audio Feature with CNN | |
| Audio-Visual Instance Discrimination with Cross-Modal Agreement | ✓ | 89.2 | | | AVID | 2020-04-27 |
| Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices | ✓ | 87.1 | | 87.1 | ACDNet | 2021-03-05 |
| Self-Supervised Learning by Cross-Modal Audio-Video Clustering | ✓ | 85.4 | IG-Random | | XDC | 2019-11-28 |
| Self-Supervised Learning by Cross-Modal Audio-Video Clustering | ✓ | 84.8 | AudioSet | | XDC | 2019-11-28 |
| Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization | | 82.3 | | | AVTS | 2018-06-30 |
| Look, Listen and Learn | ✓ | 79.3 | | | L3 | 2017-05-23 |