[]() | | 3.81 | 22.22 | 98.65 | 4.08 | | 11.34 | 266.96 | | | ZipEnhancer (M) | |
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | ✓ Link | 3.72 | 23.3 | 98.8 | | | 15 | 497.24 | | | TF-Locoformer (M) | 2024-08-06 |
[]() | | 3.69 | 21.15 | 98.32 | 3.99 | | 2.04 | 62.85 | | | ZipEnhancer (S) | |
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement | ✓ Link | 3.671 | 21.234 | | | | 2.33 | | 95.9 | 15.116 | MambAttention | 2025-07-01 |
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement | ✓ Link | 3.62 | 21.03 | | 3.92 | | | | | | MP-SENet | 2023-08-17 |
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement | ✓ Link | 3.588 | 20.854 | | | | 2.20 | | 95.4 | 14.526 | xLSTM-SENet | 2025-07-01 |
High Fidelity Speech Enhancement with Band-split RNN | ✓ Link | 3.53 | 21.4 | 98.4 | 3.89 | | | | | | BSRNN-S + MRSD | 2022-12-01 |
High Fidelity Speech Enhancement with Band-split RNN | ✓ Link | 3.45 | 21.1 | 98.3 | 3.87 | | | | | | BSRNN-16k | 2022-12-01 |
A Mask Free Neural Network for Monaural Speech Enhancement | ✓ Link | 3.43 | 20.31 | | 3.74 | | | | | | MFNET | 2023-06-07 |
High Fidelity Speech Enhancement with Band-split RNN | ✓ Link | 3.42 | 21.3 | | | | | | | | BSRNN-S | 2022-12-01 |
High Fidelity Speech Enhancement with Band-split RNN | ✓ Link | 3.32 | | 98 | 3.79 | | | | | | BSRNN | 2022-12-01 |
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram | | 3.262 | | | 3.658 | | | | | | CleanUNet-2 | 2023-09-12 |
Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses | ✓ Link | 3.23 | | | | | | | | | FRCRN | 2021-02-03 |
FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement | ✓ Link | 3.218 | 16.81 | | 3.666 | | | | | | FullSubNet+ | 2022-03-23 |
Speech Denoising in the Waveform Domain with Self-Attention | ✓ Link | 3.146 | | | 3.551 | | | | | | CleanUNet | 2022-02-15 |
aTENNuate: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio | | 2.98 | | | | | | | | | aTENNuate | 2024-09-05 |
RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing | ✓ Link | 2.95 | 19.7 | | | | | | | | Sudo rm -rf (U=32) | 2022-02-17 |
Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform | | 2.82 | | | | | | | | | DCTCRN-P | 2021-02-09 |
Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform | | 2.82 | | | | | | | | | DCTCRN-T | 2021-02-09 |
Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform | | 2.79 | | | | | | | | | DCCRN-E | 2021-02-09 |
PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss | | 2.7885 | | | | | | | | | PoCoNet | 2020-08-11 |
FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement | ✓ Link | 2.777 | 17.29 | | 3.305 | | | | | | FullSubNet | 2020-10-29 |
Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform | | 2.77 | | | | | | | | | DCTCRN-S | 2021-02-09 |
A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement | ✓ Link | 2.75 | | | | | | | | | RNN-Modulation | 2021-02-15 |
Exploring the Best Loss Function for DNN-Based Low-latency Speech Enhancement with Temporal Convolutional Networks | | 2.73 | | | | | | | | | Conv-TasNet-SNR | 2020-08-20 |
Continual self-training with bootstrapped remixing for speech enhancement | ✓ Link | 2.69 | 18.6 | | | | | | | | Sudo rm -rf (U=8) | 2021-10-19 |
Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement | ✓ Link | 2.65 | | | 2.65 | | | | | | Proposed (0.35) | 2020-02-12 |
Continual self-training with bootstrapped remixing for speech enhancement | ✓ Link | 2.60 | 18.0 | | | | | | | | RemixIT (w Sudo U=32) | 2021-10-19 |
RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing | ✓ Link | 2.34 | 16.0 | | | | | | | | RemixIT (w Sudo U=32) | 2022-02-17 |
[]() | | 1.58 | 9.1 | 91.5 | | | | | | | Noisy | |
High Fidelity Speech Enhancement with Band-split RNN | ✓ Link | | 21.4 | 98.4 | 3.85 | | | | | | BSRNN-S + MGD | 2022-12-01 |
Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression | ✓ Link | | 16.34 | | 3.04 | | | | | | DTLN | 2020-05-15 |
Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net | ✓ Link | | 16.22 | | 3.01 | | | | | | Non-Real-Time MultiScale+ | 2020-06-01 |
Interactive Speech and Noise Modeling for Speech Enhancement | | | | | 3.39 | 19.52 | | | | | SN-Net | 2020-12-17 |
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement | ✓ Link | | | | 3.214 | | | | | | DCCRN-E-Aug | 2020-08-01 |
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement | ✓ Link | | | | 3.04 | | | | | | DCCRN-E | 2020-08-01 |