Robust One-step Speech Enhancement via Consistency Distillation | ✓ Link | 3.99 | 3.37 | 4.30 | 4.63 | 92.6 | 0.83 | 0.927 | 0.40 | 65 | ROSE-CD(PESQ) | 2025-07-08 |
The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement | | 3.82 | 2.49 | 3.5 | 3.63 | 0.92 | 0.84 | -2.72 | -19.8 | 30 | PESQetarian | 2024-06-05 |
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement | ✓ Link | 3.73 | 3.67 | 4.40 | 4.82 | 96 | | | | 6.28 | Mamba-SEUNet L (+PCS) | 2024-12-21 |
Investigating Training Objectives for Generative Speech Enhancement | ✓ Link | 3.70 | | | | | | | | | Schrödinger bridge (PESQ loss) | 2024-09-16 |
An Investigation of Incorporating Mamba for Speech Enhancement | ✓ Link | 3.69 | 3.63 | 4.37 | 4.79 | 96 | | | | 2.25 | SEMamba (+PCS) | 2024-05-10 |
[]() | | 3.63 | 3.87 | 4.36 | 4.81 | 96.19 | | 8.33 | 19.09 | 2.04 | ZipEnhancer (S, \lambda_6 = 0) | |
PrimeK-Net: Multi-scale Spectral Learning via Group Prime-Kernel Convolutional Neural Networks for Single Channel Speech Enhancement | ✓ Link | 3.61 | 3.98 | 4.35 | 4.81 | 96 | | | | 1.41 | PrimeK-Net | 2025-02-27 |
[]() | | 3.61 | 3.97 | 4.35 | 4.81 | 96.22 | | 10.01 | 19.96 | 2.04 | ZipEnhancer (S, \lambda_6 = 0.2) | |
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement | ✓ Link | 3.60 | 3.99 | 4.34 | 4.81 | 0.96 | | | | 2.26 | MP-SENet | 2023-08-17 |
[]() | | 3.54 | 3.49 | 4.20 | 4.75 | 0.96 | | | | | PCS_CS_WAVLM | |
xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement | ✓ Link | 3.53 | 3.98 | 4.27 | 4.78 | 0.96 | | | | 2.27 | xLSTM-SENet2 | 2025-01-10 |
SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks | | 3.52 | 3.97 | 4.25 | 4.75 | 96 | | 10.82 | | | SCP-CMGAN | 2022-10-26 |
Robust One-step Speech Enhancement via Consistency Distillation | ✓ Link | 3.49 | 3.33 | 4.04 | 4.523 | 94.73 | 0.87 | 3.34 | 17.80 | 65 | ROSE-CD | 2025-07-08 |
Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses | ✓ Link | 3.43 | | | | | | | | 0.86 | D2Former | 2021-02-03 |
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement | ✓ Link | 3.41 | 3.94 | 4.12 | 4.63 | 96 | | 11.1 | | | CMGAN | 2022-09-22 |
Perceptual Contrast Stretching on Target Feature for Speech Enhancement | ✓ Link | 3.35 | | 3.92 | 4.43 | 95 | | | | | PCS | 2022-03-31 |
D²Net: A Denoising and Dereverberation Network Based on Two-branch Encoder and Dual-path Transformer | | 3.27 | 3.18 | 3.92 | 4.63 | 96 | | | | | D²Net | 2022-11-21 |
aTENNuate: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio | | 3.27 | 2.85 | 3.96 | 4.57 | | | | 15.04 | | aTENNuate | 2024-09-05 |
Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions | | 3.25 | | | | | | | | | Centaurus (0.51M) | 2025-01-22 |
MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement | ✓ Link | 3.24 | 3.07 | 3.73 | 4.23 | | | | | 1.89 | MetricGAN-OKD | 2023-07-24 |
MANNER: Multi-view Attention Network for Noise Erasure | ✓ Link | 3.21 | 3.65 | 3.91 | 4.53 | 95 | | | | | MANNER | 2022-03-04 |
Boosting Self-Supervised Embeddings for Speech Enhancement | ✓ Link | 3.20 | 3.58 | 3.88 | 4.52 | 95.7 | | | | | BSSE-SE | 2022-04-07 |
DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | ✓ Link | 3.17 | 3.61 | 3.77 | 4.34 | 0.944 | | | | | DeepFilterNet3 | 2023-05-14 |
Perceptual Loss based Speech Denoising with an ensemble of Audio Pattern Recognition and Self-Supervised Models | ✓ Link | 3.17 | 3.53 | 3.83 | 4.43 | | | | | | PERL-AE | 2020-10-22 |
Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement | ✓ Link | 3.15 | 3.60 | 3.67 | 4.18 | | | | | | PFPL | 2020-10-28 |
MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement | ✓ Link | 3.15 | 3.16 | 3.64 | 4.14 | | | | | | MetricGAN+ | 2021-04-08 |
Multi-View Attention Transfer for Efficient Speech Enhancement | | 3.12 | 3.61 | 3.82 | 4.45 | 95 | | | | 1.38 | MANNER-S + MV-AT (8.1GF) | 2022-08-22 |
MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement | ✓ Link | 3.12 | 3.13 | 3.64 | 4.17 | | | | | 0.82 | MetricGAN-OKD (Causal Arch.) | 2023-07-24 |
An Analysis of the Variance of Diffusion-based Speech Enhancement | | 3.11 | | | | | | | | | SGMSE+ | 2024-02-01 |
Real Time Speech Enhancement in the Waveform Domain | ✓ Link | 3.07 | 3.4 | 3.63 | 4.31 | 95 | | | | | DEMUCS (H=64, S=2, U=2) | 2020-06-23 |
Dense-TSNet: Dense Connected Two-Stage Structure for Ultra-Lightweight Speech Enhancement | | 3.05 | 3.58 | 3.86 | 4.51 | | | | | 0.014 | Dense-TSNet | 2024-09-18 |
Deep Residual-Dense Lattice Network for Speech Enhancement | ✓ Link | 3.02 | 3.43 | 3.72 | 4.38 | | | | | | RDL-Net 3.91M (Deep Xi - MMSE-LSA) | 2020-02-27 |
ROSE: A Recognition-Oriented Speech Enhancement Framework in Air Traffic Control Using Multi-Objective Learning | ✓ Link | 3.01 | 3.56 | 3.72 | 4.47 | 95 | | | | 36.98 | ROSE | 2023-12-11 |
FSPEN: An Ultra-Lightweight Network for Real Time Speech Enhancement | ✓ Link | 2.97 | | | | 0.942 | | | | 0.079 | FSPEN | 2024-04-15 |
Deep Residual-Dense Lattice Network for Speech Enhancement | ✓ Link | 2.94 | 3.35 | 3.67 | 4.36 | | | | | | RDL-Net 3.91M (Deep Xi - SRWF) | 2020-02-27 |
Deep Residual-Dense Lattice Network for Speech Enhancement | ✓ Link | 2.93 | 3.32 | 3.62 | 4.29 | | | | | | RDL-Net 1.87M (Deep Xi - MMSE-LSA) | 2020-02-27 |
Real Time Speech Enhancement in the Waveform Domain | ✓ Link | 2.93 | 3.25 | 3.52 | 4.22 | 95 | | | | | Causal DEMUCS (H=48, S=4, U=4) | 2020-06-23 |
Speech Enhancement and Dereverberation with Diffusion-based Generative Models | ✓ Link | 2.93 | | | | | | | | | SGMSE+ (Diffusion Model) | 2022-08-11 |
MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement | ✓ Link | 2.86 | 3.18 | 3.42 | 3.99 | | | | | | MetricGAN | 2019-05-13 |
Deep Residual-Dense Lattice Network for Speech Enhancement | ✓ Link | 2.84 | 3.23 | 3.56 | 4.27 | | | | | | RDL-Net 1.87M (Deep Xi - SRWF) | 2020-02-27 |
A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement | ✓ Link | 2.82 | | | | | | | | | real-time-GRU | 2021-02-15 |
End-to-end speech enhancement based on discrete cosine transform | ✓ Link | 2.7 | 3.29 | 3.29 | 3.9 | | | | | | DCT | 2019-10-17 |