TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | ✓ Link | 25.1 | 25.2 | 22.5 | | TF-Locoformer (L) + DM | 2024-08-06 |
Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation | ✓ Link | 25.1 | 25.2 | 59.4 | 155.5 | SepReformer-L | 2024-06-10 |
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | ✓ Link | 24.6 | 24.7 | 15.0 | | TF-Locoformer (M) + DM | 2024-08-06 |
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | ✓ Link | 24.2 | 24.3 | 22.5 | | TF-Locoformer (L) | 2024-08-06 |
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation | ✓ Link | 24.1 | | 55.7 | | MossFormer2 (L) | 2023-12-19 |
Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor | | 24.0 | | | | SepTDA (L=12) | 2024-01-23 |
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation | | 23.9 | | | | Separate And Diffuse | 2023-01-25 |
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | ✓ Link | 23.6 | 23.8 | 15.0 | | TF-Locoformer (M) | 2024-08-06 |
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | ✓ Link | 22.8 | 23.0 | 5.0 | | TF-Locoformer (S) + DM | 2024-08-06 |
MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions | ✓ Link | 22.8 | | 42.1 | 86.1 | MossFormer (L) + DM | 2023-02-23 |
SepMamba: State-space models for speaker separation using Mamba | ✓ Link | 22.7 | 22.9 | | | SepMamba + DM (M) | 2024-10-28 |
SPGM: Prioritizing Local Features for enhanced speech separation performance | ✓ Link | 22.7 | | 26.2 | 77 | SPGM + DM | 2023-09-22 |
MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions | ✓ Link | 22.5 | | | | MossFormer (M) + DM | 2023-02-23 |
SepIt: Approaching a Single Channel Speech Separation Bound | | 22.4 | | | | SepIt | 2022-05-24 |
Attention is All You Need in Speech Separation | ✓ Link | 22.3 | 22.4 | | | SepFormer | 2020-10-25 |
Wavesplit: End-to-End Speech Separation by Speaker Clustering | | 22.2 | 22.3 | | | Wavesplit v2 | 2020-02-20 |
SPGM: Prioritizing Local Features for enhanced speech separation performance | ✓ Link | 22.1 | | 26.2 | 77 | SPGM | 2023-09-22 |
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement | ✓ Link | 22.0 | 22.1 | 5.0 | | TF-Locoformer (S) | 2024-08-06 |
Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training | ✓ Link | 21.3 | 21.5 | | | DPTNet (Libri1Mix speech enhancement pre-trained) | 2020-10-29 |
SepMamba: State-space models for speaker separation using Mamba | ✓ Link | 21.2 | 21.4 | | | SepMamba + DM (S) | 2024-10-28 |
On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments | ✓ Link | 21.2 | | | | TD-Conformer (XL) + DM | 2023-10-09 |
Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation | ✓ Link | 21.0 | | | | Sandglasset | 2021-03-01 |
Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks | ✓ Link | 20.3 | | | | GALR | 2021-01-13 |
Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation | ✓ Link | 20.2 | | | | DPTNet | 2020-07-28 |
Voice Separation with an Unknown Number of Multiple Speakers | ✓ Link | 20.12 | | | | Gated DualPathRNN | 2020-02-29 |
Compute and memory efficient universal sound source separation | ✓ Link | 19.5 | | | | Sudo rm -rf (U=36) | 2021-03-03 |
Wavesplit: End-to-End Speech Separation by Speaker Clustering | | 19.0 | | | | Wavesplit v1 | 2020-02-20 |
Sudo rm -rf: Efficient Networks for Universal Audio Source Separation | ✓ Link | 18.9 | | | | Sudo rm -rf XL | 2020-07-14 |
Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation | ✓ Link | 18.8 | | | | Dual-path RNN | 2019-10-14 |
Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation | ✓ Link | 17.7 | | | | DeepCASA | 2019-04-25 |
Interrupted and cascaded permutation invariant training for speech separation | ✓ Link | 17.5 | | | | IAC-PIT Tasnet | 2019-10-28 |
Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation | ✓ Link | 17.2 | 17.4 | 3.6 | 3.7 | Deformable TCN + Dynamic Mixing | 2022-10-27 |
Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering | ✓ Link | 16.6 | | | | Hybrid-Tasnet | 2019-04-16 |
Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation | ✓ Link | 16.1 | 16.3 | 1.3 | 3.7 | Deformable TCN + Shared Weights + Dynamic Mixing | 2022-10-27 |
Two-Step Sound Source Separation: Training on Learned Latent Targets | ✓ Link | 16.1 | | | | Two-step Conv-TasNet | 2019-10-22 |
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation | ✓ Link | 15.3 | 15.6 | 5.1 | | Conv-TasNet | 2018-09-20 |
Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network | ✓ Link | 13.2 | | | | TasNet v2 | 2018-09-02 |
Alternative Objective Functions for Deep Clustering | ✓ Link | 11.5 | | | | Chimera++ | 2018-04-01 |
TasNet: time-domain audio separation network for real-time, single-channel speech separation | ✓ Link | 10.8 | | | | TasNet | 2017-11-01 |
Deep clustering: Discriminative embeddings for segmentation and separation | ✓ Link | 10.8 | | | | Deep Clustering ++ | 2015-08-18 |