Speech Separation on WSJ0-2mix

Speech Separation


Leaderboard

Paper	Code	SI-SDRi (dB)	SDRi (dB)	Number of parameters (M)	MACs (G)	Model name	Release date
Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation	✓ Link	25.1	25.2	59.4	155.5	SepReformer-L	2024-06-10
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement	✓ Link	25.1	25.2	22.5		TF-Locoformer (L) + DM	2024-08-06
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement	✓ Link	24.6	24.7	15.0		TF-Locoformer (M) + DM	2024-08-06
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement	✓ Link	24.2	24.3	22.5		TF-Locoformer (L)	2024-08-06
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation	✓ Link	24.1		55.7		MossFormer2 (L)	2023-12-19
Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor		24.0				SepTDA (L=12)	2024-01-23
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation		23.9				Separate And Diffuse	2023-01-25
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement	✓ Link	23.6	23.8	15.0		TF-Locoformer (M)	2024-08-06
MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions	✓ Link	22.8		42.1	86.1	MossFormer (L) + DM	2023-02-23
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement	✓ Link	22.8	23.0	5.0		TF-Locoformer (S) + DM	2024-08-06
SPGM: Prioritizing Local Features for enhanced speech separation performance	✓ Link	22.7		26.2	77	SPGM + DM	2023-09-22
SepMamba: State-space models for speaker separation using Mamba	✓ Link	22.7	22.9			SepMamba + DM (M)	2024-10-28
MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions	✓ Link	22.5				MossFormer (M) + DM	2023-02-23
SepIt: Approaching a Single Channel Speech Separation Bound		22.4				SepIt	2022-05-24
Attention is All You Need in Speech Separation	✓ Link	22.3	22.4			SepFormer	2020-10-25
Wavesplit: End-to-End Speech Separation by Speaker Clustering		22.2	22.3			Wavesplit v2	2020-02-20
SPGM: Prioritizing Local Features for enhanced speech separation performance	✓ Link	22.1		26.2	77	SPGM	2023-09-22
TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement	✓ Link	22.0	22.1	5.0		TF-Locoformer (S)	2024-08-06
Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training	✓ Link	21.3	21.5			DPTNet (Libri1Mix speech enhancement pre-trained)	2020-10-29
On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments	✓ Link	21.2				TD-Conformer (XL) + DM	2023-10-09
SepMamba: State-space models for speaker separation using Mamba	✓ Link	21.2	21.4			SepMamba + DM (S)	2024-10-28
Sandglasset: A Light Multi-Granularity Self-attentive Network For Time-Domain Speech Separation	✓ Link	21.0				Sandglasset	2021-03-01
Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks	✓ Link	20.3				GALR	2021-01-13
Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation	✓ Link	20.2				DPTNet	2020-07-28
Voice Separation with an Unknown Number of Multiple Speakers	✓ Link	20.12				Gated DualPathRNN	2020-02-29
Compute and memory efficient universal sound source separation	✓ Link	19.5				Sudo rm -rf (U=36)	2021-03-03
Wavesplit: End-to-End Speech Separation by Speaker Clustering		19.0				Wavesplit v1	2020-02-20
Sudo rm -rf: Efficient Networks for Universal Audio Source Separation	✓ Link	18.9				Sudo rm -rf XL	2020-07-14
Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation	✓ Link	18.8				Dual-path RNN	2019-10-14
Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation	✓ Link	17.7				DeepCASA	2019-04-25
Interrupted and cascaded permutation invariant training for speech separation	✓ Link	17.5				IAC-PIT Tasnet	2019-10-28
Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation	✓ Link	17.2	17.4	3.6	3.7	Deformable TCN + Dynamic Mixing	2022-10-27
Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering	✓ Link	16.6				Hybrid-Tasnet	2019-04-16
Two-Step Sound Source Separation: Training on Learned Latent Targets	✓ Link	16.1				Two-step Conv-TasNet	2019-10-22
Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation	✓ Link	16.1	16.3	1.3	3.7	Deformable TCN + Shared Weights + Dynamic Mixing	2022-10-27
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation	✓ Link	15.3	15.6	5.1		Conv-TasNet	2018-09-20
Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network	✓ Link	13.2				TasNet v2	2018-09-02
Alternative Objective Functions for Deep Clustering	✓ Link	11.5				Chimera++	2018-04-01
Deep clustering: Discriminative embeddings for segmentation and separation	✓ Link	10.8				Deep Clustering ++	2015-08-18
TasNet: time-domain audio separation network for real-time, single-channel speech separation	✓ Link	10.8				TasNet	2017-11-01
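
The SI-SDRi column reports the improvement in scale-invariant signal-to-distortion ratio, in dB, of the separated output over the unprocessed mixture. The sketch below shows the commonly used definition on plain Python lists. It is an illustration only, not the evaluation code of any paper listed above: the function names `si_sdr` and `si_sdr_improvement` are hypothetical, the signals are assumed to be zero-mean and of equal length, and the toy samples are made up for the example.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (assumes zero-mean, equal-length signals)."""
    # Project the estimate onto the reference to get the scaled target.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy
    target = [scale * r for r in reference]
    # Everything in the estimate that is not the scaled target counts as distortion.
    distortion = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10 * math.log10(target_energy / distortion_energy)


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy example (made-up samples): the interferer is orthogonal to the reference.
reference = [1.0, -1.0, 1.0, -1.0]
interferer = [1.0, 1.0, -1.0, -1.0]
mixture = [r + i for r, i in zip(reference, interferer)]
estimate = [r + 0.1 * i for r, i in zip(reference, interferer)]

# The mixture sits at 0 dB and the estimate at about 20 dB,
# so the improvement is approximately 20 dB.
improvement = si_sdr_improvement(estimate, mixture, reference)
```

Because the estimate is projected onto the reference first, rescaling the estimate by any nonzero constant leaves the score unchanged. A perfect estimate has zero distortion energy and would divide by zero here, so practical implementations usually add a small epsilon to the denominator.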

