fronx/Fast-FullSubNet

	---
	license: mit
	pipeline_tag: audio-to-audio
	tags:
	- denoising
	- speech enhancement
	- speech separation
	- noise suppression
	- realtime
	---

	This is a pre-trained version of Fast FullSubNet, a real-time speech denoising model trained on the dataset of the INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge ([DNS-INTERSPEECH-2020](https://github.com/microsoft/DNS-Challenge/tree/interspeech2020/master)).

	## How to run

	Follow the getting-started guide in the FullSubNet documentation: https://fullsubnet.readthedocs.io/en/latest/usage/getting_started.html

	## Code

	https://github.com/Audio-WestlakeU/FullSubNet

	Note: The code doesn't support real-time streaming out of the box. See [issue-67](https://github.com/Audio-WestlakeU/FullSubNet/issues/67) for details.

	## Paper

	[Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement](https://arxiv.org/abs/2212.09019), Xiang Hao, Xiaofei Li

	> For many speech enhancement applications, a key feature is that system runs on a real-time, latency-sensitive, battery-powered platform, which strictly limits the algorithm latency and computational complexity. In this work, we propose a new architecture named Fast FullSubNet dedicated to accelerating the computation of FullSubNet. Specifically, Fast FullSubNet processes sub-band speech spectra in the mel-frequency domain by using cascaded linear-to-mel full-band, sub-band, and mel-to-linear full-band models such that frequencies involved in the sub-band computation are vastly reduced. After that, a down-sampling operation is proposed for the sub-band input sequence to further reduce the computational complexity along the time axis. Experimental results show that, compared to FullSubNet, Fast FullSubNet has only 13% computational complexity and 16% processing time, and achieves comparable or even better performance.
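
The two reductions named in the abstract (fewer frequencies for the sub-band stage, fewer steps along the time axis) can be illustrated with a shape-level sketch. This is not the authors' implementation: the abstract describes the linear-to-mel and mel-to-linear stages as full-band models, whereas the fixed triangular filterbank below is only a stand-in. The 257 linear bins, 64 mel bands, 16 kHz sample rate, down-sampling factor of 2, and all function names are assumptions chosen for illustration.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_bins, sample_rate):
    """Return n_mels triangular filters, each a list of n_bins weights in [0, 1]."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    # Centre frequency of each linear bin.
    bin_hz = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def linear_to_mel(frames, bank):
    """Project each frame of linear-frequency magnitudes onto the mel bands."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in bank]
            for frame in frames]


def downsample_time(frames, factor):
    """Keep every `factor`-th frame (plain decimation, an assumed scheme)."""
    return frames[::factor]


# Assumed illustration values, not taken from the model card or the paper.
N_BINS, N_MELS, SAMPLE_RATE, FACTOR = 257, 64, 16000, 2

spectrogram = [[1.0] * N_BINS for _ in range(100)]  # 100 dummy frames
bank = mel_filterbank(N_MELS, N_BINS, SAMPLE_RATE)
mel_frames = downsample_time(linear_to_mel(spectrogram, bank), FACTOR)

# Number of (frame, frequency) positions the sub-band stage would visit.
linear_positions = len(spectrogram) * N_BINS
reduced_positions = len(mel_frames) * N_MELS
print(len(mel_frames), len(mel_frames[0]))
print(linear_positions, reduced_positions)
```

The position counts only show how the input to the sub-band stage shrinks under these assumed sizes. They are not the complexity or processing-time figures reported in the abstract, which measure the whole model.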

	## Performance

	| Method | WB-PESQ (With Reverb) | NB-PESQ (With Reverb) | SI-SDR in dB (With Reverb) | STOI (With Reverb) | WB-PESQ (No Reverb) | NB-PESQ (No Reverb) | SI-SDR in dB (No Reverb) | STOI (No Reverb) |
	| -- | -- | -- | -- | -- | -- | -- | -- | -- |
	| Fast FullSubNet (118 Epochs) | 2.882 | 3.42 | 15.33 | 0.9233 | 2.694 | 3.222 | 16.34 | 0.9571 |
	| [FullSubNet (58 Epochs)](https://github.com/Audio-WestlakeU/FullSubNet/releases/tag/v0.2) (for comparison) | 2.987 | 3.496 | 15.756 | 0.926 | 2.889 | 3.385 | 17.635 | 0.964 |
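
For readers unfamiliar with the SI-SDR column, the following is a minimal sketch of the standard scale-invariant signal-to-distortion ratio, using only the Python standard library. It assumes both signals are equal-length, time-aligned, and zero-mean. The function name `si_sdr` is chosen here for illustration, and the evaluation script that produced the table may differ in details such as mean removal or numerical safeguards.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between an enhanced signal and the clean reference."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal must not be all zeros")
    # Project the estimate onto the reference to find the optimal scaling.
    scale = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [scale * r for r in reference]
    # Everything in the estimate that is not the scaled reference counts as error.
    error_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    if error_energy == 0.0:
        return math.inf  # perfect estimate up to a scale factor
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / error_energy)


clean = [1.0, 0.0, 1.0, 0.0]
enhanced = [1.0, 0.1, 1.0, -0.1]
print(round(si_sdr(enhanced, clean), 2))  # about 20 dB
```

Because the estimate is projected onto the reference before the error is measured, rescaling the enhanced signal leaves the score unchanged, which is what distinguishes SI-SDR from plain SDR.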