We discuss an identification framework for noisy speech mixtures. A
block-based generative model is formulated that explicitly incorporates
the time-varying harmonic plus noise (H+N) model for a number of latent
sources observed through noisy convolutive mixtures. All parameters
including the pitches of the source signals, the amplitudes and phases of
the sources, the mixing filters and the noise statistics are estimated by
maximum likelihood, using an EM-algorithm. Exact averaging over the
hidden sources is obtained using the Kalman smoother. We show that
pitch estimation and source separation can be performed simultaneously.
The pitch estimates are compared to laryngograph (EGG) measurements.
Artificial and real room mixtures are used to demonstrate the viability
of the approach. Intelligible speech signals are re-synthesized from the
estimated H+N models.
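
As a rough illustration of what re-synthesis from an H+N model involves, the sketch below builds one signal block as a sum of harmonics of a pitch plus additive noise. It is a simplified assumption, not the model estimated in this work: the pitch is held constant within the block, the noise is taken to be white and Gaussian, and the function name `synth_hn_block` and all of its parameters are hypothetical.

```python
import numpy as np

def synth_hn_block(f0, amps, phases, fs, n_samples, noise_std=0.0, rng=None):
    """Synthesize one block as harmonics of f0 plus white Gaussian noise.

    Assumptions (not taken from the paper): constant pitch within the
    block, cosine harmonics, and white Gaussian noise.
    """
    amps = np.asarray(amps, dtype=float)[:, None]      # shape (K, 1)
    phases = np.asarray(phases, dtype=float)[:, None]  # shape (K, 1)
    t = np.arange(n_samples) / fs                      # time axis in seconds
    k = np.arange(1, amps.shape[0] + 1)[:, None]       # harmonic indices 1..K

    # Harmonic part: sum_k a_k * cos(2*pi*k*f0*t + phi_k)
    harmonic = np.sum(amps * np.cos(2.0 * np.pi * k * f0 * t + phases), axis=0)

    # Noise part: zero-mean white Gaussian noise with the given std
    rng = np.random.default_rng() if rng is None else rng
    noise = noise_std * rng.standard_normal(n_samples)
    return harmonic + noise

# Example: a 10 ms block at 8 kHz with a 100 Hz pitch and three harmonics
block = synth_hn_block(100.0, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0],
                       fs=8000, n_samples=80, noise_std=0.01)
```

In a block-based scheme, successive blocks would each carry their own pitch, amplitudes and phases, which is how time variation could be approximated under these assumptions.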