| Summary: | In this paper we investigate the use of observation weights and contextual time-frequency information for clustering-based blind source separation. Previous approaches have successfully used clustering techniques to estimate time-frequency separation masks; however, they generally disregard the structured nature of speech signals. Motivated by the homogeneous behavior of speech signals, we propose to modify the established fuzzy c-means algorithm to bias the clustering results in favor of cluster membership homogeneity within localized neighborhoods in the time-frequency space. This is achieved with a two-stage algorithm: first, data weights are estimated to indicate the reliability of each data point; second, local contextual information from neighboring time-frequency slots is integrated into the cluster update equations. The proposed algorithm is evaluated in a three-fold manner using simulated data, real recordings, and public benchmark data; a notable improvement in source separation performance over previous clustering approaches is achieved.
|
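
To make the role of observation weights in the summary above concrete, the following is a minimal sketch of a generic weighted fuzzy c-means iteration on one-dimensional data. It is not the paper's algorithm: the function name `weighted_fcm`, the 1-D features, the fuzzifier `m = 2`, and the fixed iteration count are all assumptions chosen for illustration, and the contextual neighborhood term of the second stage is deliberately left out because its exact form is not given here.

```python
def weighted_fcm(points, weights, centers, m=2.0, iters=50):
    """Generic weighted fuzzy c-means on 1-D data (illustrative sketch only).

    points  : list of floats, a hypothetical stand-in for time-frequency features
    weights : non-negative reliability per point (at least one must be positive)
    centers : initial cluster centers
    m       : fuzzifier, must be greater than 1
    """
    centers = list(centers)
    eps = 1e-12  # avoids division by zero when a point coincides with a center
    exponent = 1.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: the standard fuzzy c-means rule; each row sums to 1.
        memberships = []
        for x in points:
            inv = [(1.0 / ((x - c) ** 2 + eps)) ** exponent for c in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: each point contributes in proportion to weight * u^m,
        # so a point with weight 0 has no influence on the centers.
        for i in range(len(centers)):
            num = sum(w * u[i] ** m * x
                      for x, w, u in zip(points, weights, memberships))
            den = sum(w * u[i] ** m for w, u in zip(weights, memberships))
            centers[i] = num / den
    return centers, memberships


# Two tight groups plus one unreliable point (weight 0) lying between them.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 2.5]
weights = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
centers, memberships = weighted_fcm(points, weights, centers=[0.5, 4.5])
```

In this sketch the weights enter only the center update, so an unreliable point still receives a membership value but cannot pull a cluster center toward itself. A contextual extension would additionally couple each point's membership to those of its time-frequency neighbors.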