Association control in mobile wireless networks
Minkyong Kim, Zhen Liu, et al.
INFOCOM 2008
This paper discusses recent work in synthesizing a breathy quality in pre-recorded speech, which has applications in voice morphing and concatenative TTS. Previous work has shown that the breathy quality in speech is characterized in part by the presence of random noise in the upper region of the spectrum [1]. The sinusoidal modelling representation of speech facilitates making high-quality modifications to speech signals as well as modifying regions of the spectrum independently. We use sinusoidal modelling, along with techniques borrowed from analog communication systems to simulate aspiration noise in wideband speech signals above some lower cutoff frequency. Specifically, we use techniques based on amplitude modulation (AM) and phase modulation (PM), with the harmonics from the sinusoidal model of speech as carriers and lowpass random noise as the message signal. Formal listening tests were conducted and listeners rated the synthesized effect as "breathy" more often than in natural non-breathy speech, but significantly less often than in naturally breathy speech.
Minkyong Kim, Zhen Liu, et al.
INFOCOM 2008
Daniel M. Bikel, Vittorio Castelli
ACL 2008
Nanda Kambhatla
ACL 2004
Sameer Maskey, Bowen Zhou, et al.
ICSLP 2006