Transient modeling for overlap-add sinusoidal model of speech
Slava Shechtman
ICASSP 2013
In multi-form segment synthesis, output speech is constructed by splicing waveform segments with statistically modeled and regenerated parametric speech segments. The fraction of model-derived segments is called the model-template ratio. The motivation of this work is to further increase the flexibility of multi-form synthesis while maintaining high speech quality at high model-template ratios. An approach is presented in which the representation type of a segment is selected per acoustic leaf. We introduce a novel method for leaf representation selection based on a psychoacoustic segment stationarity score. Additionally, we present refinements in multi-form segment concatenation for high model-template ratio synthesis, including boundary-constrained statistical parametric synthesis and time-domain alignment based on multi-peak analysis of the cross-correlation.
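To make the idea of multi-peak cross-correlation analysis concrete, here is a minimal Python sketch. It is not the method from the paper: the function names (`normalized_xcorr`, `candidate_lags`), the normalization, the peak criterion, and the `top_k` parameter are all assumptions chosen for illustration. It computes a normalized cross-correlation between two segments over a range of lags and returns several local maxima as candidate alignment offsets, instead of committing to the global maximum alone.

```python
import math


def normalized_xcorr(a, b, max_lag):
    """Map each lag in [-max_lag, max_lag] to the normalized correlation
    between a[i] and b[i + lag], computed over the overlapping samples."""
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        num = energy_a = energy_b = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                num += x * b[j]
                energy_a += x * x
                energy_b += b[j] * b[j]
        denom = math.sqrt(energy_a * energy_b)
        scores[lag] = num / denom if denom > 0.0 else 0.0
    return scores


def candidate_lags(a, b, max_lag, top_k=3):
    """Return up to top_k (lag, score) pairs at local maxima of the
    cross-correlation, strongest first (hypothetical selection rule)."""
    scores = normalized_xcorr(a, b, max_lag)
    lags = sorted(scores)
    peaks = []
    for k in range(1, len(lags) - 1):
        prev_s = scores[lags[k - 1]]
        cur_s = scores[lags[k]]
        next_s = scores[lags[k + 1]]
        if cur_s > prev_s and cur_s >= next_s:
            peaks.append((lags[k], cur_s))
    peaks.sort(key=lambda p: -p[1])
    return peaks[:top_k]


# Toy data (an assumption, not from the paper): a Hann-windowed sinusoid
# with a 20-sample period, and a copy of it delayed by 5 samples.
N = 200
a = [(0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)))
     * math.sin(2 * math.pi * n / 20) for n in range(N)]
b = [a[n - 5] if n >= 5 else 0.0 for n in range(N)]

peaks = candidate_lags(a, b, max_lag=30)
best_lag, best_score = peaks[0]
print(peaks)
```

For a quasi-periodic signal the correlation has several peaks spaced roughly one period apart, so a downstream rule (left out here) would be needed to choose among the candidates.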