Jehanzeb Mirza, Leonid Karlinsky, et al.
NeurIPS 2023
This paper proposes a new text-to-speech synthesis technique, for producing continuous, natural sounding speech of a specific speaker. The synthesis technique is based on selecting short speech frames from a phoneme-labeled speech database. The selection procedure involves minimization of a distortion criterion, by a dynamic programming algorithm. The proposed scheme is more flexible than many existing schemes using fixed speech segments, such as diphones. It results in a more natural synthesized speech. An efficient speech representation is used to express simply and accurately the spectral continuity of speech. A further improvement in the database search mechanism and in database size was obtained by sectioning the speech phonemes into "steady-states"and "transitions". The resulting synthesized speech quality, is satisfactory and indeed preserves the natural voice of the speaker.
Jehanzeb Mirza, Leonid Karlinsky, et al.
NeurIPS 2023
Hagen Soltau, Lidia Mangu, et al.
ASRU 2011
Diganta Misra, Muawiz Chaudhary, et al.
CVPRW 2024
Hans-Werner Fink, Heinz Schmid, et al.
Journal of the Optical Society of America A: Optics and Image Science, and Vision