Contextual revision in information seeking conversation systems
Keith Houck
ICSLP 2004
This paper presents methods for reconciling the pronunciations generated by a rule-based front-end with those observed in a database of recorded speech. The methods are applied to the IBM Expressive Speech Synthesis System [1] for both unrestricted and limited-domain text-to-speech synthesis. The first method constructs a multiple-pronunciation lattice for the input sentence and scores it using word and phoneme n-gram statistics computed from the target speaker's database. The second method stores observed pronunciations and introduces them as alternates in the search. We compare the strengths and weaknesses of the two methods. Results show improvements in both the limited and unrestricted domains, with the largest gains in the limited-domain case.
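The lattice-scoring idea can be illustrated with a small sketch. It is not the system's actual implementation. It assumes several things the abstract does not specify: add-one-smoothed phoneme bigrams stand in for the "n-gram statistics", a per-word pronunciation-frequency term stands in for the word statistics, and the best path is found with Viterbi decoding. The candidate list for each word combines the front-end pronunciation with alternates observed in the database, loosely mirroring the second method. The names `PhoneBigramModel`, `best_pronunciations` and `word_weight` are hypothetical, and the phoneme data is toy data.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Pron = Tuple[str, ...]  # one pronunciation: a tuple of phoneme symbols (assumed non-empty)


class PhoneBigramModel:
    """Hypothetical add-one-smoothed phoneme bigram model trained on the speaker's utterances."""

    def __init__(self, utterances: List[Pron]) -> None:
        self.bigrams: Counter = Counter()
        self.history: Counter = Counter()
        vocab = set()
        for utt in utterances:
            phones = ("<s>",) + utt + ("</s>",)
            vocab.update(phones)
            for a, b in zip(phones, phones[1:]):
                self.bigrams[(a, b)] += 1
                self.history[a] += 1
        self.v = len(vocab)

    def logp(self, a: str, b: str) -> float:
        """Smoothed log P(b | a)."""
        return math.log((self.bigrams[(a, b)] + 1) / (self.history[a] + self.v))

    def internal(self, pron: Pron) -> float:
        """Log-probability of the phoneme transitions inside one word."""
        return sum(self.logp(a, b) for a, b in zip(pron, pron[1:]))


def best_pronunciations(words: List[str], front_end: Dict[str, Pron],
                        observed: Dict[str, Counter], lm: PhoneBigramModel,
                        word_weight: float = 1.0) -> List[Pron]:
    # Lattice: front-end pronunciation plus alternates observed in the speaker's database.
    lattice = [[front_end[w]] + [p for p in observed.get(w, Counter()) if p != front_end[w]]
               for w in words]

    def local(i: int, pron: Pron) -> float:
        # Word-level score: smoothed relative frequency of this pronunciation for this word.
        counts = observed.get(words[i], Counter())
        word_lp = math.log((counts[pron] + 1) / (sum(counts.values()) + len(lattice[i])))
        return word_weight * word_lp + lm.internal(pron)

    # Viterbi over the lattice; each cell holds (best score so far, backpointer).
    table = [[(local(0, p) + lm.logp("<s>", p[0]), -1) for p in lattice[0]]]
    for i in range(1, len(words)):
        row = []
        for p in lattice[i]:
            # Cross-word transition: last phone of the previous word to first phone of this one.
            score, back = max((table[-1][j][0] + lm.logp(lattice[i - 1][j][-1], p[0]), j)
                              for j in range(len(lattice[i - 1])))
            row.append((score + local(i, p), back))
        table.append(row)
    _, k = max((table[-1][j][0] + lm.logp(lattice[-1][j][-1], "</s>"), j)
               for j in range(len(lattice[-1])))

    # Trace back the highest-scoring path through the lattice.
    path = []
    for i in range(len(words) - 1, -1, -1):
        path.append(lattice[i][k])
        k = table[i][k][1]
    return path[::-1]


if __name__ == "__main__":
    lm = PhoneBigramModel([("DH", "IY", "AE", "P", "AH", "L")] * 3)
    front_end = {"the": ("DH", "AH"), "apple": ("AE", "P", "AH", "L")}
    observed = {"the": Counter({("DH", "IY"): 3}), "apple": Counter({("AE", "P", "AH", "L"): 2})}
    print(best_pronunciations(["the", "apple"], front_end, observed, lm))
    # → [('DH', 'IY'), ('AE', 'P', 'AH', 'L')]
```

In this toy case the speaker consistently says "the" as DH IY, so the observed alternate outscores the front-end's DH AH on both the word-level and phoneme-bigram terms. How the two kinds of statistics are weighted (`word_weight` here) is a free design choice in this sketch.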
Sabine Deligne, Ellen Eide, et al.
INTERSPEECH - Eurospeech 2001
Bhuvana Ramabhadran, Olivier Siohan, et al.
ICSLP 2004
Wael Hamza, Raimo Bakis, et al.
INTERSPEECH - Eurospeech 2005