Gang Wang, Fei Wang, et al.
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
A traditional framework in speech production describes the output speech as an interaction between a source excitation and a vocal-tract configured by the speaker to impart segmental characteristics. In general, this simplification has led to approaches where systems that focus on phonetic segment tasks (e.g. speech recognition) make use of a front-end that extracts features that aim to distinguish between different vocal-tract configurations. The excitation signal, on the other hand, has received more attention for speaker-characterization tasks. In this work we look at augmenting the front-end in a recognition system with vocal-source features, motivated by our work with languages that are low in resources and whose phonology and phonetics suggest the need for complementary approaches to classical ASR features. We demonstrate that the additional use of such features provides improvements over a state-of-the-art system for low-resource languages from the BABEL Program.
Gang Wang, Fei Wang, et al.
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Atsuyoshi Nakamura, Naoki Abe
Electronic Commerce Research
Kun Wang, Juwei Shi, et al.
PACT 2011
Benny Kimelfeld, Yehoshua Sagiv
ICDT 2013