Michael Picheny, Zoltan Tuske, et al.
INTERSPEECH 2019
Current implementations of real-time speech-to-speech (S2S) translation systems for intercultural collaboration have mainly focused on the accuracy of the recognition and translated content. Typically, the translated utterance is presented to users through text-to-speech (TTS), without projecting cultural nuances in the tone of voice. This study investigates whether there are cross-cultural markers of variation in voice dynamics and whether these have any impact on user satisfaction. Based on subjective user evaluations (Chinese and English), we conclude that there are salient cross-cultural voice markers relevant to the interaction of culture and system design, with a noticeable impact on user satisfaction in TTS and S2S systems.
Wendy Ark, Ben Shaw, et al.
IWIC 2009
Francesco Cuomo, Eric Mibuari, et al.
ACM DEV 2013
Ea-Ee Jan, Hong-Kwang Kuo, et al.
INTERSPEECH 2009