Pavel Kisilev, Daniel Freedman, et al.
ICPR 2012
Gender-dependent systems are usually created by splitting the training data by gender and building a separate acoustic model for each gender. This method assumes that every state of a sub-phonetic model is uniformly dependent on gender. In this work we use the premise that the acoustic realizations of various sub-phonetic units depend on gender to varying degrees across phones and, more particularly, across phonetic contexts. We show that this is indeed the case by using gender as a question, in addition to phone-context questions, in the context decision trees. Using these trees we build phone-specific gender-dependent acoustic models and demonstrate a novel method to choose between genders during decoding based on a confidence measure on the decoded hypothesis. A relative improvement of 6.3% in word error rate is achieved over a gender-independent system.
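The confidence-based selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `decode` callables and the model names are hypothetical placeholders standing in for the gender-dependent decoders, each assumed to return a hypothesis together with a confidence score.

```python
def select_hypothesis(utterance, models):
    """Decode the utterance with each gender-dependent model and keep
    the hypothesis whose confidence score is highest.

    `models` maps a model name (e.g. a gender label) to a decode
    function returning (hypothesis, confidence).
    """
    best_hyp, best_conf = None, float("-inf")
    for name, decode in models.items():
        hyp, conf = decode(utterance)
        if conf > best_conf:
            best_hyp, best_conf = hyp, conf
    return best_hyp, best_conf


# Usage with dummy decoders (placeholders for real acoustic models):
models = {
    "male": lambda utt: ("hypothesis from male model", 0.42),
    "female": lambda utt: ("hypothesis from female model", 0.87),
}
hyp, conf = select_hypothesis("some audio features", models)
```

In this toy run the female model's hypothesis wins because its confidence (0.87) is higher; in the paper the choice would be driven by a confidence measure computed on each decoded hypothesis.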
Michelle X. Zhou, Fei Wang, et al.
ICMEW 2013
Sudeep Sarkar, Kim L. Boyer
Computer Vision and Image Understanding
James E. Gentile, Nalini Ratha, et al.
BTAS 2009