Seyed Omid Sadjadi, Jason W. Pelecanos, et al.
INTERSPEECH 2014
Gaussian mixture models (GMMs) have become one of the standard acoustic approaches for language detection. These models are typically used to produce a log-likelihood ratio (LLR) verification statistic. In this framework, the intersession variability within each language is an adverse factor that degrades accuracy. To address this problem, we formulate the LLR as a function of the GMM parameters concatenated into normalized mean supervectors, and estimate the distribution of each language in this (high-dimensional) supervector space. The goal is to de-emphasize the directions with the largest intersession variability. We compare this method with two other popular intersession variability compensation methods, Nuisance Attribute Projection (NAP) and Within-Class Covariance Normalization (WCCN). Experiments on the NIST LRE 2003 and NIST LRE 2005 speech corpora show that the presented technique reduces the error by 50% relative to the baseline and performs competitively with the NAP and WCCN approaches. Fusion results with a phonotactic component are also presented.
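The compensation idea the abstract compares against can be illustrated concretely. Below is a minimal NumPy sketch of Nuisance Attribute Projection (NAP) on synthetic supervectors: the within-class (intersession) scatter matrix is estimated per language, its top eigenvectors span the nuisance subspace, and supervectors are projected onto the complement of that subspace. The data, dimensions, and number of removed directions are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic supervectors: 2 languages, 10 sessions each (illustrative only).
# Each row stands in for a normalized GMM mean supervector.
d = 20
lang_means = rng.normal(size=(2, d))
X, labels = [], []
for lang, mu in enumerate(lang_means):
    for _ in range(10):
        X.append(mu + 0.5 * rng.normal(size=d))  # intersession variability
        labels.append(lang)
X = np.vstack(X)
labels = np.array(labels)

# Within-class scatter: variability of sessions around each language mean.
W = np.zeros((d, d))
for lang in np.unique(labels):
    Xc = X[labels == lang] - X[labels == lang].mean(axis=0)
    W += Xc.T @ Xc
W /= len(X)

# NAP: remove the k directions with the largest intersession variability.
k = 3
eigvals, eigvecs = np.linalg.eigh(W)   # eigenvalues in ascending order
U = eigvecs[:, -k:]                    # nuisance subspace (top-k directions)
P = np.eye(d) - U @ U.T                # projection onto its complement
X_nap = X @ P.T                        # compensated supervectors
```

WCCN, by contrast, does not discard directions outright; it whitens the supervector space with a Cholesky factor of the inverse within-class covariance, shrinking (rather than zeroing) the high-variability directions.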
Jiří Navrátil, David Klusáček
ICASSP 2007
John R. Hershey, Peder A. Olsen
ICASSP 2008
Qing Wang, Da Fan, et al.
ICASSP 2008