Online speaker diarization using adapted i-vector transforms
Weizhong Zhu, Jason Pelecanos
ICASSP 2016
This paper presents ongoing research leveraging forensic methods for automatic speaker recognition. The methods forensic scientists employ include identifying speaker-distinctive audio segments and comparing these segments using features such as pitch, formants, and other information. Other approaches involve performing a phonetic analysis to recognize idiolectal attributes, and an implicit analysis of speaker demographics. Inspired by these forensic phonetic approaches, we target three threads of work: hot-spot analysis, speaker style and pronunciation modelling, and demographics analysis. We show that a phonetic analysis conditioned on select speech events (or hot-spots) can outperform a phonetic analysis performed over all speech without conditioning. In the area of pronunciation modelling, one set of results demonstrates significantly improved robustness by exploiting phonetic structure in an automatic speech recognition system. For demographics analysis, we present state-of-the-art results for systems capable of detecting dialect, non-nativeness, and native language. © 2011 IEEE.