Yongxin Li, Hakan Erdogan, et al.
ICSLP 2002
In a multimodal conversation, user inputs are often abbreviated or imprecise, and fusing the inputs alone is not enough to reach a full understanding. To address this problem, we have developed a context-based approach to multimodal interpretation. In particular, we present three operations: ordering, covering, and aggregation. Operating on feature structures that represent the intention and attention identified from user inputs and from the overall conversation, these operations provide a mechanism for combining multimodal fusion with context-based inference. They allow our system to process a wide variety of multimodal inputs, including incomplete and ambiguous ones.
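The abstract does not include an implementation, so the sketch below is only a hypothetical reading of its terminology: the names `FeatureStructure`, `aggregate`, and `covers`, and the toy real-estate example, are all assumptions rather than the authors' system. It illustrates how aggregation could merge partial structures from speech and gesture, and how a covering structure from the conversation context could fill the slots fusion leaves open; the ordering operation (ranking candidate interpretations) is omitted for brevity.

```python
# Illustrative sketch only: every name here is a hypothetical reading of the
# abstract, not the authors' implementation.
from dataclasses import dataclass, field


@dataclass
class FeatureStructure:
    """Pairs intention (what the user wants to do) with attention
    (what the input refers to), each as a slot-value dictionary."""
    intention: dict = field(default_factory=dict)
    attention: dict = field(default_factory=dict)


def aggregate(a: FeatureStructure, b: FeatureStructure) -> FeatureStructure:
    """Aggregation: merge two structures, letting one fill slots the
    other leaves unspecified (e.g. speech plus gesture)."""
    return FeatureStructure(
        intention={**a.intention, **b.intention},
        attention={**a.attention, **b.attention},
    )


def covers(context: FeatureStructure, hypo: FeatureStructure) -> bool:
    """Covering: a context structure covers an input hypothesis if it is
    consistent with every slot the hypothesis specifies."""
    return (
        all(context.intention.get(k, v) == v for k, v in hypo.intention.items())
        and all(context.attention.get(k, v) == v for k, v in hypo.attention.items())
    )


# An ambiguous spoken input ("show me this one") plus a pointing gesture:
speech = FeatureStructure(intention={"act": "show"}, attention={})
gesture = FeatureStructure(intention={}, attention={"object_id": "house_42"})
fused = aggregate(speech, gesture)

# Conversation context supplies what fusion alone still leaves open,
# e.g. which attribute of the object the user has been asking about.
context = FeatureStructure(
    intention={"act": "show", "attribute": "price"},
    attention={"object_id": "house_42"},
)
if covers(context, fused):
    fused = aggregate(fused, context)
print(fused)
```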
George Saon, Juan M. Huerta
ICSLP 2002
Wael Hamza, Robert Donovan
ICSLP 2002
Jiří Navrátil, Ganesh N. Ramaswamy
ICSLP 2002