MIT EECS Event
Speaking Out "Cloud": How Cloud Computing and Mashups are Fostering Multimodal Mobile Services
Giuseppe Di Fabbrizio, AT&T Labs - Research
11:00 AM (refreshments 10:45), 32-G882 (Stata Center - 8th Floor Reading Room)
CSAIL Seminar - Host: Jim Glass, MIT CSAIL - Contact: Marcia Davidson, 617-253-3049, marcia@csail.mit.edu
Abstract
Speech is becoming a more attractive interface for mobile devices because it overcomes their input limitations and is safer for multitasking users. Speech is also a direct, intuitive interface that requires no learning. And with the proliferation of web content, from business search and mapping services to game applications, it makes sense to combine, or mash up, speech interfaces with web services.
However, small devices lack the computational power for the speech processing tasks that speech interfaces require, including automatic speech recognition and text-to-speech conversion, especially when large vocabularies or high-quality synthesis are involved. One popular solution is to move the speech processing resources into the network, concentrating the heavy computational load in server farms. Some successful services exploit this approach, but to date each performs a single specific task. It is unclear how easily these services can be extended to other tasks, or whether they can scale to accommodate large deployments.
To address these challenges, we introduce the AT&T speech mashup architecture, a novel approach that leverages web services and cloud computing to make it easier to combine web content with a speech interface. We show that this new compositional method is suitable for integrating automatic speech recognition, text-to-speech synthesis, natural language understanding, and multimodal understanding technologies into real multimodal mobile services. The generality of this method allows researchers and speech practitioners to explore a wide variety of mobile multimodal services with finer-grained control and richer multimedia interfaces. Moreover, we demonstrate that the speech mashup is scalable and reduces network latency for a better user experience.