A human-centred information environment is one in which humans share information. Given that spoken communication is the most basic form of communication among humans, and that advanced network telecommunication devices are unevenly distributed across the globe, our idea is to create an environment in which individuals can freely and naturally transmit and receive information via speech signals. Although functional and technical standards exist for speech recognition, synthesis and dialogue, these standards currently see very little use in practice. We know the impediments to improving the accuracy of speech recognition, and we know the research that could resolve them given proper effort; however, it is probably more important to consider the other factors that prevent the technology from being used. In particular, spoken dialogue systems fail to attract the interest of users.
Transmitting and receiving information in a way that gives a feeling of live interaction is unique to spoken language systems. It is one of the major strengths of speech-based interfaces and cannot be achieved with a simple text-based process. To achieve it, it is necessary to consider facial expression, gesture, speech quality, timing, and similar elements of communication. Current spoken dialogue systems lack flexibility and appear cut-and-dried to the user because of various constraints that will be discussed later. In our research, we separate the spoken dialogue system into different components, which helps us unravel the factors that must be present to make the system appealing to users; we expect this approach to help spread speech technology across society (Diagram 1). However, the qualities that make a system appealing cannot easily be evaluated mechanically; evaluating them requires accumulating the emotions and perceptions that humans possess. Our objective is to establish a system in which a user can easily create speech-based interactive content, including numerous spoken dialogues, that they can then evaluate. On the basis of this content, the system will attempt to inductively uncover the essence of interaction.
First, for a system to appeal to users, it is necessary to examine various advanced speech technologies. The applicants have sufficient technical background to conduct the necessary research and development. As for constructing an environment in which users proactively submit spoken dialogue content, there are no case studies to date, and such an environment is yet to be defined. Our basic strategy is to create a “development loop of content generation”, as shown in Diagram 1. By clarifying, through experiment, the various factors needed to raise the loop gain above 1, we will develop a set of technologies that readily brings such situations about. The development loop in Diagram 1 is a realization of an information environment in which users submit information. In addition, because a large number of spoken dialogues will be created, it becomes possible to conduct research on this content data and to create next-generation speech technology.
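The significance of a loop gain above 1 can be illustrated with a minimal sketch. It assumes a deliberately simplified model, not stated in the proposal, in which each pass through the development loop multiplies the amount of submitted content by a constant gain. The function name, the starting volume, and the gain values are all hypothetical and chosen only for illustration.

```python
def simulate_content_loop(initial_items: float, gain: float, rounds: int) -> list[float]:
    """Return the content volume after each round of the loop.

    Assumption: every round multiplies the current volume by a constant
    `gain`. A real content-generation loop would not have a fixed gain.
    """
    volumes = [initial_items]
    for _ in range(rounds):
        # One pass through the loop: existing content attracts users,
        # who in turn submit new content.
        volumes.append(volumes[-1] * gain)
    return volumes


if __name__ == "__main__":
    # Hypothetical gains below, at, and above the threshold of 1.
    for gain in (0.8, 1.0, 1.2):
        final = simulate_content_loop(100.0, gain, 10)[-1]
        print(f"gain={gain}: {final:.1f} items after 10 rounds")
```

Under this assumption, a gain below 1 makes the content pool shrink, a gain of exactly 1 leaves it constant, and a gain above 1 makes it grow round after round, which is why the strategy targets a gain above 1.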