Spoken communication is inherently multimodal. The acoustic signal carries the auditory modality, and the image carries the visual and gestural modalities (facial deformation). The speech signal is in fact the consequence of the deformation of the vocal tract: the movements of the jaw, lips, tongue, soft palate and larynx modulate the excitation signal produced by the vocal cords or by air turbulence. These deformations are visible on the face (lips, cheeks, jaw) through the coordination of the orofacial muscles and the skin deformation they induce. The visual modality can provide information that complements the acoustic signal, and it becomes essential when the acoustic signal is degraded, as is the case for hard-of-hearing people or in a noisy environment. Other modalities may be related to speech, such as eyebrow movements and gestures that express different emotions. This latter modality, which is suprasegmental, can complement the acoustic or acoustic-visual message.
In this document, I present my main research over the last 10 years. Figure 1 gives an overview of my research activities. I consider speech as a multimodal object that can be studied from an articulatory, acoustic, or visual standpoint. These modalities can be investigated all together or in pairs. What these aspects have in common is that they are all based on data, so acquiring and processing data is central to my research. I am interested in investigating the production of this multimodal object and its synthesis, as well as its perception by human perceivers in face-to-face communication.