ICC   25427
INSTITUTO DE INVESTIGACION EN CIENCIAS DE LA COMPUTACION
Unidad Ejecutora - UE
artículos
Título:
A cross-linguistic analysis of the temporal dynamics of turn-taking cues using machine learning as a descriptive tool
Autor/es:
VIDAL, JAZMÍN; BE?U?, STEFAN; BRUSCO, PABLO; GRAVANO, AGUSTÍN
Revista:
SPEECH COMMUNICATION
Editorial:
ELSEVIER SCIENCE BV
Referencias:
Año: 2020 vol. 125 p. 24 - 40
ISSN:
0167-6393
Resumen:
In dialogue, speakers produce and perceive acoustic/prosodic turn-taking cues, which are fundamental for negotiating turn exchanges with their interlocutors. However, little of the temporal dynamics and cross-linguistic validity of these cues is known. In this work, we explore a set of acoustic/prosodic cues preceding three turn-transition types (hold, switch and backchannel) in three different languages (Slovak, American English and Argentine Spanish). For this, we use and refine a set of machine learning techniques that enable a finer-grained temporal analysis of such cues, as well as a comparison of their relative explanatory power. Our results suggest that the three languages, despite belonging to distinct linguistic families, share the general usage of a handful of acoustic/prosodic features to signal turn transitions. We conclude that exploiting features such as speech rate, final-word lengthening, the pitch track over the final 200 ms, the intensity track over the final 1000 ms, and noise-to-harmonics ratio (a voice-quality feature) might prove useful for further improving the accuracy of the turn-taking modules found in modern spoken dialogue systems.