CONICET | Institutes and Human Resources Search

This project involved the design and development of a relational SQL database to generate an intonational model for an Argentine Spanish text-to-speech system. The first stage of populating the database was the bulk loading of text, divided into three co-indexed files: sentences, orthographic words, and phonological syllables. A software tool that performed phonemic transcription and syllabic segmentation of the text was developed to make this indexing possible. A large set of sentences was loaded first; from it, a subset of 741 sentences was selected according to criteria based on syllable occurrence in every position within a word, both with and without stress. This subset contained 97% of all Spanish syllables extracted from a widely used Spanish dictionary. The utterances were recorded at 16 kHz with 16-bit resolution, using an interactive program. Two professional announcers were instructed to produce a variety of accent patterns and intonational phrases to prevent monotony. The speech signals were then labeled by trained phoneticians with a spectral analysis tool, using ToBI tiers (Beckman & Ayers, 1994). ToBI conventions were revised to account for the prosodic patterns of Argentine Spanish. Frequency values were scaled using the ERB scale, and bitonal accents were redefined. Finally, the second stage of populating the database consisted of incorporating the labeled files. Waveforms were kept outside the database but linked to it, to allow the identification, playback, and selection of specific segments.
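As an illustration of the co-indexing described above, the following is a minimal Python sketch using the standard-library `sqlite3` module. It is not the project's actual schema: the table names (`sentence`, `word`, `syllable`), the column names, the sample rows, and the waveform path are all hypothetical. The helper `hz_to_erb_rate` assumes the Glasberg and Moore (1990) ERB-rate formula, since the abstract does not say which variant of the ERB scale was used.

```python
import math
import sqlite3

# Hypothetical schema: all table and column names are illustrative,
# not taken from the project described in the abstract.
SCHEMA = """
CREATE TABLE sentence (
    sentence_id INTEGER PRIMARY KEY,
    text        TEXT NOT NULL,
    wav_path    TEXT              -- waveform stays on disk; only its path is stored
);
CREATE TABLE word (
    word_id     INTEGER PRIMARY KEY,
    sentence_id INTEGER NOT NULL REFERENCES sentence(sentence_id),
    position    INTEGER NOT NULL, -- 1-based index of the word in its sentence
    orthography TEXT NOT NULL
);
CREATE TABLE syllable (
    syllable_id INTEGER PRIMARY KEY,
    word_id     INTEGER NOT NULL REFERENCES word(word_id),
    position    INTEGER NOT NULL, -- 1-based index of the syllable in its word
    phonemes    TEXT NOT NULL,
    stressed    INTEGER NOT NULL CHECK (stressed IN (0, 1))
);
"""


def hz_to_erb_rate(f_hz: float) -> float:
    """Convert Hz to ERB-rate units (Glasberg & Moore, 1990).

    Assumption: the abstract does not name the formula it used.
    """
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one made-up sentence."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO sentence VALUES (1, 'La casa.', 'wav/s0001.wav')"
    )
    conn.executemany(
        "INSERT INTO word VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "la"), (2, 1, 2, "casa")],
    )
    conn.executemany(
        "INSERT INTO syllable VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, "la", 0), (2, 2, 1, "ka", 1), (3, 2, 2, "sa", 0)],
    )
    conn.commit()
    return conn


def stressed_syllables(conn: sqlite3.Connection) -> list:
    """Walk syllable -> word -> sentence to locate stressed syllables
    together with the external waveform file they belong to."""
    return conn.execute(
        """
        SELECT s.text, w.orthography, y.phonemes, s.wav_path
        FROM syllable AS y
        JOIN word     AS w ON w.word_id = y.word_id
        JOIN sentence AS s ON s.sentence_id = w.sentence_id
        WHERE y.stressed = 1
        ORDER BY s.sentence_id, w.position, y.position
        """
    ).fetchall()
```

Storing only a path in `wav_path` mirrors the abstract's choice to keep waveforms outside the database while still linking each labeled segment to its audio. A real design would also need tables for the ToBI tiers and their time stamps, which are omitted here.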
