RESEARCHERS
GURLEKIAN Jorge Alberto
conferences and scientific meetings
Title:
Acoustic unit segmentation for text to speech systems.
Author(s):
GURLEKIAN J.A., TORRES H.
Location:
Toronto
Meeting:
Conference; LASP 2006; 2006
Organizing institution:
Univ. of Toronto
Abstract:
Synthesis by concatenation of natural speech produces better perceptual results when phonemes and syllables are extracted from running speech at points where spectral variation is small. Following this concept, a new automatic segmentation method (Torres et al., 2005) is applied to two Argentine Spanish databases. The method combines entropy coding, multi-resolution analysis, and Kohonen self-organizing maps. No linguistic unit constrains the segmentation, so the resulting waveform segments represent phone chains shaped mainly by the dynamic spectral structure of the signal. Each unit may consist of a variable number of phones, possibly with segmented (partial) phones at its boundaries. Both the number and the composition of phones are speaker dependent, i.e., they are affected by speaking rate and by segmental and suprasegmental distinctive features. The two databases (one male and one female speaker, 741 sentences each) show this dependence. Even so, both speakers show a high occurrence of three- and four-phone sequences. Vowel-Consonant-Vowel sequences are the most frequent type. Consonant-Vowel syllables are phonemically frequent in Spanish, but they appear less often with this method. The relevance of half-phone segmentation is confirmed by the fact that 65% of all units start and end with a segmented phone. Perceptual experiments showed that speech concatenated from these acoustic units was judged more natural than speech concatenated from diphone units.
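The core idea above is to cut running speech where the spectrum changes least. The sketch below illustrates that idea only. It does not reproduce the entropy coding, multi-resolution analysis, and Kohonen map combination of Torres et al. (2005), which the abstract does not detail. Instead it uses a plainly swapped-in, simpler criterion: the Euclidean distance between consecutive spectral frames, with cuts placed at local minima of that curve. Several inputs are assumptions: frames are taken as precomputed magnitude spectra, and the names `spectral_change`, `stable_boundaries`, `segment` and the `min_gap` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # assumption: one precomputed magnitude spectrum per frame


def spectral_change(frames: Sequence[Frame]) -> List[float]:
    """Distance between consecutive frames; entry i compares frame i and frame i+1."""
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]


def stable_boundaries(frames: Sequence[Frame], min_gap: int = 3) -> List[int]:
    """Return cut points b (a unit boundary just before frame b) at local minima
    of spectral change, i.e. inside spectrally stable regions.

    min_gap (hypothetical parameter) is the minimum number of frames between
    consecutive cuts, counted from the start of the utterance.
    """
    d = spectral_change(frames)
    cuts: List[int] = []
    last = 0
    for i in range(1, len(d) - 1):
        # Local minimum of the change curve: the spectrum is most stable here.
        if d[i] < d[i - 1] and d[i] <= d[i + 1]:
            b = i + 1  # cut between frame i and frame i+1
            if b - last >= min_gap:
                cuts.append(b)
                last = b
    return cuts


def segment(frames: Sequence[Frame], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split the frame sequence into half-open [start, end) units at the cut points."""
    edges = [0] + stable_boundaries(frames, min_gap) + [len(frames)]
    return list(zip(edges, edges[1:]))


if __name__ == "__main__":
    # Toy 2-D "spectra": two nearly constant frame pairs (2,2.1) and (5,5.1)
    # separated by steady drift; the cuts land inside those stable pairs.
    frames = [[0, 0], [1, 0], [2, 0], [2.1, 0], [3, 0],
              [4, 0], [5, 0], [5.1, 0], [6, 0], [7, 0]]
    print(stable_boundaries(frames))  # → [3, 7]
    print(segment(frames))            # → [(0, 3), (3, 7), (7, 10)]
```

Because the cuts fall in the middle of stable stretches rather than at transitions, each resulting unit tends to begin and end partway through a steady sound. This loosely mirrors the abstract's observation that most units start and end with a segmented phone.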