CIFASIS   20631
CENTRO INTERNACIONAL FRANCO ARGENTINO DE CIENCIAS DE LA INFORMACION Y DE SISTEMAS
Executing Unit - UE
Articles
Title:
Robust front-end for Audio, Visual and Audio-Visual Speech Classification
Author(s):
TERISSI, LUCAS DANIEL; SAD, GONZALO; GÓMEZ, JUAN CARLOS
Journal:
International Journal of Speech Technology
Publisher:
Springer
References:
Place: Berlin; Year: 2018; pp. 1-15
ISSN:
1381-2416
Abstract:
This paper proposes a robust front-end for speech classification which can be employed with acoustic, visual or audio-visual information, indistinctly. Wavelet multiresolution analysis is employed to represent temporal input data associated with speech information. These wavelet-based features are then used as inputs to a Random Forest classifier to perform the speech classification. The performance of the proposed speech classification scheme is evaluated in different scenarios, namely, considering only acoustic information, only visual information (lip-reading), and fused audio-visual information. These evaluations are carried out over three different audio-visual databases, two of them public ones and the remaining one compiled by the authors of this paper. Experimental results show that a good performance is achieved with the proposed system over the three databases and for the different kinds of input information being considered. In addition, the proposed method performs better than other reported methods in the literature over the same two public databases. All the experiments were implemented using the same configuration parameters. These results also indicate that the proposed method performs satisfactorily, neither requiring the tuning of the wavelet decomposition parameters nor of the Random Forests classifier parameters, for each particular database and input modalities.
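The wavelet multiresolution step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which wavelet family, how many decomposition levels, or which coefficients the authors keep, so the Haar wavelet, the three levels, the function names and the sample trajectory below are all assumptions chosen only for illustration.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficients, each half the input length.
    The Haar wavelet is an assumption; the abstract names no wavelet family.
    """
    scale = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * scale for a, b in pairs]
    detail = [(a - b) * scale for a, b in pairs]
    return approx, detail


def wavelet_features(signal, levels=3):
    """Hypothetical feature vector from a multiresolution decomposition.

    Concatenates the coarsest approximation with the detail coefficients
    of every level, ordered from coarsest to finest.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    features = list(approx)
    for detail in reversed(details):
        features.extend(detail)
    return features


# Made-up temporal trajectory, standing in for one acoustic or visual
# parameter tracked over 8 frames.
trajectory = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 8.0, 8.0]
features = wavelet_features(trajectory, levels=3)
print(features)
```

Feature vectors of this kind, one per utterance, would then be passed to a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`; that library choice is likewise an assumption and is not named in the abstract.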