This chapter reviews different methods for Audio-Visual Speech Recognition (AVSR) based on Random Forest. First, a strategy that combines wavelet multiresolution analysis with Random Forest is proposed: the temporal evolution of the input speech data is represented by a set of wavelet-based features, and a Random Forest classifier then carries out the speech recognition task. Second, a novel scheme that combines two classifiers in cascade, Random Forest and Complementary Random Forest, is proposed. Unlike Random Forest, which is trained with instances of each particular class, the Complementary Random Forest is trained with instances of all the remaining classes. The performance of the proposed speech recognition methods is evaluated in three scenarios: using only acoustic information, only visual information (lip-reading), and fused audio-visual information. The evaluations are carried out on three audio-visual databases, two of them public and the third compiled by the authors of this chapter. Experimental results show that the proposed methods perform well on all three databases and for each kind of input information considered.
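As a rough illustration of what a set of wavelet-based features might look like, the sketch below computes a multiresolution decomposition of a one-dimensional signal and summarizes each resolution level by its energy. The abstract does not state which wavelet family, how many decomposition levels, or which features the chapter uses, so the Haar wavelet, the `levels` parameter, and the helper names `haar_step` and `wavelet_energy_features` are all assumptions made here for illustration only.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail), each half the length of the input.
    The Haar wavelet is an assumption; the chapter may use another family.
    """
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def wavelet_energy_features(signal, levels=3):
    """Hypothetical feature vector: the energy of each detail band,
    from finest to coarsest, followed by the energy of the final
    approximation band (levels + 1 values in total)."""
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


if __name__ == "__main__":
    # Illustrative values only, not data from the chapter.
    frame = [0.0, 1.0, 0.0, -1.0, 2.0, 0.5, -0.5, 1.5]
    print(wavelet_energy_features(frame, levels=3))
```

In a pipeline like the one described, a vector of this kind would be computed for each acoustic or visual feature sequence and passed to a Random Forest classifier. Because the Haar transform used here is orthonormal, the band energies add up to the energy of the original signal, which gives a convenient sanity check.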