ICC   25427
INSTITUTO DE INVESTIGACION EN CIENCIAS DE LA COMPUTACION
Executing Unit - UE
Book chapters
Title:
Robust Features in Deep-Learning-Based Speech Recognition
Author(s):
Abeer Alwan; Martin Graciarena; Richard Stern; Dimitra Vergyri; Luciana Ferrer; Horacio Franco; John Hansen; Wen Wang; Julien van Hout; Vikramjit Mitra
Book:
New Era for Robust Speech Recognition
Publisher:
Springer
References:
Year: 2017; pp. 187-217
Abstract:
Recent progress in deep learning has revolutionized speech recognition research, with Deep Neural Networks (DNNs) becoming the new state of the art for acoustic modeling. DNNs offer significantly lower speech recognition error rates than the previously used Gaussian Mixture Models (GMMs). Unfortunately, DNNs are data sensitive, and unseen data conditions can deteriorate their performance. Acoustic distortions, such as noise, reverberation, and channel differences, add variation to the speech signal, which in turn impacts DNN acoustic model performance. A straightforward solution to this issue is training the DNN models with these types of variation, which typically provides quite impressive performance. However, anticipating such variation is not always possible; in these cases, DNN recognition performance can deteriorate quite sharply. To avoid subjecting acoustic models to such variation, robust features have traditionally been used to create an invariant representation of the acoustic space. Most commonly, robust feature-extraction strategies have explored three principal areas: (a) enhancing the speech signal, with the goal of improving the perceptual quality of speech; (b) reducing the distortion footprint, with signal-theoretic techniques used to learn the distortion characteristics and subsequently filter them out of the speech signal; and (c) leveraging knowledge from auditory neuroscience and psychoacoustics, by using robust features inspired by auditory perception. In this chapter, we present prominent robust feature-extraction strategies explored by the speech recognition research community, and we discuss their relevance to coping with data-mismatch problems in DNN-based acoustic modeling. We present results demonstrating the efficacy of robust features in the new paradigm of DNN acoustic models, and we discuss future directions in feature design for making speech recognition systems more robust to unseen acoustic conditions. Note that the approaches discussed in this chapter focus primarily on single-channel data.
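To make the "distortion footprint" strategy concrete, a classic signal-theoretic technique in this family is magnitude spectral subtraction: estimate the noise magnitude spectrum from non-speech material and subtract it from each frame of the noisy spectrum before feature extraction. The sketch below is a minimal, hypothetical illustration (not taken from the chapter); the frame length, spectral-floor factor, and the noise-estimation shortcut are all assumptions made for the example.

```python
import numpy as np

def spectral_subtraction(frames, noise_mag, floor=0.02):
    """Per-frame magnitude spectral subtraction.

    frames:    (n_frames, frame_len) time-domain frames of the noisy signal
    noise_mag: (frame_len // 2 + 1,) estimated noise magnitude spectrum
    floor:     spectral floor keeping magnitudes positive (musical-noise guard)
    """
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Subtract the noise estimate, but never go below a small fraction
    # of the noisy magnitude.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    clean_spec = clean_mag * np.exp(1j * phase)  # reuse the noisy phase
    return np.fft.irfft(clean_spec, n=frames.shape[1], axis=1)

# Toy usage: a sinusoidal "speech" component buried in white noise.
rng = np.random.default_rng(0)
n_frames, frame_len = 64, 256
t = np.arange(frame_len)
speech = np.sin(2 * np.pi * 8 * t / frame_len)
noise = 0.5 * rng.standard_normal((n_frames, frame_len))
noisy = speech + noise
# In practice the noise spectrum is estimated from non-speech frames;
# here we cheat and average the known noise for the illustration.
noise_mag = np.abs(np.fft.rfft(noise, axis=1)).mean(axis=0)
enhanced = spectral_subtraction(noisy, noise_mag)
err_before = np.mean((noisy - speech) ** 2)
err_after = np.mean((enhanced - speech) ** 2)
```

In a robust-features pipeline, the enhanced frames (rather than the noisy ones) would then feed the usual filterbank or cepstral feature extraction; the spectral floor is what keeps residual bins from flipping negative and producing the well-known "musical noise" artifacts.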