RESEARCHERS
GRAVANO Agustin
conferences and scientific meetings
Title:
Restoring punctuation and capitalization in transcribed speech
Author(s):
AGUSTÍN GRAVANO; MARTIN JANSCHE; MICHIEL BACCHIANI
Location:
Taipei, Taiwan
Meeting:
Conference; 34th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009); 2009
Organizing institution:
IEEE
Abstract:
Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n=3 to n=6) and the amount of training data (from 58 million to 55 billion tokens). Our results show that using larger training data sets consistently improves performance, while increasing the n-gram order does not help nearly as much.
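
To make the single-pass idea concrete, the following is a minimal sketch and not the authors' system. It assumes a toy bigram model (n=2, whereas the abstract reports n=3 to n=6), add-k smoothing, and a punctuation set limited to commas and periods. The names `train_bigram`, `logprob`, and `restore`, the constant `K`, and the three-sentence corpus are all hypothetical and chosen only for illustration. The decoder considers, for each lowercase input word, every cased variant seen in training, optionally followed by a punctuation mark, and keeps the highest-scoring path per last emitted token, so capitalization and punctuation are chosen jointly.

```python
import math
from collections import Counter

PUNCT = (",", ".")  # assumption: the only marks this sketch may insert
K = 0.01            # assumption: add-k smoothing constant, chosen arbitrarily


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized, cased, punctuated text."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def logprob(prev, tok, unigrams, bigrams):
    """Add-k smoothed log P(tok | prev)."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, tok)] + K) / (unigrams[prev] + K * vocab_size))


def restore(words, unigrams, bigrams):
    """Jointly pick casing and punctuation for a list of lowercase words."""
    # Map each lowercase form to the cased variants observed in training.
    variants = {}
    for tok in unigrams:
        if tok not in PUNCT and tok not in ("<s>", "</s>"):
            variants.setdefault(tok.lower(), set()).add(tok)

    # Viterbi search: with a bigram model the state is the last emitted token.
    best = {"<s>": (0.0, [])}
    for word in words:
        nxt = {}
        for form in variants.get(word, {word}):  # unseen words stay unchanged
            for punct in (None,) + PUNCT:
                for prev, (score, out) in best.items():
                    s = score + logprob(prev, form, unigrams, bigrams)
                    emitted, last = [form], form
                    if punct is not None:
                        s += logprob(form, punct, unigrams, bigrams)
                        emitted, last = [form, punct], punct
                    if last not in nxt or s > nxt[last][0]:
                        nxt[last] = (s, out + emitted)
        best = nxt

    # Close each path with the end-of-sentence token and return the best one.
    final = max(
        best.items(),
        key=lambda kv: kv[1][0] + logprob(kv[0], "</s>", unigrams, bigrams),
    )
    return final[1][1]


corpus = [
    ["Hello", ",", "John", "."],
    ["John", "went", "home", "."],
    ["Hello", ",", "Mary", "."],
]
unigrams, bigrams = train_bigram(corpus)
print(restore(["hello", "john"], unigrams, bigrams))
```

On this toy corpus the call prints `['Hello', ',', 'John', '.']`. A higher-order model would need a larger search state (the last n-1 tokens instead of one), which is one reason the sketch stays at bigrams.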