ICYTE   26279
INSTITUTO DE INVESTIGACIONES CIENTIFICAS Y TECNOLOGICAS EN ELECTRONICA
Executing Unit - UE
Book chapters
Title:
Wavelet Descriptors for Handwritten Text Recognition in Historical Documents
Author(s):
BYRON LEITE DANTAS BEZERRA; LETICIA MARIA SEIJAS
Book:
Handwriting: Recognition, Development and Analysis
Publisher:
Nova Science Publishers, Inc.
References:
Place: New York; Year: 2017; pp. 1-265
Abstract:
The automatic transcription of historical handwritten documents is a complex task because of the strong variability of writing styles, varied font types and character sizes, and underlined or crossed-out words. The use of segmentation-free (holistic) approaches, which tightly integrate an optical character model and a language model, has yielded the best performance on standard benchmarks. In these approaches, the preprocessed line image is divided into frames using a sliding window, and features are extracted from each slice. In this chapter, a different approach to feature extraction, based on the CDF 9/7 Wavelet Transform (WT), is proposed to represent the content of each slice. The WT is particularly suited to localizing spatial and frequency information in image processing, and specifically to multiresolution feature extraction from patterns to be classified. Experiments were performed on the data set and partitions used in the ICFHR 2014 HTRtS competition, extracted from the challenging Bentham collection. Our proposal outperformed the HTR results reported in the literature for N-gram/HMM-GMM baseline systems. Additionally, data representation was improved by reducing the feature vector size by more than 70%. This reduction in descriptor size had a decisive impact on training and processing times for large databases.
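
The sliding-window and wavelet steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the chapter's implementation: the function names (cdf97_forward_1d, sliding_slices, slice_descriptor), the window width and step, the single decomposition level, the choice to keep only the low-pass subband, and the normalization constant convention are all assumptions made for illustration. The lifting coefficients are the commonly published values for the CDF 9/7 wavelet.

```python
# Commonly published lifting coefficients for the CDF 9/7 wavelet.
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398  # scaling constant; this normalization convention is an assumption


def cdf97_forward_1d(x):
    """One level of the CDF 9/7 transform via lifting with symmetric extension.

    x must have even length. Returns (approximation, detail) as two lists.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("signal length must be even and at least 2")
    s = [float(v) for v in x[0::2]]  # even-indexed samples
    d = [float(v) for v in x[1::2]]  # odd-indexed samples
    half = n // 2

    def predict(coef):
        # Adjust odd samples using their two even neighbours (mirrored at the end).
        for i in range(half):
            right = s[i + 1] if i + 1 < half else s[i]
            d[i] += coef * (s[i] + right)

    def update(coef):
        # Adjust even samples using their two odd neighbours (mirrored at the start).
        for i in range(half):
            left = d[i - 1] if i > 0 else d[i]
            s[i] += coef * (left + d[i])

    predict(ALPHA)
    update(BETA)
    predict(GAMMA)
    update(DELTA)
    return [v * K for v in s], [v / K for v in d]


def sliding_slices(image, width, step):
    """Yield vertical slices of a line image given as a list of equal-length rows."""
    n_cols = len(image[0])
    for start in range(0, n_cols - width + 1, step):
        yield [row[start:start + width] for row in image]


def slice_descriptor(block):
    """Flattened low-pass subband of a one-level 2-D transform of one slice.

    Keeping only this subband is an illustrative choice, not the chapter's recipe.
    """
    rows_low = [cdf97_forward_1d(row)[0] for row in block]  # horizontal pass
    cols_low = [cdf97_forward_1d(list(col))[0] for col in zip(*rows_low)]  # vertical pass
    return [v for col in cols_low for v in col]


# Toy usage: a constant 8 x 12 "line image", window 4 columns wide, moved 2 columns at a time.
image = [[1.0] * 12 for _ in range(8)]
features = [slice_descriptor(s) for s in sliding_slices(image, width=4, step=2)]
```

In this sketch, each 8 x 4 slice (32 pixel values) is summarized by the 8 coefficients of its low-pass subband, which shows how a wavelet descriptor can be shorter than the raw slice. The descriptor layout and sizes used in the chapter itself may differ.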