ISISTAN   23985
INSTITUTO SUPERIOR DE INGENIERIA DEL SOFTWARE
Executing Unit - UE
Conferences and scientific meetings
Title:
SPEdu: A Toolbox for Processing Digitized Historical Documents
Author(s):
FABIO ROCHA; GUILLERMO RODRÍGUEZ
Location:
Mexico
Meeting:
Conference; 19th Mexican International Conference on Artificial Intelligence, MICAI 2020; 2020
Organizing institution:
Sociedad Mexicana de Inteligencia Artificial
Abstract:
Historical-educational documentary sources have gained considerable attention in educational contexts. However, some sources suffer from serious problems such as inadequate infrastructure, poor preservation, and a lack of qualified personnel. In addition, a large share of the documents has not been digitized, which makes research difficult. As a consequence, there is a need to transcribe, digitize, and catalog sources of information so that large volumes of data can be analyzed. To address this issue, we present SPEdu, a tool for digitizing the sources of information required by research on the History of Education. The workflow of SPEdu is divided into three steps. First, SPEdu acquires images from an information source. Second, the tool preprocesses the images and extracts features from them. Finally, a supervised machine learning module classifies each image as text or non-text. To assess the viability of SPEdu, we used the Official Gazette of the State of Sergipe. For the third step, we evaluated the performance of several classification algorithms: J48, Logistic Regression, Multilayer Perceptron (MLP), Naive Bayes, Random Forest, and Random Tree. The results show that Random Forest outperformed the other techniques, with an average accuracy of 95%.
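The three-step workflow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from SPEdu: the toy grayscale blocks, the two features (ink density and ink/background transitions per row), and the function names are all hypothetical. A nearest-centroid rule stands in for the classifiers the abstract evaluates, since it needs only the Python standard library.

```python
from statistics import mean

def binarize(image, threshold=128):
    """Preprocessing: map grayscale pixels (0-255) to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]

def extract_features(binary):
    """Return (ink density, mean number of ink/background transitions per row)."""
    height, width = len(binary), len(binary[0])
    density = sum(map(sum, binary)) / (height * width)
    transitions = mean(
        sum(1 for a, b in zip(row, row[1:]) if a != b) for row in binary
    )
    return (density, transitions)

def fit_centroids(features, labels):
    """Training: average the feature vectors of each class (stand-in classifier)."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids

def predict(centroids, feature):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum(
            (a - b) ** 2 for a, b in zip(feature, centroids[label])
        ),
    )

# Step 1 (acquisition): hypothetical toy blocks instead of scanned gazette pages.
TEXT_A = [[0, 255, 0, 255, 0, 255]] * 4   # thin alternating strokes
TEXT_B = [[255, 0, 255, 0, 255, 0]] * 4
PHOTO_A = [[0, 0, 0, 0, 0, 0]] * 4        # solid dark region
PHOTO_B = [[0, 0, 0, 0, 0, 255]] * 4

train_images = [TEXT_A, TEXT_B, PHOTO_A, PHOTO_B]
train_labels = ["text", "text", "non-text", "non-text"]

# Step 2 (preprocessing and feature extraction).
train_features = [extract_features(binarize(img)) for img in train_images]

# Step 3 (supervised classification).
centroids = fit_centroids(train_features, train_labels)
unseen = [[0, 255, 255, 0, 255, 0]] * 4
print(predict(centroids, extract_features(binarize(unseen))))
```

In a realistic setting the feature vectors would instead be passed to an established implementation of one of the evaluated algorithms, and accuracy would be measured on held-out images rather than on toy data.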