CONICET | Institutes and Human Resources Search

RESEARCHERS

DI PERSIA Leandro Ezequiel

Conferences and scientific meetings

Title:

Deep learning for reading and interpreting biomedical papers

Author(s):

BUGNON, LEANDRO A; YONES, CRISTIAN; RAAD, JONATHAN; GERARD, MATIAS; RUBIOLO, MARIANO; MERINO, GABRIELA; PIVIDORI, MILTON; DI PERSIA, LEANDRO E; MILONE, DIEGO H; STEGMAYER, GEORGINA

Location:

Mendoza

Meeting:

Conference; 10mo Congreso Argentino de Bioinformática y Biología Computacional; 2019

Organizing institution:

Asociación Argentina de Bioinformática y Biología Computacional

Abstract:

Our deep learning (DL) method analyzes and interprets papers in order to automatically extract relevant relations between specific biomedical keywords. It is an end-to-end DL model, trained with full documents and several keyword pairs with binary labels indicating whether or not the two keywords are related within each full text. Given the input keywords, the model outputs a prediction score for each paper in the corpus: the probability that the input keywords are related in the text. To train the DL model, a corpus of documents is vectorized using a word embedding. To represent a complete text, all the word vectors are concatenated, together with a one-hot encoding vector for the input entity-type information, which indicates to which of the possible biomedical entities each word belongs (gene, mutation or drug). The embeddings pass through convolutional layers, which compress the word embeddings, and then through more convolutional layers grouped in identity blocks (residual and pooling layers) with ELU activations and batch normalization layers. The DL model was evaluated using a manually curated (labeled) corpus of more than 100 full papers covering biomedical entities in oncology. The results of a 10-fold cross-validation showed that our DL model outperformed state-of-the-art proposals, achieving an average F1 over 90%. Furthermore, the reliability of the output list of papers was measured, revealing that 100% of the first two documents retrieved for a particular search contain relevant relations. This means that our model can guarantee that the keyword relation can be effectively found in the top-2 papers of the ranked list. In addition, our method is capable of highlighting, within each paper, the specific fragments that contain the associations between the input keywords.
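
The input representation described in the abstract (one word vector per token, extended with a one-hot entity-type vector) might look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' implementation: the function names, the 2-dimensional toy embedding, the zero vector for out-of-vocabulary words, and the example tokens are all hypothetical. Only the three entity types (gene, mutation, drug) come from the abstract.

```python
# Entity types named in the abstract; everything else below is hypothetical.
ENTITY_TYPES = ["gene", "mutation", "drug"]


def one_hot(entity_type):
    """Return a one-hot list over ENTITY_TYPES (all zeros for a plain word)."""
    return [1.0 if entity_type == t else 0.0 for t in ENTITY_TYPES]


def encode_document(tokens, entity_of, embedding, dim):
    """Build one row per token: word vector followed by entity one-hot."""
    unknown = [0.0] * dim  # assumption: out-of-vocabulary words map to zeros
    rows = []
    for tok in tokens:
        vec = embedding.get(tok, unknown)
        rows.append(list(vec) + one_hot(entity_of.get(tok)))
    return rows


# Toy 2-dimensional embedding and entity annotations, invented for illustration.
toy_embedding = {
    "braf": [0.1, 0.9],
    "v600e": [0.4, 0.2],
    "vemurafenib": [0.7, 0.3],
}
toy_entities = {"braf": "gene", "v600e": "mutation", "vemurafenib": "drug"}

tokens = ["braf", "v600e", "responds", "to", "vemurafenib"]
matrix = encode_document(tokens, toy_entities, toy_embedding, dim=2)

for tok, row in zip(tokens, matrix):
    print(tok, row)
```

Each row has the embedding dimension plus three entity-type slots. A matrix of this shape is what a stack of convolutional layers, such as the one the abstract describes, would take as input.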