DI PERSIA Leandro Ezequiel
Conferences and scientific meetings
Title:
Deep learning for reading and interpreting biomedical papers
Author(s):
BUGNON, LEANDRO A; YONES, CRISTIAN; RAAD, JONATHAN; GERARD, MATIAS; RUBIOLO, MARIANO; MERINO, GABRIELA; PIVIDORI, MILTON; DI PERSIA, LEANDRO E; MILONE, DIEGO H; STEGMAYER, GEORGINA
Location:
Mendoza
Meeting:
Conference; 10mo Congreso Argentino de Bioinformática y Biología Computacional; 2019
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
We present a deep learning (DL) method that analyzes and interprets full-text papers to automatically extract relevant relations between specific biomedical keywords. It is an end-to-end DL model, trained on full documents and multiple keyword pairs with binary labels indicating whether or not the two keywords are related within each full text. Given a pair of input keywords, the model outputs a score for each paper in the corpus, interpreted as the probability that the keywords are related in that text. To train the model, the corpus is vectorized using a word embedding. Each complete text is represented by concatenating all of its word vectors, together with a one-hot encoding of entity type that indicates which biomedical entity (gene, mutation or drug) each word belongs to. This representation passes through convolutional layers that compress the word embeddings, followed by further convolutional layers grouped in identity blocks (residual and pooling layers) with ELU activations and batch normalization. The model was evaluated on a manually curated (labeled) oncology corpus of more than 100 full papers annotated with biomedical entities. In 10-fold cross-validation, it outperformed state-of-the-art proposals, achieving an average F1 above 90%. We also measured the reliability of the ranked output list: for a given search, 100% of the first two documents retrieved contained relevant relations, so in this evaluation the relation between the keywords was always found in the top-2 papers of the ranked list. Finally, the method can highlight, within each paper, the specific fragments that contain the associations between the input keywords.
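The input representation described in the abstract (each word vector concatenated with a one-hot entity-type indicator) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function name `encode_document`, the order of the one-hot positions, the zero vector for out-of-vocabulary words, the all-zero one-hot for words outside any entity, and the truncation/zero-padding to a fixed `max_len` are all assumptions made for illustration. The toy embedding values are invented.

```python
from typing import Dict, List

# Entity types named in the abstract; this order fixes the one-hot positions (assumption).
ENTITY_TYPES = ["gene", "mutation", "drug"]


def encode_document(
    tokens: List[str],
    embedding: Dict[str, List[float]],
    entity_of: Dict[str, str],
    dim: int,
    max_len: int,
) -> List[List[float]]:
    """Build a max_len x (dim + 3) matrix: one row per token, word vector + entity one-hot."""
    width = dim + len(ENTITY_TYPES)
    rows: List[List[float]] = []
    for tok in tokens[:max_len]:  # truncate long documents (assumption)
        # Out-of-vocabulary words map to a zero vector (assumption).
        vec = list(embedding.get(tok, [0.0] * dim))
        one_hot = [0.0] * len(ENTITY_TYPES)
        etype = entity_of.get(tok)
        if etype in ENTITY_TYPES:  # words outside any entity keep an all-zero one-hot
            one_hot[ENTITY_TYPES.index(etype)] = 1.0
        rows.append(vec + one_hot)
    # Zero-pad short documents so every paper has the same shape (assumption).
    rows.extend([[0.0] * width for _ in range(max_len - len(rows))])
    return rows


# Hypothetical 2-dimensional embedding and entity annotations, for illustration only.
emb = {"BRAF": [0.1, 0.2], "V600E": [0.3, 0.4], "vemurafenib": [0.5, 0.6], "inhibits": [0.7, 0.8]}
ents = {"BRAF": "gene", "V600E": "mutation", "vemurafenib": "drug"}
doc = encode_document(["vemurafenib", "inhibits", "BRAF", "V600E"], emb, ents, dim=2, max_len=6)
```

In a real DL framework this matrix would be a tensor fed to the convolutional layers; keeping the entity indicator as extra columns lets the first convolutions see both the word meaning and its entity type at each position.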
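The two evaluation measures mentioned in the abstract, ranking papers by their predicted score and checking how many of the top-k retrieved papers are truly relevant, plus binary F1, can be sketched as follows. The function names and toy scores are hypothetical; the abstract does not state the threshold used to turn scores into binary predictions for F1, so `f1_score` here simply takes binary predictions as given. For 10-fold cross-validation, one would compute F1 on each held-out fold and average the ten values.

```python
from typing import Dict, List, Sequence, Set


def rank_papers(scores: Dict[str, float]) -> List[str]:
    """Order paper IDs by predicted relation probability, highest first."""
    return sorted(scores, key=lambda pid: scores[pid], reverse=True)


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved papers that truly contain the relation."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(pid in relevant for pid in top) / len(top)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy scores for one keyword pair (illustrative values, not results from the paper).
scores = {"p1": 0.12, "p2": 0.97, "p3": 0.88, "p4": 0.40}
ranked = rank_papers(scores)  # ["p2", "p3", "p4", "p1"]
relevant = {"p2", "p3"}
print(precision_at_k(ranked, relevant, 2))  # 1.0
```

A precision@2 of 1.0 averaged over searches corresponds to the abstract's statement that 100% of the first two retrieved documents contained relevant relations.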