SINC(I)   25518
INSTITUTO DE INVESTIGACION EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL
Executing Unit - UE
Conferences and scientific meetings
Title:
Deep learning for reading and interpreting biomedical papers
Author(s):
Jonatan Raad; Leandro A. Bugnon; Gabriela Merino; Matías Gerard; Diego H. Milone; Milton Pividori; Georgina Stegmayer; C. Yones; Mariano Rubiolo; Leandro Di Persia
Location:
Mendoza
Meeting:
Conference; A2B2C; 2019
Organizing institution:
A2B2C
Abstract:
Background: Next-generation sequencing together with novel preclinical reports have led to an increasingly large amount of results published in the scientific literature. However, due to the huge number of papers available, identifying novel treatments or predicting a drug response in, for example, cancer patients remains a laborious and challenging task. This task requires "reading" many documents to identify just a small set of papers that contain the proper relations between input keywords. There is an urgent need for computational methods that can do this automatically.

Results: Our method, based on deep learning (DL), is capable of analyzing and interpreting papers in order to automatically extract relevant relations between specific biomedical keywords. It is an end-to-end DL model, trained with full documents and several keyword pairs with binary labels indicating whether or not there is a relation between them within each full text. Given the input keywords, the model outputs a prediction score for each paper in the corpus: the probability that the input keywords are related in the text. For training the DL model, a corpus of documents is vectorized using a word embedding. To represent a complete text, all the word vectors are concatenated, together with a one-hot encoding vector for the input entity-type information, which indicates to which of the possible biomedical entities each word belongs (gene, mutation or drug). The embeddings pass through convolutional layers, which compress the word embeddings, and then through more convolutional layers grouped in identity blocks (residual and pooling layers) with ELU activations and batch normalization layers. The DL model has been evaluated using a manually curated (labeled) corpus including biomedical entities in oncology, with more than 100 full papers. The results of a 10-fold cross-validation showed that our DL model outperformed state-of-the-art proposals, achieving an average F1 over 90%.
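The input representation described above (word vectors joined with a one-hot entity-type vector) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 2-dimensional embedding, the example sentence, and the choice of an all-zero vector for words that are not entities or are missing from the embedding are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Entity types named in the abstract; the ordering here is an assumption.
ENTITY_TYPES = ("gene", "mutation", "drug")


def one_hot_entity(entity: Optional[str]) -> List[float]:
    """One-hot vector over ENTITY_TYPES; all zeros for non-entity words (assumption)."""
    return [1.0 if entity == t else 0.0 for t in ENTITY_TYPES]


def encode_document(
    tokens: Sequence[str],
    entities: Sequence[Optional[str]],
    embedding: Dict[str, List[float]],
    dim: int,
) -> List[List[float]]:
    """Return one row per token: the word vector followed by the entity one-hot."""
    unknown = [0.0] * dim  # hypothetical handling of out-of-vocabulary words
    rows = []
    for token, entity in zip(tokens, entities):
        word_vec = embedding.get(token.lower(), unknown)
        rows.append(list(word_vec) + one_hot_entity(entity))
    return rows


# Toy embedding and sentence, invented purely for illustration.
toy_embedding = {
    "braf": [0.9, 0.1],
    "v600e": [0.2, 0.8],
    "responds": [0.5, 0.5],
    "vemurafenib": [0.7, 0.3],
}
tokens = ["BRAF", "V600E", "responds", "to", "vemurafenib"]
entities = ["gene", "mutation", None, None, "drug"]

matrix = encode_document(tokens, entities, toy_embedding, dim=2)
for token, row in zip(tokens, matrix):
    print(token, row)
```

Each row has `dim + 3` values, so a full text becomes a matrix with one row per word, which is the kind of input a stack of convolutional layers could then compress.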
Furthermore, the reliability of the output list of papers was measured, revealing that 100% of the first two documents retrieved for a particular search contain relevant relations. This means that our model can guarantee that the keyword relation can be effectively found in the top-2 papers of the ranked list. In addition, our method is capable of highlighting, within each paper, the specific fragments that contain the associations between the input keywords (see Figure).

Conclusions: This proposal could be used for rapidly identifying relationships in full-text documents between genes and their mutations, drug responses and treatments in the context of a certain disease. This can certainly be a useful and valuable resource for the advancement of the precision medicine field.
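The two evaluation quantities mentioned in the Results, the F1 score and the fraction of relevant papers among the top-ranked ones, could be computed roughly as in the sketch below. The paper identifiers, scores, and labels are invented for illustration and are not data from the study; the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def rank_papers(scores: Dict[str, float]) -> List[str]:
    """Sort paper identifiers by predicted relation probability, highest first."""
    return sorted(scores, key=lambda paper_id: scores[paper_id], reverse=True)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k ranked papers that actually contain the relation."""
    top = list(ranked[:k])
    if not top:
        return 0.0
    return sum(1 for paper_id in top if paper_id in relevant) / len(top)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented scores for one keyword pair over a four-paper corpus.
scores = {"paper_a": 0.97, "paper_b": 0.12, "paper_c": 0.88, "paper_d": 0.45}
relevant = {"paper_a", "paper_c"}  # papers labeled as containing the relation

ranked = rank_papers(scores)
top2_precision = precision_at_k(ranked, relevant, k=2)
f1 = f1_score([1, 0, 1, 1], [1, 0, 0, 1])

print(ranked)
print(top2_precision)
print(f1)
```

In this toy case both top-2 papers are relevant, which corresponds to the 100% top-2 reliability reported above; in a 10-fold cross-validation, `f1_score` would be computed on each held-out fold and then averaged.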