RESEARCHERS
SPETALE Flavio Ezequiel
conferences and scientific meetings
Title:
GO Deep: AI Annotation of lncRNAs
Author(s):
GARCIA LABARI IGNACIO; SPETALE FLAVIO EZEQUIEL; IGLESIAS NATALIA; MURILLO JAVIER; ANGELONE LAURA; BULACIO PILAR; TAPIA, ELIZABETH
Meeting:
Workshop; Workshop of RiaBio Ibero-American Network on Artificial Intelligence applied to BioData; 2021
Abstract:
Background: The Gene Ontology (GO) provides access to computable knowledge about genes and gene products. Most GO annotation tools developed over the past twenty years focus on protein-coding genes, whose function is known to be encoded in their primary sequence. More recently, the fundamental role of non-coding genes in regulating protein-coding genes has been definitively established. Among non-coding genes, those encoding long non-coding RNA (lncRNA) products (longer than 200 nt) are particularly well suited to in silico annotation by machine learning methods. In a recent contribution, we showed that access to lncRNA secondary structure information enables their automatic GO annotation. However, going deeper in the annotation of lncRNAs requires a better characterization of their secondary structure information than the one provided by naive k-mers. Here, we present preliminary results on a novel GO annotation method for lncRNAs in which deep learning removes the need for expert characterization of primary and secondary structure information.

Results: We built upon the pipeline described in Spetale et al. 2021, which presented a hierarchical, distributed approach to the supervised GO annotation of lncRNAs. Briefly, the GO graph induces a set of binary SVM predictors, one per GO term. These predictors provide raw, likely inconsistent, GO annotations for query lncRNA sequences. An instance of the belief propagation algorithm for graphs with cycles, a workaround for distributed reasoning in artificial intelligence, then refines the raw GO annotations, taking into account both the relationships among GO terms and the confidence of each raw annotation. Individual SVM predictors are trained on lncRNA data from a curated repository. The training process requires a non-trivial expert characterization of primary sequence data and of the associated models for candidate secondary structures.
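The refinement step above can be illustrated with a minimal sketch of sum-product loopy belief propagation over binary GO-term variables. This is not the implementation from Spetale et al. 2021, whose potentials are not described here; the following are assumptions: each raw SVM score is treated as a calibrated probability strictly between 0 and 1 (the unary potential), and each GO edge carries a hypothetical pairwise potential that penalizes the inconsistent state "child annotated, parent not annotated" by a factor `eps`. The function name `loopy_bp` and the toy terms `A` to `D` are hypothetical.

```python
from math import prod


def loopy_bp(raw, edges, eps=1e-3, iters=50):
    """Sum-product loopy belief propagation over binary GO-term variables.

    raw:   dict term -> raw SVM score, assumed calibrated and in (0, 1)
    edges: list of (child, parent) pairs; every term must appear in raw
    eps:   hypothetical penalty for the inconsistent state child=1, parent=0
    """
    child_parent = set(edges)
    nbrs = {t: [] for t in raw}
    for c, p in child_parent:
        nbrs[c].append(p)
        nbrs[p].append(c)

    def phi(t, y):
        # Unary potential: the SVM's confidence in label y for term t.
        return raw[t] if y == 1 else 1.0 - raw[t]

    def psi(i, yi, j, yj):
        # Pairwise potential on the GO edge between i and j, oriented child -> parent.
        yc, yp = (yi, yj) if (i, j) in child_parent else (yj, yi)
        return eps if (yc == 1 and yp == 0) else 1.0

    # msg[(i, j)][y] is the message from term i to term j about the state y_j = y.
    msg = {}
    for c, p in child_parent:
        msg[(c, p)] = [1.0, 1.0]
        msg[(p, c)] = [1.0, 1.0]

    for _ in range(iters):  # synchronous ("flooding") message updates
        new = {}
        for (i, j) in msg:
            out = []
            for yj in (0, 1):
                total = 0.0
                for yi in (0, 1):
                    incoming = prod(msg[(k, i)][yi] for k in nbrs[i] if k != j)
                    total += phi(i, yi) * psi(i, yi, j, yj) * incoming
                out.append(total)
            z = out[0] + out[1]  # normalize to keep messages numerically stable
            new[(i, j)] = [out[0] / z, out[1] / z]
        msg = new

    # Belief for each term: unary potential times all incoming messages.
    beliefs = {}
    for t in raw:
        b = [phi(t, y) * prod(msg[(k, t)][y] for k in nbrs[t]) for y in (0, 1)]
        beliefs[t] = b[1] / (b[0] + b[1])
    return beliefs


# Hypothetical diamond-shaped GO fragment: A is the root and D has two parents,
# so the undirected graph has a cycle (A-B-D-C-A). The raw scores are
# inconsistent: D looks likely while its parent B does not.
raw = {"A": 0.9, "B": 0.2, "C": 0.8, "D": 0.7}
edges = [("B", "A"), ("C", "A"), ("D", "B"), ("D", "C")]
for term, p in sorted(loopy_bp(raw, edges).items()):
    print(term, round(p, 3))
```

On a tree-shaped fragment this procedure yields exact marginals; on graphs with cycles, such as the GO DAG with multi-parent terms, the beliefs are approximations and convergence is not guaranteed, which is why a fixed iteration budget is used here.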
Here, we overcome the need for such expert characterization by means of deep learning. At each level of the GO graph, a multiclass CNN provides raw GO annotations for the corresponding GO terms. The process is repeated from the top of the GO downward until sparsely populated GO terms (fewer than 500 annotated lncRNAs) emerge. From that point on, individual GO term predictions are obtained from SVM predictors. Experimental results on GO subsets for zebrafish and human lncRNAs confirm the ability of deep learning to accurately predict general GO terms at the top GO levels, which in turn improves annotations at deeper GO levels.

Conclusions: Introducing deep learning at the top GO levels removes spurious GO annotations of lncRNAs introduced by the SVM counterparts. As a result, we can go deeper in the annotation of lncRNAs without having to analyze confusing GO annotation branches during the subsequent expert visualization analysis.
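The top-down switch from per-level CNNs to per-term SVMs can be sketched as a planning step over the GO graph. This is one possible reading of the procedure, not the authors' code. Assumptions: a term's level is its shortest distance from the root; a level is handled by a multiclass CNN only if every term on it has at least 500 annotated lncRNAs (the threshold quoted above); and once a sparse level is reached, all terms from that level down are assigned individual SVMs. The functions `go_levels` and `assign_predictors`, the toy terms, and the annotation counts are hypothetical.

```python
from collections import deque

MIN_ANNOTATIONS = 500  # threshold quoted in the abstract


def go_levels(parents, root):
    """Group GO terms by depth below the root (shortest path; an assumption)."""
    children = {}
    for term, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(term)
    depth = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search from the root
        term = queue.popleft()
        for c in children.get(term, []):
            if c not in depth:
                depth[c] = depth[term] + 1
                queue.append(c)
    levels = {}
    for term, d in depth.items():
        if d > 0:  # the root is implied by any annotation, so it gets no predictor here
            levels.setdefault(d, []).append(term)
    return [sorted(levels[d]) for d in sorted(levels)]


def assign_predictors(levels, counts, threshold=MIN_ANNOTATIONS):
    """Top-down: one multiclass CNN per well-populated level, then per-term SVMs."""
    plan = {"cnn_levels": [], "svm_terms": []}
    use_cnn = True
    for level in levels:
        if use_cnn and all(counts.get(t, 0) >= threshold for t in level):
            plan["cnn_levels"].append(level)
        else:
            use_cnn = False  # once a sparse term appears, stay with SVMs below
            plan["svm_terms"].extend(level)
    return plan


# Hypothetical GO fragment (child -> parents) and annotation counts per term.
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}
counts = {"B": 3000, "C": 1200, "D": 800, "E": 120}
levels = go_levels(parents, "A")
print(levels)
print(assign_predictors(levels, counts))
```

Here the first level (`B`, `C`) is well populated and would get a multiclass CNN, while the second level contains the sparse term `E`, so `D` and `E` fall back to individual SVMs. A per-term rule, where only the sparse terms themselves switch to SVMs, is an equally plausible reading of the text.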