
Conference: 4to. Congreso Argentino de Bioinformática y Biología Computacional (4CAB2C) and 4ta. Conferencia Internacional de la Sociedad Iberoamericana de Bioinformática (SoIBio); 2013

Organizing institution:

CIFASIS-Conicet-UNR, Asociación Argentina de Bioinformática y Biología Computacional (A2B2C) - Sociedad Iberoamericana de Bioinformática (SoIBio)

Abstract:

Gene annotation is an important problem in bioinformatics research. Possible gene functions and the relationships between them can be described by the Gene Ontology (GO). GO provides a controlled vocabulary of terms across three branches: Cellular Component (CC), Molecular Function (MF) and Biological Process (BP). Gene annotation aims to associate biological data with GO concepts, here called GO terms. Gene annotation can be performed experimentally, using the EXP GO evidence code (Inferred from Experiment) to tag the supporting biological evidence. Alternatively, to narrow down candidate gene annotations for further experimental work, gene annotation can be performed electronically, using the IEA GO evidence code (Inferred from Electronic Annotation). Current IEA annotations are mostly produced by BLAST similarity searches. In many cases, however, e.g., for non-model organisms, BLAST similarity scores may be too weak. To overcome this problem, we consider the design of machine learning methods for reliable IEA gene annotations. Without loss of generality, we focus on the prediction of GO BP classes. For this purpose, IEA gene annotation prediction is modeled as a hierarchical multilabel classification problem. Within this framework, we consider the True Path Rule (TPR) method for predicting BP class nodes. Briefly, TPR carries out two steps. First, predictions are made at each node of the BP ontology graph using a set of binary classifiers. Second, the BP ontology graph is scanned bottom-up and a consensus prediction is made for each node, taking into account the earlier binary predictions and the evidence from its child nodes. As a result of this propagation strategy, a fine balance between precision and recall of gene annotations is obtained. It should be noted, however, that TPR predictions may suffer from a starting problem, i.e., predictions may differ significantly depending on the selection of the starting node at each level of the ontology graph.
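The two TPR steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one commonly used formulation of the bottom-up consensus, in which a node's score is the average of its own classifier output and the consensus scores of those children predicted positive. The function name `tpr_consensus`, the `threshold` parameter, the toy ontology and the probabilities are all hypothetical and serve illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def tpr_consensus(children, local_prob, threshold=0.5):
    """Bottom-up TPR-style consensus over an ontology DAG (illustrative sketch).

    children:   dict mapping each node to a list of its child nodes
    local_prob: dict mapping each node to the output of its binary classifier
                (step 1 of TPR, assumed to be already computed)
    threshold:  assumed cut-off above which a child counts as "positive"
    """
    # TopologicalSorter expects node -> predecessors. Passing the children as
    # predecessors makes every child appear before its parents in the order.
    graph = {node: children.get(node, []) for node in local_prob}
    order = TopologicalSorter(graph).static_order()

    consensus = {}
    for node in order:
        # Step 2: only children with a positive consensus send evidence upward.
        positive = [consensus[c] for c in children.get(node, [])
                    if consensus[c] > threshold]
        consensus[node] = (local_prob[node] + sum(positive)) / (1 + len(positive))
    return consensus


# Hypothetical toy ontology: BP -> {A, B}, A -> {A1, A2}
children = {"BP": ["A", "B"], "A": ["A1", "A2"]}
local_prob = {"BP": 0.4, "A": 0.3, "B": 0.2, "A1": 0.9, "A2": 0.1}

for node, score in sorted(tpr_consensus(children, local_prob).items()):
    print(f"{node}: {score:.2f}")
```

In this toy case the confident prediction for the leaf A1 raises the score of its parent A above the local classifier output of 0.3, which is the kind of upward propagation of positive evidence that the abstract credits for the balance between precision and recall.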