RESEARCHERS
DI PERSIA Leandro Ezequiel
Conferences and scientific meetings
Title:
Signal processing on graphs to measure similarity between gene annotations in the Gene Ontology
Author(s):
TIAGO LOPEZ; LEANDRO EZEQUIEL DI PERSIA; DIEGO HUMBERTO MILONE
Meeting:
Conference; XI Congreso Argentino de Bioinformática y Biología Computacional; 2021
Abstract:
Background: Semantic similarity measures based on ontologies are useful in many applications, such as inferring the functions of genes or proteins annotated in the Gene Ontology (GO). This application could improve the gene annotation process by significantly reducing the number of laboratory experiments required. The classical measures for this task are based on the annotation frequency of terms, but they lack certain basic properties when treated as distances. Because this could reduce the performance of an inference algorithm, we propose new measures that incorporate the structure of the Gene Ontology graph more directly by applying the theory of signal processing on graphs.

Results: Genes are represented as signals on a graph, defined by paths that run from the root node to the annotated terms. The proposed measures define dictionaries to transform the gene annotations and then take the Euclidean distances in the projected space. The dictionaries contain either all the paths from the root node to the leaves, all the paths to each leaf combined, or the eigenvectors of the graph Laplacian, that is, the Graph Fourier Transform atoms. The measures are evaluated by comparing the distances between genes annotated in a GO sub-ontology and by their performance in predicting gene functions with a Bayesian approach. The distances for genes in a sub-ontology gave the expected results for the proposed measures. In the prediction task, the F1 score was higher for the proposed measures than for the classical measures. In particular, the measure based on the Graph Fourier Transform gave the highest performance scores for automatic function prediction.

Conclusions: The results show that the proposed measures fit the notions of semantic similarity between genes better and are consistent with the mathematical properties of a distance. In the inference of gene functions, the proposed measures proved to be an appropriate alternative, even outperforming the classical measures.
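
The dictionary-based distances described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The six-node toy ontology, the function names, the indicator encoding of gene signals, and the use of a plain inner product with each atom as the "transform" are all assumptions made for illustration. The abstract does not specify these details.

```python
import numpy as np

# Hypothetical toy ontology (not the real GO): child -> parents, node 0 is the root.
PARENTS = {1: [0], 2: [0], 3: [1], 4: [1], 5: [2]}
N_NODES = 6
LEAVES = [3, 4, 5]


def gene_signal(annotated_terms):
    """Indicator signal: 1 on every node on a path from the root to an annotated term."""
    signal = np.zeros(N_NODES)
    stack = list(annotated_terms)
    while stack:
        node = stack.pop()
        if signal[node] == 0.0:
            signal[node] = 1.0
            stack.extend(PARENTS.get(node, []))
    return signal


def path_dictionary():
    """One atom (column) per leaf: the indicator of its root-to-leaf path.

    In this toy tree each leaf has a single path; in a DAG the same call
    would merge all paths to a leaf into one atom.
    """
    return np.column_stack([gene_signal([leaf]) for leaf in LEAVES])


def fourier_dictionary():
    """Atoms are the eigenvectors of the Laplacian L = D - A of the undirected graph."""
    adjacency = np.zeros((N_NODES, N_NODES))
    for child, parents in PARENTS.items():
        for parent in parents:
            adjacency[child, parent] = adjacency[parent, child] = 1.0
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigenvectors = np.linalg.eigh(laplacian)  # columns form an orthonormal basis
    return eigenvectors


def dictionary_distance(terms_a, terms_b, dictionary):
    """Euclidean distance between the two genes' coefficient vectors (assumed transform: D^T x)."""
    difference = gene_signal(terms_a) - gene_signal(terms_b)
    return float(np.linalg.norm(dictionary.T @ difference))


paths = path_dictionary()
d_siblings = dictionary_distance([3], [4], paths)  # terms sharing parent 1
d_cousins = dictionary_distance([3], [5], paths)   # terms in different branches
print(d_siblings, d_cousins)
```

With the path dictionary, the two genes annotated to sibling terms come out closer (about 1.41) than the two annotated in different branches (3.0). In this sketch the full eigenvector basis returned by `fourier_dictionary` is orthonormal, so the resulting distance equals the plain Euclidean distance between the signals. Any selection or weighting of Fourier atoms is not described in the abstract and is therefore left out.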