CIFASIS   20631
CENTRO INTERNACIONAL FRANCO ARGENTINO DE CIENCIAS DE LA INFORMACION Y DE SISTEMAS
Executing Unit - UE
Conferences and scientific meetings
Title:
Clustering gene expression data with a penalized graph-based metric
Author(s):
A. E. BAYÁ; P. M. GRANITTO
Place:
Cordoba
Meeting:
Conference; 2do Congreso Argentino de Bioinformática y Biología Computacional; 2011
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that do not form compact clouds of points but instead trace arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets.
To address this problem we present a method that follows the locality concept of ISOMAP [1]. As a first step, our method builds the k-nearest neighbor graph (knn-graph) of the data. If the graph is disconnected, which is expected in clustering problems, we add a number of edges in order to create a connected graph. The key point of our method is that the added edges have a highly penalized length. We then apply an appropriate algorithm to measure inter-point distances along the connected graph and use these measures as (dis)similarities. We call the method the PKNNG metric (for Penalized K-Nearest Neighbor Graph based metric). The PKNNG metric can be applied to any base measure of similarity (Euclidean, Pearson's correlation, Manhattan, etc.), and the resulting distances can be clustered with any method that takes a distance matrix as input. We test our method against other clustering algorithms using eight publicly available microarray datasets.
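The steps described in the abstract (knn-graph, penalized connecting edges, distances along the graph) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not specify how the connecting edges are chosen, how their length is penalized, or which shortest-path algorithm is used, so the following are assumptions made only for illustration: components are joined Kruskal-style through their shortest cross edges, an added edge of base length d receives length d * exp(d / mu) with mu the mean edge length of the knn-graph, the base measure is Euclidean, and distances are computed with Dijkstra's algorithm. The function name `pknng_distances` and its parameters are hypothetical.

```python
import heapq
import math


def pknng_distances(points, k=2, penalty=None):
    """Illustrative penalized knn-graph metric; returns an n x n distance matrix."""
    n = len(points)
    # Base dissimilarity: Euclidean here (assumption; any base measure could be used).
    base = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: symmetric k-nearest-neighbor graph, stored as adjacency dicts.
    adj = [dict() for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: base[i][j])[:k]
        for j in nearest:
            adj[i][j] = adj[j][i] = base[i][j]

    # Mean edge length of the knn-graph, used to scale the assumed penalty.
    lengths = [w for i in range(n) for j, w in adj[i].items() if i < j]
    mu = sum(lengths) / len(lengths)
    if penalty is None:
        def penalty(d):
            return d * math.exp(d / mu)  # assumed penalty form, not from the abstract

    # Step 2: connected components of the knn-graph via union-find.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in adj[i]:
            parent[find(i)] = find(j)

    # Step 3: join components through their shortest cross edges (assumed strategy),
    # giving each added edge a penalized length.
    candidates = sorted((base[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    for d, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            adj[i][j] = adj[j][i] = penalty(d)

    # Step 4: distances along the connected graph (Dijkstra from every node).
    result = []
    for src in range(n):
        dist = [math.inf] * n
        dist[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in adj[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (dist[v], v))
        result.append(dist)
    return result


# Two well-separated groups on a line; with k=2 the knn-graph has two components.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
D = pknng_distances(points, k=2)
```

In this toy example, distances inside each group stay equal to the graph distances, while any path between the groups must cross the single penalized edge, so inter-group distances become much larger than the raw Euclidean ones. The matrix `D` could then be passed to any clustering method that accepts a precomputed distance matrix.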