RESEARCHERS
BAYA Ariel Emilio
Conferences and scientific meetings
Title:
Clustering gene expression data with a penalized graph-based metric
Author(s):
ARIEL BAYA; P. M. GRANITTO
Location:
Cordoba
Meeting:
Conference; 2do Congreso Argentino de Bioinformática y Biología Computacional; 2011
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e., data that do not form compact clouds of points but instead follow arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets. To address this problem we present a method that follows the locality concept of ISOMAP [1]. As a first step, our method builds the k-nearest neighbor graph (knn-graph) of the data. If the graph is disconnected, which is expected in clustering problems, we add a number of edges in order to create a connected graph. The key point of our method is that the added edges have a highly penalized length. We then apply an appropriate algorithm to measure inter-point distances along the connected graph and use these measures as (dis)similarities. We call the method the PKNNG metric (for Penalized K-Nearest Neighbor Graph based metric). The PKNNG metric can be applied to any base measure of similarity (Euclidean, Pearson's correlation, Manhattan, etc.), and the resulting distances can be clustered with any method that takes a distance matrix as input. We test our method against other clustering algorithms using 8 publicly available microarray datasets.
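The steps described in the abstract (build the knn-graph, connect its components with penalized edges, measure distances along the graph) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how edges are added, how they are penalized, or which path algorithm is used, so the choices below are assumptions made only for illustration: every pair of components is linked through its closest pair of points, the added edge of length w is stretched to w * exp(w / mu) with mu the mean knn-graph edge length, and Dijkstra's algorithm computes the path lengths. The function name `pknng_distances` and its parameters are hypothetical.

```python
import heapq
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pknng_distances(points, k=2, base=euclidean):
    """Illustrative penalized knn-graph metric; expects len(points) >= 2, k >= 1."""
    n = len(points)
    d = [[base(p, q) for q in points] for p in points]

    # Step 1: symmetric k-nearest-neighbor graph stored as adjacency dicts.
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nearest:
            graph[i][j] = graph[j][i] = d[i][j]

    # Step 2: label connected components with a depth-first search.
    label = [-1] * n
    n_comp = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = n_comp
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if label[v] == -1:
                    label[v] = n_comp
                    stack.append(v)
        n_comp += 1

    # Step 3 (assumption): join each pair of components through its closest
    # pair of points, with an exponentially penalized edge length.
    lengths = [w for u in graph for v, w in graph[u].items() if u < v]
    mu = (sum(lengths) / len(lengths)) or 1.0
    added = []
    for a, b in combinations(range(n_comp), 2):
        w, i, j = min((d[i][j], i, j)
                      for i in range(n) if label[i] == a
                      for j in range(n) if label[j] == b)
        added.append((i, j, w * math.exp(w / mu)))
    for i, j, w in added:
        graph[i][j] = graph[j][i] = w

    # Step 4: shortest-path lengths along the connected graph (Dijkstra).
    dist = []
    for src in range(n):
        best = [math.inf] * n
        best[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > best[u]:
                continue
            for v, w in graph[u].items():
                if du + w < best[v]:
                    best[v] = du + w
                    heapq.heappush(heap, (best[v], v))
        dist.append(best)
    return dist


# Two well-separated groups of points on a line.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
dist = pknng_distances(points, k=2)
```

The returned matrix can be passed to any clustering method that accepts precomputed distances. Because the added edges are so long, distances between points in different components end up far larger than distances within a component, which is the effect the abstract describes.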