RESEARCHERS
BAYA Ariel Emilio
articles
Title:
Clustering gene expression data with a penalized graph-based metric
Author(s):
A. E. BAYÁ; P. M. GRANITTO
Journal:
BMC BIOINFORMATICS
Publisher:
BIOMED CENTRAL LTD
References:
Year: 2011 vol. 12 p. 2 - 44
ISSN:
1471-2105
Abstract:
Background: The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that is not shaped as compact clouds of points but instead forms arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets.
Results: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value, and then it adds edges with a highly penalized weight to connect the sub-graphs produced by the first step. We discuss several possible schemes for connecting the different sub-graphs, as well as penalization functions. We show clustering results on several public gene expression datasets and simulated artificial problems to evaluate the behavior of the new metric.
Conclusions: In all cases the PKNNG metric shows promising clustering results. Using the PKNNG metric boosts the performance of the most well-known and practical clustering methods, for example k-means and hierarchical clustering, to a level equivalent to more advanced algorithms, while keeping the ease of use and interpretability of the simple methods.
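The two-step procedure described in the abstract can be illustrated with a minimal sketch in plain Python. Several details here are assumptions made for illustration and are not taken from the abstract: Euclidean distance as the base metric, connecting every pair of sub-graphs through their closest pair of points, and an exponential penalty of the form d * exp(d / mu), where mu is the mean edge weight of the k-nearest-neighbor graph. The function names (`knn_graph`, `pknng_graph`, `pknng_distances`) are hypothetical. The sketch also assumes at least two distinct points and k >= 1.

```python
import heapq
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k):
    """Step 1: symmetric k-nearest-neighbor graph as {node: {neighbor: weight}}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((euclidean(points[i], points[j]), j)
                       for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def components(graph):
    """Connected components (the sub-graphs), found by depth-first search."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def pknng_graph(points, k=3):
    graph = knn_graph(points, k)
    # Assumed penalty scale: mean edge weight of the k-NN graph.
    weights = [w for u in graph for v, w in graph[u].items() if u < v]
    mu = sum(weights) / len(weights)
    comps = components(graph)
    # Step 2: join each pair of sub-graphs through their closest points,
    # using a heavily penalized weight (assumed form: d * exp(d / mu)).
    for a in range(len(comps)):
        for b in range(a + 1, len(comps)):
            d, i, j = min((euclidean(points[i], points[j]), i, j)
                          for i in comps[a] for j in comps[b])
            penalized = d * math.exp(d / mu)
            graph[i][j] = penalized
            graph[j][i] = penalized
    return graph


def shortest_paths(graph, source):
    """Dijkstra's algorithm: graph distance from source to every node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def pknng_distances(points, k=3):
    """Pairwise distances measured along the penalized graph."""
    graph = pknng_graph(points, k)
    n = len(points)
    rows = [shortest_paths(graph, i) for i in range(n)]
    return [[rows[i][j] for j in range(n)] for i in range(n)]
```

The resulting matrix could then be passed to any clustering method that accepts precomputed pairwise distances, such as hierarchical clustering. Points within the same sub-graph stay close along graph paths, while points in different sub-graphs are pushed far apart by the penalized edges.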