CIEM 05476
CENTRO DE INVESTIGACION Y ESTUDIOS DE MATEMATICA
Executing Unit (Unidad Ejecutora, UE)
Book chapters
Title:
Unsupervised classification of tree structured objects.
Author(s):
FLESIA, A.G.
Book:
Biomat 2008: International Symposium on Mathematical and Computational Biology.
Publisher:
World Scientific Publishing Co.
References:
Place: New York; Year: 2009; pp. 266-280
Abstract:
Recent developments in medical image analysis, phylogenetics and proteomics motivate the statistical analysis of populations of tree-structured data objects. In this context, unsupervised classification of trees emerges as a challenging new area that depends on the careful development of a novel mathematical framework. We illustrate this point through the study of three different metric spaces of trees, each suited to a different application. The discussion centers on statistical aspects of clustering in a framework where the trees to be clustered have been sampled from some unknown probability distribution. Following Luxburg et al. (2005), we try to verify two conditions: appropriateness, meaning that the clustering of the data set should reveal structure of the underlying data rather than model artifacts due to the random sampling process; and steadiness, meaning that the more sample points we have, the more reliable the clustering should be. We address steadiness and reliability by extending the convergence properties of a class of non-parametric clustering algorithms, namely K-means defined on different metric spaces of trees. We explore the appropriateness of the clustering outputs of K-means and several linkage methods on a real data set from proteomics, and we comment on the results of Stockham et al. (2002) on three real data sets of phylogenetic trees.
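To make the steadiness condition concrete for K-means, the following sketch writes the standard empirical and population K-means criteria for a generic metric space (T, d) of trees. The notation (W_n, W, c, \hat{c}_n) is introduced here for illustration and is not taken from the chapter; the squared-distance criterion is an assumption, and the conditions under which convergence actually holds on each of the three tree spaces are the subject of the chapter and are not reproduced here.

```latex
% Assumed setting: X_1, ..., X_n drawn i.i.d. from an unknown distribution P
% on a metric space (T, d) of trees; c = \{c_1, \dots, c_k\} \subset T is a set of k centers.
W_n(c) = \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} d(X_i, c_j)^2,
\qquad
W(c) = \mathbb{E}_P\!\left[ \min_{1 \le j \le k} d(X, c_j)^2 \right].
% Steadiness read as consistency: if \hat{c}_n minimizes W_n over k-sets in T,
% then, under suitable conditions on P and (T, d),
W(\hat{c}_n) \longrightarrow \inf_{c \subset T,\ |c| = k} W(c)
\quad \text{almost surely as } n \to \infty.
```

In Euclidean space this is the classical strong consistency of K-means; on a tree space the "mean" used to update each center must be replaced by a notion suited to the metric d (for example, a Fréchet-type minimizer), which is why the convergence properties have to be re-established for each metric space.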