CIEM   05476
CENTRO DE INVESTIGACION Y ESTUDIOS DE MATEMATICA
Executing Unit - UE
Book chapters
Title:
Multiclass Classification of Tree Structured Objects: The K-NN Case
Author(s):
ANA G FLESIA
Book:
BIOMAT 2012
Publisher:
World Sci. Publ.
References:
Place: Hackensack, NJ; Year: 2013; pp. 360-380
Abstract:
In this paper, we consider the problem of supervised classification of tree-structured objects. Assuming that the population of trees lies in a metric space, we define a k-nearest neighbors (k-nn) procedure and argue for its reliability by showing its statistical consistency. We assess finite-sample classification errors on two different sets of trees. The first is an example from proteomics: we define a k-nn classification procedure based on Variable Length Markov Chain modeling of primary sequences of protein families from the Pfam database and compare its performance with the standard Hidden Markov Chain approach, obtaining competitive classification errors. The second is an example from phylogenetics: we define a k-nn classification procedure on binary labeled trees to study the differences introduced by several phylogenetic tree-building methods and by the bootstrap on flu virus data. Low classification errors imply significant differences between trees.
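The k-nn rule described in the abstract needs only a distance between trees, so it can be sketched in a few lines. The Python sketch below is illustrative and is not the chapter's implementation: the function names `tree_distance` and `knn_classify`, the encoding of a tree as the set of its root-to-node paths, and the symmetric-difference distance are all assumptions made for this example, and the metric actually used in the chapter may differ.

```python
from collections import Counter
from typing import Callable, FrozenSet, Sequence

# Assumption for illustration: a tree is encoded as the set of its nodes,
# each node named by the string of edge labels on the path from the root
# (the root is the empty string "").
Tree = FrozenSet[str]


def tree_distance(s: Tree, t: Tree) -> int:
    """Hypothetical metric: number of nodes present in exactly one tree."""
    return len(s ^ t)


def knn_classify(
    query: Tree,
    train: Sequence[Tree],
    labels: Sequence[str],
    dist: Callable[[Tree, Tree], float],
    k: int = 3,
) -> str:
    """Return the majority label among the k training trees nearest to query."""
    if len(train) != len(labels):
        raise ValueError("train and labels must have the same length")
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the training set size")
    # Rank training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train)), key=lambda i: dist(query, train[i]))[:k]
    # Majority vote over the labels of the k nearest neighbors.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: two classes of small trees (made-up data).
train = [
    frozenset({"", "a", "b"}),
    frozenset({"", "a", "b", "aa"}),
    frozenset({"", "c"}),
    frozenset({"", "c", "cc"}),
]
labels = ["A", "A", "B", "B"]

query = frozenset({"", "a", "aa"})
predicted = knn_classify(query, train, labels, tree_distance, k=3)
```

Because the classifier takes the distance as a parameter, any other metric on trees can be passed as `dist` without changing the voting logic.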