CIFASIS   20631
CENTRO INTERNACIONAL FRANCO ARGENTINO DE CIENCIAS DE LA INFORMACION Y DE SISTEMAS
Executing Unit - UE
Conferences and scientific meetings
Title:
Improved Graph-Based Metrics for Clustering High-Dimensional Datasets
Author(s):
A. E. BAYÁ; P. M. GRANITTO
Place:
Bahia Blanca, Argentina
Meeting:
Conference; IBERAMIA 2010 - LNAI 6433; 2010
Abstract:
Clustering is one of the most widely used tools for data analysis. Unfortunately, most methods perform poorly in high-dimensional spaces. Recent work has shown evidence that graph-based metrics can moderate this problem. In particular, the Penalized K-Nearest Neighbour Graph metric (PKNNG) showed good results in several situations. In this work we propose two improvements to this metric that make it suitable for application to very different domains. First, we introduce an appropriate way to manage outliers, a typical problem in graph-based metrics. Second, we propose a simple method to select an optimal value of K, the number of neighbours considered in the k-NN graph. We analyze the proposed modifications using both artificial and real data, finding strong evidence that supports our improvements. Finally, we compare our new method to other graph-based metrics, showing that it achieves good performance on high-dimensional datasets from very different domains, including DNA microarrays and face and digit image recognition problems.
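To make the idea of a graph-based metric concrete, the following is a minimal sketch of a generic k-nearest-neighbour graph distance: points are linked to their K nearest neighbours, and the distance between two points is the length of the shortest path through the graph. This is not the PKNNG metric itself, and it does not include the outlier handling or the K-selection method proposed in the paper, since the abstract does not specify them. The function names `knn_graph` and `graph_distances` and the toy data are assumptions made for illustration only.

```python
import heapq
import math


def knn_graph(points, k):
    """Build an undirected k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            # Store the edge in both directions so the graph is undirected.
            graph[i][j] = d
            graph[j][i] = d
    return graph


def graph_distances(graph, source):
    """Dijkstra shortest-path lengths from source; unreachable nodes stay at inf."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy data: two well-separated groups on a line.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
graph = knn_graph(points, k=2)
from_first = graph_distances(graph, 0)
```

With K = 2 the two groups in this toy example end up in separate connected components, so the plain graph distance between them is infinite. The sketch only illustrates why the choice of K and the treatment of gaps between components matter for metrics of this kind; how PKNNG addresses them is left to the paper.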