RESEARCHERS
BAYA Ariel Emilio
Articles
Title:
DStab: estimating clustering quality by distance stability
Author(s):
BAYÁ, ARIEL E.; LARESE, MÓNICA G.
Journal:
PATTERN ANALYSIS AND APPLICATIONS
Publisher:
SPRINGER
References:
Year: 2023
ISSN:
1433-7541
Abstract:
Stability analyses are most commonly performed using an external validation measure. For example, the Jaccard index is one of the indexes of choice for measuring stability; it is combined with a resampling method to gauge the model's stability. Other methods use classifiers to look for stable partitions instead. In these cases, a resampling method is also used, together with an external index, an error measure driven by a classifier, and a clustering algorithm, with the aim of finding stable clustering model configurations. In contrast to previous stability-based methods, we propose a novel validation procedure that places an internal validation index within a resampling strategy. The index is based on the distance between cluster centroids and is coupled with a twofold cross-validation resampling approach. Moreover, we use a threshold based on a null hypothesis to detect meaningful clustering partitions. For our experimental study, we selected the K-means algorithm because of its simplicity but primarily because of its instability compared to other algorithms, such as hierarchical methods. Finally, we compare our approach with several known validation indexes and discuss the results. Our findings show that our method can not only find meaningful clustering partitions but is also helpful as an unsupervised data analysis tool.
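The general idea described in the abstract (a centroid-distance index inside a twofold resampling scheme, compared against a null reference) can be illustrated with a minimal sketch. This is not the authors' DStab index, whose exact definition is not given in this record. The function names (kmeans, matched_centroid_distance, twofold_instability, uniform_reference), the brute-force centroid matching, and the uniform bounding-box null model are all assumptions made for illustration.

```python
import itertools
import math
import random


def kmeans(points, k, rng, n_init=5, iters=50):
    """Lloyd's K-means with restarts; returns the centroids with lowest inertia."""
    best, best_inertia = None, math.inf
    for _ in range(n_init):
        centroids = rng.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
                clusters[j].append(p)
            # An empty cluster keeps its previous centroid.
            updated = [
                tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centroids[j]
                for j, cl in enumerate(clusters)
            ]
            if updated == centroids:
                break
            centroids = updated
        inertia = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
        if inertia < best_inertia:
            best, best_inertia = centroids, inertia
    return best


def matched_centroid_distance(a, b):
    """Mean centroid distance under the best one-to-one matching (brute force, small k)."""
    k = len(a)
    return min(
        sum(math.dist(a[i], b[j]) for i, j in enumerate(perm)) / k
        for perm in itertools.permutations(range(k))
    )


def twofold_instability(points, k, rng, repeats=10):
    """Average matched-centroid distance over random half/half splits (lower = more stable)."""
    total = 0.0
    for _ in range(repeats):
        shuffled = points[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first = kmeans(shuffled[:half], k, rng)
        second = kmeans(shuffled[half:], k, rng)
        total += matched_centroid_distance(first, second)
    return total / repeats


def uniform_reference(points, rng):
    """Assumed null model: uniform samples inside the bounding box of the data."""
    lows = [min(x) for x in zip(*points)]
    highs = [max(x) for x in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]


if __name__ == "__main__":
    rng = random.Random(42)
    centers = [(0, 0), (10, 0), (0, 10)]
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in centers for _ in range(60)]
    null = uniform_reference(data, rng)
    for k in range(2, 6):
        score = twofold_instability(data, k, rng)
        threshold = twofold_instability(null, k, rng)
        print(k, round(score, 3), round(threshold, 3), score < threshold)
```

In this sketch, a value of k is treated as meaningful when the instability on the data falls below the instability on the structureless reference. The published method may define both the index and the threshold differently.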