IFEG   20353
INSTITUTO DE FISICA ENRIQUE GAVIOLA
Unidad Ejecutora - UE
congresos y reuniones científicas
Título:
A comparison of hierarchical community detection algorithms
Autor/es:
YANG, ZHAO; JUAN I. PEROTTI; CLAUDIO J. TESSONE
Lugar:
Lyon
Reunión:
Conferencia; The 6 th International Conference on Complex Networks and Their Applications; 2017
Institución organizadora:
University of Lyon
Resumen:
A comparison of hierarchical community detection algorithmsZhao Yang 1 , Juan I. Perotti 2,3,4 , and Claudio J. Tessone 1,21 URPP Social Networks, University of Zurich, Andreasstrasse 15, CH-8050 Zürich, Switzerland2 IMT School for Advanced Studies Lucca, Piazza San Francesco 19, I-55100 Lucca, Italy3 Instituto de Física Enrique Gaviola (IFEG-CONICET), Ciudad Universitaria, 5000 Córdoba, Argentina3 Facultad de Matemática, Astronomía, Física y Computación, Universidad Nacional de Córdoba, ArgentinaIn this study, for the first time we perform an extensive analysis of state-of-the-art hierarchical community detection algorithms. Three hierarchical community detection algorithms have been chosen, which are: Infomap [1], a recursive application of Louvain method for the generation of hierarchies [2] [3] and the Minimum Description Length implementation of the Hierarchical Stochastic Block Model (HSBM)[4]. To do so, we introduce a novel benchmark graph with ground truth which preserves several properties of real-world networks with hierarchical community structure. It is able to generate networks that have (i) an arbitrary number of hierarchical levels; (ii) a power-law degree distribution; (iii) a power-law community size distribution; and (iv) the smallest or at least reasonable sizes possible. We argue that most of the existing hierarchical benchmark models can not generate graphs with power-law community size distribution. To quantify the performance of the above mentioned algorithms, we resort the recently introduced hierarchical mutual information metric [3]. This allows us quantification quality of the algorithms in recovering the hierarchical community structure. We also demonstrate that these benchmark graphs exhibit a variety of topological transitions between co-existing ground truths. In order to construct the benchmark graph (termed RB-LFR) we start from a standard non-hierarchical LFR benchmark network, which we consider as the seed network motif for an adapted Ravasz-Barabási procedure for constructing hierarchical networks [5][6]. We have used the latter because it has clear defined hierarchical structure and it can be extended to the multiple levels. The networks produce two different well-defined ground truths, and the mixing parameter controls a topological transition between them. Taking the two-level RB-LFR benchmark graphs as an example, when the mixing parameter of the seed LFR benchmark is small, its community structure and that of its replicas are well-defined. On the first level, the RB-LFR benchmark displays as many communities as the seed LFR has, i.e. C communities. Each community in this first layer contains one community of the seed LFR together with all its replicas. At the second level, each community of the first one contains R + 1 sub-communities --- one for each replica plus the seed one --- summing a total of C × (R + 1) sub-communities in the complete network. When the mixing parameter μ is increased, the community structure of the seed LFR becomes more fuzzy and harder to detect. This also happens to the seed and all replicas communities with the RB-LFR benchmark. This happens while the number of inter-layer links remain the same regardless of μ. Therefore, the seed LFR and the replicas may be interpreted as R + 1 communities at the first layer. Each of them has as many sub-communities at the second level as the seed LFR had, i.e. C. Again, the total number of sub-communities at the second level is (R + 1) × C but, this time, such a number is reached through different means. If the mixing parameter of the seed LFR becomes too large, then the communities become impossible to detect and the community structure of the RB-LFR benchmark network has a single level. The results of our tests are shown in Fig. 1 (see attached .PDF). In the left panels the accuracy of the different community detection algorithms are quantified by the average value of the NHMI (i.e. Normalized Hierarchical Mutual Information) computed between the detected hierarchical community structures and the different ground truths. In the center column, the similarity is quantified with the average NMI (i.e. Normalized Mutual Information) computed between the detected partitions at the second level and those exhibited by the different ground truths. In the right panels, the similarity is quantified by the difference HMI - MI between the HMI computed for the full hierarchies and the MI computed for the partitions at the first level. The tested methods are Infomap, Louvain, and HSBM from top to bottom. Taking the top-left panel as an example: Infomap can unveil the community structure until μ ≈ 0.6. For μ < 0.1, it detects the first type of ground truth, and for 0.2 < μ < 0.6, it detects the second type of ground truth. We observe a clear transition between the ground truths; in both regions, the NHMI reaches values close to 1 making apparent that the algorithm gives a description of the hierarchy very close to the ground truth. For large μ, Infomap detects a flat community structure. This result showcases that the RB-LFR benchmark shows a clear hierarchical community structure which can be recognized successfully by Infomap. Comparing panels (a) to (d), and (g) of Fig. 1, we observe that the new benchmark poses a challenging task that can test the performance of the algorithms, and that the capabilities of the different algorithms depend non-trivially on the network topology. We note here that the poor performance of the HSBM is most likely related to its approach, i.e. a bottom-up approach, while the other two methods are taking the top-down approaches to build the hierarchies. Finally, the difference HMI - MI is shown in Fig. 1c, f, & i. It gives the contribution that the second level has on the HMI. In other words, it quantifies how accurately the algorithms detect the second level and how relevant is the corresponding contribution as measured by the HMI. It is clear that such contribution is non-negligible, showing the convenience of HMI as a measure for the comparison of hierarchical community structures, when compared to the traditional MI. To conclude, we have found that the newly introduced RB-LFR benchmark graphs pose challenging tests to state-of-the-art hierarchical community detection algorithms. More specifically, the tests on the two-level RB-LFR benchmark graphs indicate that Infomap outperforms the other two methods in terms of accuracy. Our benchmark graphs, while parsimonious, exhibit a rich phenomenology including a variety of topological transitions between co-existing ground truths. Additionally, our tests have also validated that the recently introduced Hierarchical Mutual Information (HMI) suits better for the comparison of hierarchical partitions than the traditional Mutual Information (MI) does.References1. Rosvall, M., Bergstrom, C. T.: Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 105, 1118?1123 (2008)2. Blondel, V. D., Guillaume, J. L., Lambiotte, R., Lefebvre, E. : Fast unfolding of communities in large networks. J. Stat. Mech. Theor. Exp. 2008, P10008 (2008)3. Perotti, J. I., Tessone, C. J., Caldarelli, G.: Hierarchical mutual information for the comparison of hierarchical community structures in complex networks. Phys. Rev. E 92, 062825 (2015)4. Peixoto, T. P.: Hierarchical block structures and high-resolution model selection in large networks. Phys. Rev. X 4, 011047 (2014)5. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 046101 (2008)6. Ravasz, E., Barabási, A. L.: Hierarchical organization in complex networks. Phys. Rev. E 67, 026112 (2003)