RESEARCHERS
HERNANDEZ LAHME Damian Gabriel
articles
Title:
Information Approach to Co-occurrence of Words in Written Language
Author(s):
D. G. HERNÁNDEZ
Journal:
COMPLEX SYSTEMS
Publisher:
Complex Systems Publications, Inc.
Publication details:
Place: Champaign, IL; Year: 2015; Vol. 24
ISSN:
0891-2513
Abstract:
In this paper we study the distribution of words across the different parts of a book using tools from information theory. In particular, the mutual information between the words of the text and the parts of the text is compared with the mutual information of a shuffled version of the book. This analysis allows us to extract not only the relevant words of the text but also relationships between words, such as co-occurrence and repulsion. Using the connections given by word co-occurrence, we show how to construct a network that reflects the semantic organization of the book. The method can also be applied to other types of sequences to measure the relations between the symbols that compose them.
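The comparison described in the abstract can be illustrated with a short sketch. It assumes plug-in (maximum-likelihood) probability estimates, a book split into contiguous parts of nearly equal size, and a shuffled baseline averaged over a few random permutations of the tokens. The function names (`split_into_parts`, `word_part_information`, `shuffled_information`, `relevance`) and their parameters are hypothetical. This is not the paper's implementation, and the estimator or bias corrections used there may differ.

```python
import math
import random
from collections import Counter

# Hypothetical sketch, not the paper's code: plug-in estimates, no bias correction.


def split_into_parts(tokens, n_parts):
    """Split tokens into n_parts contiguous chunks whose sizes differ by at most one."""
    bounds = [i * len(tokens) // n_parts for i in range(n_parts + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(n_parts)]


def word_part_information(tokens, n_parts):
    """Per-word contribution to I(W; J), where J is the index of the part.

    Contribution of word w: p(w) * sum_j p(j|w) * log2(p(j|w) / p(j)),
    i.e. p(w) times the KL divergence between the word's distribution over
    parts and the part-size distribution. Summing over words gives I(W; J) in bits.
    """
    parts = split_into_parts(tokens, n_parts)
    counts = [Counter(part) for part in parts]
    total = len(tokens)
    p_part = [len(part) / total for part in parts]
    info = {}
    for w in set(tokens):
        n_w = sum(c[w] for c in counts)
        kl = 0.0
        for c, p_j in zip(counts, p_part):
            if c[w] > 0:  # a part containing w is non-empty, so p_j > 0
                p_j_given_w = c[w] / n_w
                kl += p_j_given_w * math.log2(p_j_given_w / p_j)
        info[w] = (n_w / total) * kl
    return info


def shuffled_information(tokens, n_parts, n_shuffles=10, seed=0):
    """Average per-word contributions over random shuffles of the token order."""
    rng = random.Random(seed)
    acc = Counter()
    for _ in range(n_shuffles):
        shuffled = list(tokens)
        rng.shuffle(shuffled)
        for w, value in word_part_information(shuffled, n_parts).items():
            acc[w] += value
    return {w: value / n_shuffles for w, value in acc.items()}


def relevance(tokens, n_parts, n_shuffles=10, seed=0):
    """Excess information of each word in the real text over the shuffled baseline."""
    real = word_part_information(tokens, n_parts)
    baseline = shuffled_information(tokens, n_parts, n_shuffles, seed)
    return {w: real[w] - baseline[w] for w in real}
```

As a toy check, for `tokens = ["a"] * 10 + ["b"] * 10` split into two parts, each word sits entirely in one half, so `word_part_information(tokens, 2)` gives 0.5 bits for each word and 1 bit in total. A word spread evenly over the parts contributes 0. Words with a large positive `relevance` are those whose placement in the book carries more information than chance ordering would.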