RESEARCHERS
FERNANDEZ SLEZAK Diego
Conferences and scientific meetings
Title:
Evaluation of LSA performance in Spanish using multiple corpus of text
Author(s):
CARRILLO, FACUNDO; CECCHI, GUILLERMO; SIGMAN, MARIANO; FERNÁNDEZ SLEZAK, DIEGO
Place:
Cordoba
Meeting:
Conference; JAIIO 2013; 2013
Abstract:
Latent Semantic Analysis (LSA) is a natural language processing tool that allows estimating the semantic distance between terms. The success of LSA depends mainly on the choice of training corpus, which has been studied principally in English. This study examines LSA with regional Spanish corpora and evaluates its performance by identifying synonyms. We found that performance was slightly better than chance, consistent with previous results. The standard LSA method cannot dynamically increase the training corpus. By using classifiers, we combined multiple LSA models and showed that the use of automatic classifiers increases performance. The proposed methodology was capable of detecting polysemy and opens the possibility of dynamically increasing the training corpus of the method without the need for complete recalculation.
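
The idea of combining several LSA models through a classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which classifier or features were used, so the logistic regression, the one-cosine-per-model feature layout, and every name below (`cosine`, `pair_features`, `train_logistic`, `predict`, `model_a`, `model_b`) are assumptions made for illustration. The word vectors are invented toy values, not results from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pair_features(models, w1, w2):
    """One cosine similarity per LSA model for the word pair (w1, w2)."""
    return [cosine(m[w1], m[w2]) for m in models]

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights (bias stored last) by batch gradient descent."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Return 1 (synonyms) if the linear score is positive, else 0."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if z > 0 else 0

# Hypothetical term vectors, standing in for LSA spaces trained on two corpora.
# Model A places auto/coche close together but separates casa/hogar;
# model B does the opposite.
model_a = {"auto": [1.0, 0.0, 0.0], "coche": [0.9, 0.1, 0.0],
           "casa": [0.0, 1.0, 0.0], "hogar": [0.0, 0.0, 1.0]}
model_b = {"auto": [1.0, 0.0, 0.0], "coche": [0.0, 1.0, 0.0],
           "casa": [0.0, 0.0, 1.0], "hogar": [0.0, 0.1, 0.9]}
models = [model_a, model_b]

# Labeled word pairs: 1 = synonyms, 0 = not synonyms (toy labels).
pairs = [("auto", "coche", 1), ("casa", "hogar", 1),
         ("auto", "casa", 0), ("auto", "hogar", 0),
         ("coche", "casa", 0), ("coche", "hogar", 0)]

X = [pair_features(models, a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
w = train_logistic(X, y)

for (a, b, label), x in zip(pairs, X):
    print(a, b, "label:", label, "predicted:", predict(w, x))
```

Under this layout, a model trained on an additional corpus contributes one more feature per word pair, so only the classifier has to be refit while the existing LSA spaces stay untouched. This is one possible reading of how the training corpus could grow without a complete recalculation.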