RESEARCHERS
SOTO Axel Juan
Conferences and scientific meetings
Title:
Argo as a platform for integrating distinct biodiversity analytics tools into workflows for building graph databases
Author(s):
RIZA BATISTA-NAVARRO; NHUNG T.H. NGUYEN; AXEL J. SOTO; WILLIAM ULATE; SOPHIA ANANIADOU
Location:
Ottawa
Meeting:
Symposium; TDWG 2017 Symposium on Semantics for Biodiversity Science; 2017
Abstract:
The ever-growing amount of available biodiversity data has been accompanied by a proliferation of informatics tools for collecting, managing and analysing biodiversity-relevant knowledge. As a result, several data formats and programming languages or environments have come into use, creating an interoperability problem for anyone who wishes to combine the outputs of distinct tools or integrate them into a single solution.
Argo (Rak et al. 2012), an online text mining workbench based on the Unstructured Information Management Architecture (UIMA) interoperability standard, offers a means of seamlessly unifying various tools and resources into customisable text processing workflows. Among many other features, Argo provides: (1) a library of diverse tools, i.e., UIMA components, each dedicated to a specific task such as loading datasets or gazetteers of interest (e.g., the Biodiversity Term Inventory), or recognising species names and their semantically related terms (Nguyen et al. 2017); (2) a graphical interface for designing workflows using components as building blocks; (3) an environment for executing workflows and monitoring their progress; and (4) an interactive annotation editor for manually revising or validating the results of automated processing.
Recently, Argo has been extended to support incorporating into workflows external web services that follow the Representational State Transfer (REST) architectural style. Taking advantage of these features, we demonstrate how we combine in-house tools and resources for named entity recognition (Batista-Navarro et al. 2017) with externally developed ones, e.g., EXTRACT (Pafilis et al. 2016), to build text mining workflows that populate neo4j graph databases with biodiversity-relevant knowledge.
As exemplars, we focus on use cases that draw on various literature sources to capture fine-grained information on the habitats and reproductive conditions of: (1) a subset of plants catalogued in World Flora Online (Jackson and Miller 2015), and (2) tropical trees belonging to the family Dipterocarpaceae.
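The general idea of chaining components into a workflow that ends in a graph database can be illustrated with a small, self-contained sketch. This is not Argo's API, UIMA's API, or the authors' method: the shared `Document` object is only a loose stand-in for UIMA's common annotation structure, and `gazetteer_tagger`, `run_workflow`, `to_cypher`, the `Species`/`Habitat` labels and the `OCCURS_IN` relationship are all hypothetical names chosen for illustration. Relations are inferred with a naive same-document co-occurrence heuristic, and the output is a list of parameterised Cypher `MERGE` statements that a neo4j client could execute; no database is contacted here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Annotation:
    label: str  # entity type, e.g. "Species" or "Habitat" (hypothetical labels)
    begin: int  # character offset where the mention starts
    end: int    # character offset just past the mention
    text: str


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# A component reads the document and appends annotations in place.
Component = Callable[[Document], None]


def gazetteer_tagger(label: str, terms: List[str]) -> Component:
    """Hypothetical dictionary-lookup recognizer: tags every exact occurrence of each term."""
    def run(doc: Document) -> None:
        for term in terms:
            if not term:
                continue  # an empty term would match everywhere and never advance
            start = doc.text.find(term)
            while start != -1:
                doc.annotations.append(Annotation(label, start, start + len(term), term))
                start = doc.text.find(term, start + len(term))
    return run


def run_workflow(doc: Document, components: List[Component]) -> Document:
    """Apply components in order; each one reads and extends the same annotation store."""
    for component in components:
        component(doc)
    return doc


def to_cypher(doc: Document) -> List[Tuple[str, Dict[str, str]]]:
    """Hypothetical graph writer: links every species to every habitat in the same document."""
    species = sorted({a.text for a in doc.annotations if a.label == "Species"})
    habitats = sorted({a.text for a in doc.annotations if a.label == "Habitat"})
    query = (
        "MERGE (s:Species {name: $species}) "
        "MERGE (h:Habitat {name: $habitat}) "
        "MERGE (s)-[:OCCURS_IN]->(h)"
    )
    return [(query, {"species": s, "habitat": h}) for s in species for h in habitats]


if __name__ == "__main__":
    # Invented example sentence, for illustration only.
    doc = Document("Shorea sp. was recorded in lowland forest.")
    workflow = [
        gazetteer_tagger("Species", ["Shorea sp."]),
        gazetteer_tagger("Habitat", ["lowland forest"]),
    ]
    for query, params in to_cypher(run_workflow(doc, workflow)):
        print(params)  # {'species': 'Shorea sp.', 'habitat': 'lowland forest'}
```

Using `MERGE` rather than `CREATE` keeps repeated runs over overlapping literature from duplicating nodes and relationships; in a real workflow, the co-occurrence step would be replaced by proper relation extraction and the dictionary tagger by trained recognisers or external REST services.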