CESIMAR - CENPAT   25625
CENTRO PARA EL ESTUDIO DE SISTEMAS MARINOS
Executing Unit - UE
Conferences and scientific meetings
Title:
Improving the Quality of Biodiversity Data Through Semantic Web Standards
Author(s):
LEWIS, M.; ZARATE, M.D.; DELRIEUX, C.; FILLOTRANI, P.
Meeting:
Conference; 10th International Conference on Ecological Informatics; 2018
Organizing institution:
Friedrich Recknagel
Abstract:
The lack of accurate geographic information in species occurrence data causes problems in many conservation activities, such as systematic planning for the protection of endangered species. In this abstract we describe our experience improving the location quality of biodiversity data in Darwin Core Archive (DwC-A) format [1], extracted from an Integrated Publishing Toolkit (IPT) instance http://ipt.cenpat-conicet.gob.ar:8081/ belonging to the Patagonian National Research Centre. Our approach builds on previous work [2], in which we published a set of biodiversity data using the Resource Description Framework (RDF) [3], a standard model for data interchange on the Web. The main scientific questions we currently address are: (1) How can we integrate biodiversity data from different sources using their geographic location? (2) How can we check whether the locations in a DwC file are consistent? (3) How can we correct the locations that are wrong? While there are tools to check the quality of biodiversity data, there is a gap when it comes to ensuring the quality of the georeferenced data in a dataset.
To answer these questions, we added more semantics to geographic locations using ontologies such as LinkedGeoData http://linkedgeodata.org/About, GeoNames http://www.geonames.org/ and GeoSPARQL http://www.opengeospatial.org/standards/geosparql, which allow us to check whether a given position (latitude and longitude) lies within the spatial coverage of the region described. For example, with a simple SPARQL query [4] we can determine whether a position claimed to belong to an area or region is correct. Another important improvement is that the values of certain DwC fields that were previously literals are now replaced by references to URIs; for example, the field dwc:country, which contained the literal Argentina, now holds the URI http://sws.geonames.org/3865483/.
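The two steps described above, replacing a dwc:country literal with a GeoNames URI and asking whether a position lies within a region, could be sketched roughly as follows. This is an illustrative sketch and not the authors' implementation: the helper names `country_to_uri` and `build_within_ask` are hypothetical, the lookup table holds only the one mapping given in the abstract, and the query assumes the region resource exposes its geometry through geo:hasGeometry and geo:asWKT, which may differ from the data model actually used. The code only builds the query text; sending it to a GeoSPARQL-capable endpoint is left out.

```python
# Hypothetical lookup table; only the Argentina entry comes from the abstract.
GEONAMES_COUNTRY_URIS = {
    "Argentina": "http://sws.geonames.org/3865483/",
}


def country_to_uri(literal):
    """Return the GeoNames URI for a dwc:country literal, or None if unknown."""
    return GEONAMES_COUNTRY_URIS.get(literal.strip())


def build_within_ask(lat, lon, region_uri):
    """Build a GeoSPARQL ASK query testing whether a point lies within a region.

    Assumes (not confirmed by the source) that the region resource is linked
    to a WKT geometry via geo:hasGeometry / geo:asWKT.
    """
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")
    # WKT in the default GeoSPARQL CRS lists longitude first, then latitude.
    point = '"POINT({} {})"^^geo:wktLiteral'.format(lon, lat)
    return "\n".join([
        "PREFIX geo: <http://www.opengis.net/ont/geosparql#>",
        "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>",
        "ASK {",
        "  <" + region_uri + "> geo:hasGeometry/geo:asWKT ?regionWkt .",
        "  FILTER(geof:sfWithin(" + point + ", ?regionWkt))",
        "}",
    ])


if __name__ == "__main__":
    # Illustrative coordinates only; not taken from the dataset.
    region = country_to_uri("Argentina")
    print(build_within_ask(-42.77, -65.03, region))
```

An ASK query returns a single boolean, which fits the yes/no consistency check; a record whose coordinates fall outside the declared region would then be flagged for review rather than corrected automatically.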
In addition, the GeoSPARQL standard allows complex semantic queries, for example: find all non-native or invasive species that have occurrences within a user-defined region. Although data quality depends on many factors and previous controls, we believe that taking advantage of the Semantic Web [5], and of GeoSPARQL in particular, can help address these problems. However, widespread adoption and implementation remain a challenge. As future work, we intend to extend our current implementation with more advanced queries, working in partnership with biodiversity researchers to develop, improve and test this quality-control tool for species locations. We also aim to build a benchmark to assess the accuracy and recall of our queries.
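The region-based query mentioned above (non-native or invasive species with occurrences inside a user-defined region) might look something like the sketch below. It rests on several assumptions that the abstract does not confirm: that occurrences carry the Darwin Core terms dwc:scientificName and dwc:establishmentMeans, that the values "introduced" and "invasive" mark the species of interest, and that each occurrence has a WKT geometry reachable through geo:hasGeometry and geo:asWKT. The helpers `bbox_to_wkt` and `build_invasive_query` are hypothetical names, and the user's region is simplified to a bounding box.

```python
def bbox_to_wkt(min_lon, min_lat, max_lon, max_lat):
    """Turn a bounding box into a closed WKT polygon (longitude first)."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bounding box is empty or inverted")
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join("{} {}".format(x, y) for x, y in corners)
    return "POLYGON((" + ring + "))"


def build_invasive_query(region_wkt):
    """Build a SELECT query for introduced/invasive species within a region.

    The graph shape (establishmentMeans values, geometry links) is an
    assumption made for illustration.
    """
    region = '"' + region_wkt + '"^^geo:wktLiteral'
    return "\n".join([
        "PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>",
        "PREFIX geo: <http://www.opengis.net/ont/geosparql#>",
        "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>",
        "SELECT DISTINCT ?name WHERE {",
        "  ?occ dwc:scientificName ?name ;",
        "       dwc:establishmentMeans ?means ;",
        "       geo:hasGeometry/geo:asWKT ?wkt .",
        '  FILTER(LCASE(STR(?means)) IN ("introduced", "invasive"))',
        "  FILTER(geof:sfWithin(?wkt, " + region + "))",
        "}",
    ])


if __name__ == "__main__":
    # Illustrative bounding box only; not a region used by the authors.
    print(build_invasive_query(bbox_to_wkt(-66, -43, -64, -42)))
```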