RESEARCHERS
DEL RIO Juan Pablo
Conferences and scientific meetings
Title:
A Bayesian Approach to Duplicate Detection From Real Estate Listings.
Author(s):
DIOGUARDI, F.; ANTONELLI, L.; MARCOS, M.; DEL RÍO, J.P.; TORRES, D.
Place:
Popayán
Meeting:
Workshop; Decisioning 2023; 2023
Abstract:
The large amount of real estate data available on the internet gives analysts and statisticians a great opportunity to derive insights. However, duplicate listings make it difficult to ensure data quality. This study proposes a duplicate detection strategy for a real estate knowledge graph containing listings scraped from different web pages. The approach uses the Duke record-linkage engine to discover implicit owl:sameAs links between records and achieved a precision of 66.8%, a recall of 70.4%, and an F-measure of 68.6%. We found that segmenting the record comparison by the attributes being compared, and giving each segment a different weight in the matching result, is an effective way to solve this problem. The strategy was evaluated against a ground-truth dataset created by domain experts: a real estate listing knowledge graph containing both duplicate and unique entities. The approach effectively cleaned the knowledge graph of noisy data and can support accurate statistical analyses in the real estate domain. The proposed procedure can be used to build a real estate observatory that analyses listings from different sources to produce useful statistics. Analysts can use such a tool to extract valuable information and make data-driven decisions in the real estate market.
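The following is a minimal sketch of the kind of attribute-weighted Bayesian matching described in the abstract. It is not the authors' configuration and not Duke's actual code. The attribute names, the `(low, high)` probabilities, the 0.8 threshold, the linear similarity-to-probability mapping, and the use of Python's `difflib` as a string comparator are all hypothetical choices made for illustration. Each attribute is compared separately, turned into a match probability bounded by its own weights, and the per-attribute probabilities are combined with Bayes' rule. A small helper also checks the reported F-measure as the harmonic mean of precision and recall.

```python
from difflib import SequenceMatcher

# Hypothetical per-attribute weights (low, high): "low" is the match probability
# when values disagree completely, "high" when they agree completely.
# Illustrative assumptions only, not the paper's configuration.
ATTRIBUTES = {
    "address": (0.2, 0.9),
    "price": (0.3, 0.7),
    "surface_m2": (0.4, 0.65),
}
THRESHOLD = 0.8  # hypothetical cut-off for declaring an owl:sameAs link


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; difflib stands in for a real comparator."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def attribute_probability(sim: float, low: float, high: float) -> float:
    """Linearly map a similarity to a probability in [low, high] (assumed mapping)."""
    return low + (high - low) * sim


def bayes_combine(p: float, q: float) -> float:
    """Combine two independent match probabilities with Bayes' rule."""
    return (p * q) / (p * q + (1 - p) * (1 - q))


def match_probability(r1: dict, r2: dict) -> float:
    """Compare records attribute by attribute, starting from a neutral 0.5 prior."""
    prob = 0.5
    for attr, (low, high) in ATTRIBUTES.items():
        if attr in r1 and attr in r2:  # skip attributes missing from either record
            sim = similarity(str(r1[attr]), str(r2[attr]))
            prob = bayes_combine(prob, attribute_probability(sim, low, high))
    return prob


def is_duplicate(r1: dict, r2: dict) -> bool:
    return match_probability(r1, r2) >= THRESHOLD


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    a = {"address": "Calle 7 N 1234, La Plata", "price": "120000", "surface_m2": "65"}
    b = {"address": "calle 7 nro 1234 la plata", "price": "120000", "surface_m2": "65"}
    print(round(match_probability(a, b), 3), is_duplicate(a, b))
    print(round(f_measure(0.668, 0.704), 3))  # consistent with the reported 68.6%
```

Giving a distinctive attribute such as the address a wide `(low, high)` range lets it dominate the decision, while noisier attributes such as surface area only nudge the combined probability. This mirrors the abstract's point that weighting each comparison segment differently improves matching.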