ICC   25427
INSTITUTO DE INVESTIGACION EN CIENCIAS DE LA COMPUTACION
Executing Unit - UE
Conferences and scientific meetings
Title:
Leveraging Probabilistic Existential Rules for Adversarial Deduplication
Author(s):
MARIA VANINA MARTINEZ; GERARDO I. SIMARI; JOSÉ N. PAREDES; MARCELO A. FALAPPA
Location:
Oxford
Meeting:
Workshop; 2nd Workshop on Logics for Reasoning about Preferences, Uncertainty, and Vagueness (PRUV 2018); 2018
Abstract:
The entity resolution problem in traditional databases, also known as deduplication, seeks to map multiple virtual objects to their corresponding set of real-world entities. Though the problem is challenging, it can be tackled in a variety of ways by means of leveraging several simplifying assumptions, such as the fact that the multiple virtual objects appear as the result of name or attribute ambiguity, clerical errors in data entry or formatting, missing or changing values, or abbreviations. However, in cyber security domains the entity resolution problem takes on a whole different form, since malicious actors that operate in certain environments like hacker forums and markets are highly motivated to remain semi-anonymous; this is because, though they wish to keep their true identities secret from law enforcement, they also have a reputation with their customers. The above simplifying assumptions cannot be made in this setting, and we therefore coin the term "adversarial deduplication". In this paper, we propose the use of probabilistic existential rules (also known as Datalog+/-) to model knowledge engineering solutions to this problem; we show that tuple-generating dependencies can be used to generate probabilistic deduplication hypotheses, and equality-generating dependencies can later be applied to leverage existing data towards grounding such hypotheses. The main advantage with respect to existing deduplication tools is that our model operates under the open-world assumption, and thus is capable of modeling hypotheses over unknown objects, which can later become known if new data becomes available.
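
To make the two-stage idea in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation or rule language. It assumes a toy set of forum profiles, two hypothetical TGD-style rules (matching writing style or matching PGP key implies that some unknown real-world entity owns both profiles), and one hypothetical EGD-style rule (a profile has a single owner, so two unknown entities owning the same profile must be equal). All profile names, attributes, and probability values are invented for illustration, and the probabilities are carried only as annotations; the sketch does not attempt to reproduce the probabilistic semantics of the paper.

```python
from itertools import combinations, count

# Hypothetical toy data: forum profiles with invented attributes.
PROFILES = {
    "dark_fox": {"style": "s1", "pgp": "K1"},
    "d4rkf0x": {"style": "s1", "pgp": "K2"},
    "fox_dark": {"style": "s3", "pgp": "K2"},
    "blue_owl": {"style": "s2", "pgp": "K9"},
}


def apply_tgds(profiles):
    """TGD-style step: for each matching pair, invent a fresh labeled null.

    Each hypothesis is (null, set_of_profiles, probability). The null stands
    for an unknown real-world entity (open-world assumption). The
    probabilities 0.7 and 0.9 are made-up illustrative annotations.
    """
    fresh = count(1)
    hypotheses = []
    for (u1, a1), (u2, a2) in combinations(sorted(profiles.items()), 2):
        if a1["style"] == a2["style"]:
            hypotheses.append((f"_e{next(fresh)}", {u1, u2}, 0.7))
        if a1["pgp"] == a2["pgp"]:
            hypotheses.append((f"_e{next(fresh)}", {u1, u2}, 0.9))
    return hypotheses


def apply_egd(hypotheses):
    """EGD-style step: owner(E1, U) and owner(E2, U) imply E1 = E2.

    Nulls that own a common profile are merged with a small union-find;
    the result maps each representative null to the profiles it owns.
    """
    parent = {null: null for null, _, _ in hypotheses}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_owner = {}
    for null, users, _ in hypotheses:
        for user in users:
            if user in first_owner:
                parent[find(null)] = find(first_owner[user])
            else:
                first_owner[user] = null

    merged = {}
    for null, users, _ in hypotheses:
        merged.setdefault(find(null), set()).update(users)
    return merged


hypotheses = apply_tgds(PROFILES)
entities = apply_egd(hypotheses)
print(hypotheses)
print(entities)
```

In this toy run the two hypotheses share the profile `d4rkf0x`, so the EGD-style step collapses their nulls into a single unknown entity that owns three profiles, while `blue_owl` stays unlinked because no rule fired for it.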