FELLOWSHIPS
LOPEZ Sabrina Laura
Conferences and scientific meetings
Title:
Deidentification of Spanish healthcare free-text: not fully reliable but far better than nothing!
Author(s):
LOPEZ SABRINA LAURA; LUCIANO SILVI; LAURA ALONSO ALEMANY; LAURA ACIÓN
Location:
Montevideo
Meeting:
Meeting; Khipu 2023: Latin American Meeting in Artificial Intelligence; 2023
Organizing institution:
Facultad de Ingeniería, Universidad de la República
Abstract:
In Argentina, Electronic Health Records (EHRs) have been steadily adopted, increasing the amount of this type of data. Reusing these records to address secondary research, public health, management, and policy-making questions raises ethical considerations. Health data are classified as sensitive under national and international regulations (HIPAA, GDPR, etc.) because they can significantly impact people’s lives. Tools that effectively remove protected personal information (PPI) that could allow patient identification are therefore a must. However, anonymizing free text in EHRs is challenging because it is full of peculiarities (words outside the common vocabulary, ambiguity, etc.). We present our experience developing a de-identification algorithm for free text in the EHR of a province in Argentina. We found that, during a manual anonymization task, it is not clear to humans what information counts as PPI. As expected, automatic processes also miss cases of PPI, even more than humans do. However, a simple, rule-based approach can do a good job of removing most of the PPI, outperforming a more sophisticated machine-learning approach in low-resource contexts. Although no process can guarantee anonymization, our method can mitigate the impact of possible data breaches involving highly sensitive information.
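To make the idea of a rule-based pass concrete, the following is a minimal sketch and not the authors' algorithm, whose rules are not described in this record. It assumes three hypothetical pattern families (e-mail addresses, Argentine DNI-style numbers, and numeric dates) and replaces each match with a category placeholder. The names `PATTERNS` and `deidentify` are illustrative only.

```python
import re

# Hypothetical rules for illustration; a real system would need many more
# (names, addresses, phone numbers, institution names, and so on).
PATTERNS = [
    # E-mail addresses are handled first so later rules do not split them.
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
    # DNI-style numbers: 7 or 8 digits, optionally grouped with dots.
    ("DNI", re.compile(r"\b\d{1,2}\.?\d{3}\.?\d{3}\b")),
    # Numeric dates such as 12/03/2021 or 1/2/21.
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
]


def deidentify(text: str) -> str:
    """Replace every match of each rule with a bracketed category label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Paciente DNI 30.123.456, control el 12/03/2021, mail: jperez@example.com"
    print(deidentify(note))
```

Rules like these only catch what they were written to catch: a surname or a street address in the same note would pass through untouched, which is consistent with the abstract's caution that no process can guarantee anonymization.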