LOYOLA Juan Martin
Congresses and scientific meetings
UNSL at eRisk 2021: A comparison of three early alert policies for early risk detection
Conference; Conference and Labs of the Evaluation Forum (CLEF-WN 2021); 2021
Early risk detection (ERD) can be considered a multi-objective problem in which the challenge is to find an adequate trade-off between two different but related aspects: 1) the accuracy in identifying risky users, and 2) the minimum time required for the detection of a risky user to be reliable. The first aspect is usually addressed as a typical classification problem and evaluated with standard classification metrics such as precision, recall, and F1. The second involves a policy that decides when the information gathered from a user classified as risky is sufficient to raise an alarm/alert, and it is usually evaluated by penalizing the delay in making that decision. In fact, the temporal evaluation metrics used in ERD, such as ERDE𝜃 and 𝐹latency, combine both aspects in different ways. In that context, and unlike our previous participations in the eRisk Labs, this year we focus on the second aspect of ERD tasks, that is, the early alert policies that decide whether a user classified as risky should actually be reported as such.

In this paper, we describe three early alert policies that our research group from the Universidad Nacional de San Luis (UNSL) used at the CLEF eRisk 2021 Lab. These policies were evaluated on the two ERD tasks proposed this year: early risk detection of pathological gambling and early risk detection of self-harm. The first approach uses standard classification models to identify risky users together with a simple (manual) rule-based early alert policy. The second approach is a deep learning model trained end-to-end that, through reinforcement learning, simultaneously learns to identify risky users and the early alert policy. The third approach consists of a simple and interpretable model that identifies risky users, integrated with a global early alert policy: based on the (global) estimated risk level of all processed users, it decides which users should be reported as risky.
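The abstract does not specify the manual rule used in the first approach; a minimal illustrative sketch of one such policy, under the assumption that the classifier produces a per-post risk probability and that an alert is raised after a streak of high-risk posts (threshold and streak length are hypothetical parameters, not the authors' values), could look like:

```python
# Illustrative sketch, NOT the authors' actual rule: a threshold-plus-streak
# early alert policy. The user is reported as risky once the classifier's
# risk probability stays at or above `threshold` for `min_consecutive` posts.

def rule_based_alert(risk_probs, threshold=0.7, min_consecutive=3):
    """Return the 1-based index of the post at which the alert is raised,
    or None if the user is never reported as risky."""
    streak = 0
    for i, p in enumerate(risk_probs, start=1):
        streak = streak + 1 if p >= threshold else 0
        if streak >= min_consecutive:
            return i  # the delay penalized by ERDE/F_latency grows with i
    return None

# Example: three consecutive high-risk scores trigger the alert at post 5.
print(rule_based_alert([0.2, 0.4, 0.8, 0.9, 0.95]))  # → 5
```

A policy of this shape makes the accuracy/delay trade-off explicit: raising the threshold or the required streak lowers false alarms but increases the delay that metrics such as ERDE𝜃 and 𝐹latency penalize.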
Regarding the results achieved, our models obtained the best performance on both tasks in terms of the decision-based metrics (𝐹1, ERDE50, 𝐹latency) as well as the ranking-based measures. Furthermore, in terms of 𝐹latency, the performance obtained in the first task was twice that of the second-best team.