RESEARCHERS
REY VEGA Leonardo Javier
Articles
Title:
PACMAN: PAC-style bounds accounting for the mismatch between accuracy and negative log-loss
Author(s):
MATÍAS VERA; LEONARDO REY VEGA; PABLO PIANTANIDA
Journal:
Information and Inference: A Journal of the IMA
Publisher:
Oxford University Press
References:
Year: 2024, vol. 13
ISSN:
2049-8772
Abstract:
The ultimate performance of machine learning algorithms for classification tasks is usually measured in terms of the empirical error probability (or accuracy) on a testing dataset, whereas these algorithms are optimized by minimizing a typically different, more convenient loss function on a training set. For classification tasks, this loss function is often the negative log-loss, which yields the well-known cross-entropy risk that is typically better behaved numerically than the zero-one loss. Conventional studies of the generalization error do not usually take into account the underlying mismatch between the losses used in the training and testing phases. In this work, we introduce a theoretical analysis, referred to as PACMAN, based on a pointwise PAC approach to the generalization gap that accounts for the mismatch between testing on the accuracy metric and training on the negative log-loss. Building on the fact that the resulting mismatch can be written as a likelihood ratio, concentration inequalities can be used to obtain insights into the generalization gap in terms of PAC bounds, which depend on some meaningful information-theoretic quantities. An analysis of the obtained bounds and a comparison with available results in the literature are also provided.