RESEARCHERS
YOHAI Victor Jaime
Articles
Title:
Robust and Sparse Estimators for Linear Regression Models
Author(s):
EZEQUIEL SMUCLER; VÍCTOR J. YOHAI
Journal:
COMPUTATIONAL STATISTICS AND DATA ANALYSIS
Publisher:
ELSEVIER SCIENCE BV
Bibliographic details:
Place: Amsterdam; Year: 2017; vol. 111, pp. 116-130
ISSN:
0167-9473
Abstract:
In this paper, we consider the problem of robust and sparse estimation for linear regression models. In modern regression analysis, sparse high-dimensional scenarios, in which the ratio of the number of predictor variables to the number of observations, p/n, is high but the ratio of the number of actually relevant predictor variables to the number of observations, s/n, is low, have become increasingly common in areas such as bioinformatics and chemometrics. Outlier identification and robustness issues are difficult even when p is of moderate size. Traditional robust regression estimators do not produce sparse models and can behave poorly in terms of robustness and efficiency when p/n is high; see Maronna and Yohai (2015) and Smucler and Yohai (2015). Moreover, they cannot be computed when p > n. Robust regression methods for high-dimensional data are therefore needed.
Modern approaches to estimation in sparse high-dimensional linear regression models include penalized least squares (LS) estimators, e.g. the LS-Bridge estimator of Frank and Friedman (1993) and the LS-SCAD estimator of Fan and Li (2001). LS-Bridge estimators are penalized LS estimators whose penalty is proportional to the q-th power of the ℓq norm of the coefficient vector, with q > 0. They include as special cases the LS-Lasso of Tibshirani (1996) (q = 1) and the LS-Ridge of Hoerl and Kennard (1970) (q = 2). The LS-SCAD estimator is a penalized LS estimator whose penalty, the smoothly clipped absolute deviation (SCAD), has several interesting theoretical properties. The theoretical properties of penalized LS estimators have been extensively studied in recent years.
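To make the penalties described above concrete, here is a minimal sketch assuming NumPy. The function names are hypothetical, and the loss is taken as the unscaled residual sum of squares; other conventions (for example a 1/(2n) factor on the loss) are common and only rescale λ. The SCAD penalty follows the piecewise definition of Fan and Li (2001), with the default a = 3.7 that they suggest. This illustrates penalized LS objectives in general and is not the estimator proposed in the paper.

```python
import numpy as np

def bridge_penalty(beta, lam, q):
    """LS-Bridge penalty lam * sum_j |beta_j|**q, i.e. lam times the q-th power
    of the l_q norm (q > 0). q = 1 gives the Lasso penalty, q = 2 the Ridge penalty."""
    if q <= 0:
        raise ValueError("q must be positive")
    return lam * np.sum(np.abs(beta) ** q)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001) summed over coefficients, with a > 2.
    Linear near zero, quadratic transition, then constant for large |beta_j|."""
    t = np.abs(np.asarray(beta, dtype=float))
    linear = lam * t                                            # |t| <= lam
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |t| <= a*lam
    constant = np.full_like(t, (a + 1) * lam**2 / 2)            # |t| > a*lam
    pen = np.where(t <= lam, linear,
                   np.where(t <= a * lam, quadratic, constant))
    return pen.sum()

def penalized_ls_objective(beta, X, y, penalty):
    """Residual sum of squares plus penalty(beta) (hypothetical helper)."""
    r = y - X @ beta
    return r @ r + penalty(beta)

# Usage: evaluate the LS-Lasso (q = 1) and LS-SCAD objectives at a candidate beta.
X = np.array([[1.0, 0.5], [0.2, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 2.5])
beta = np.array([0.5, 1.5])
lasso_val = penalized_ls_objective(beta, X, y, lambda b: bridge_penalty(b, 0.3, 1))
scad_val = penalized_ls_objective(beta, X, y, lambda b: scad_penalty(b, 0.3))
```

Unlike the Bridge penalty with q ≥ 1, the SCAD penalty is bounded (constant beyond aλ), so large coefficients are not shrunk further; this is one of the properties that motivates it.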
Of special note is the so-called oracle property defined in Fan and Li (2001): an estimator is said to have the oracle property if the estimated coefficients corresponding to zero coefficients of the true regression parameter are set to zero with probability tending to one, while at the same time the coefficients corresponding to non-zero coefficients of the true regression parameter are estimated with the same asymptotic efficiency we would have if we knew the correct model in advance.
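One way to write this definition formally is sketched below. The notation is introduced here for illustration only: β₀ is the true regression parameter, S its set of non-zero coordinates, β̂ₙ the estimator computed from n observations, and β̃ₙ the "oracle" estimator fitted using only the predictors in S.

```latex
S = \{\, j : \beta_{0,j} \neq 0 \,\}, \qquad
\text{(i)}\quad P\big(\hat\beta_{n,j} = 0 \ \text{for all } j \notin S\big) \longrightarrow 1, \\
\text{(ii)}\quad \sqrt{n}\,\big(\hat\beta_{n,S} - \beta_{0,S}\big)
  \ \text{has the same limiting distribution as} \
  \sqrt{n}\,\big(\tilde\beta_{n,S} - \beta_{0,S}\big).
```

Condition (i) is the variable-selection part of the property, and condition (ii) is the efficiency part.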