IBR   13079
INSTITUTO DE BIOLOGIA MOLECULAR Y CELULAR DE ROSARIO
Executing Unit - UE
Articles
Title:
The influence relevance voter: an accurate and interpretable virtual high throughput screening method
Author(s):
SWAMIDASS, S.J., AZENCOTT, S.C., GRAMAJO, H., LIN, T.W., TSAI, S.C., AND BALDI, P.F.
Journal:
Journal of Chemical Information and Modeling
Publisher:
ACS PUBLICATIONS
References:
Year: 2009, pp. 256-263
ISSN:
1549-9596
Abstract:
Given activity training data from high-throughput screening (HTS) experiments, virtual high-throughput
screening (vHTS) methods aim to predict in silico the activity of untested chemicals. We present a novel
method, the Influence Relevance Voter (IRV), specifically tailored for the vHTS task. The IRV is a low-parameter
neural network which refines a k-nearest neighbor classifier by nonlinearly combining the influences
of a chemical's neighbors in the training set. Influences are decomposed, also nonlinearly, into a relevance
component and a vote component. The IRV is benchmarked using the data and rules of two large, open
competitions, and its performance is compared to that of other participating methods, as well as to that
of an in-house support vector machine (SVM) method. On these benchmark data sets, the IRV achieves
state-of-the-art results, comparable to the SVM in one case, and significantly better than the SVM in the other,
retrieving three times as many actives in the top 1% of its prediction-sorted list. The IRV presents several
other important advantages over SVMs and other methods: (1) the output predictions have a probabilistic
semantic; (2) the underlying inferences are interpretable; (3) the training time is very short, on the order of
minutes even for very large data sets; (4) the risk of overfitting is minimal, due to the small number of free
parameters; and (5) additional information can easily be incorporated into the IRV architecture. Combined
with its performance, these qualities make the IRV particularly well suited for vHTS.
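
The idea summarized in the abstract (a k-nearest neighbor prediction refined by summing neighbor influences, each the product of a relevance and a vote) can be illustrated with a minimal Python sketch. The abstract does not give the functional form, so everything specific here is an assumption made for illustration: the logistic output, a relevance that depends on similarity and rank, Tanimoto similarity over fingerprint bit sets, the names irv_predict, WEIGHTS and TRAIN, and the weight values, which are hand-picked rather than trained.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def irv_predict(query, train, k, w):
    """Estimate the probability that `query` is active.

    train: list of (fingerprint_set, is_active) pairs.
    w:     dict of weights (hypothetical names, assumed for this sketch).
    """
    # Keep the k training chemicals most similar to the query.
    neighbors = sorted(
        train, key=lambda item: tanimoto(query, item[0]), reverse=True
    )[:k]

    total = w["bias"]
    for rank, (fingerprint, is_active) in enumerate(neighbors, start=1):
        similarity = tanimoto(query, fingerprint)
        # Relevance: how much this neighbor should count (assumed form).
        relevance = sigmoid(
            w["rel_bias"] + w["rel_sim"] * similarity + w["rel_rank"] * rank
        )
        # Vote: the direction in which the neighbor pushes the prediction.
        vote = w["vote_active"] if is_active else w["vote_inactive"]
        # Influence of one neighbor = relevance * vote.
        total += relevance * vote

    # Nonlinear combination of the summed influences.
    return sigmoid(total)


# Hand-picked illustrative weights, not fitted to any data set.
WEIGHTS = {
    "bias": 0.0,
    "rel_bias": 0.0,
    "rel_sim": 2.0,
    "rel_rank": -0.5,
    "vote_active": 1.0,
    "vote_inactive": -1.0,
}

# Toy training set: two actives sharing bits, one unrelated inactive.
TRAIN = [
    ({1, 2, 3}, True),
    ({1, 2, 4}, True),
    ({7, 8, 9}, False),
]

print(irv_predict({1, 2, 3}, TRAIN, k=2, w=WEIGHTS))
print(irv_predict({7, 8, 9}, TRAIN, k=1, w=WEIGHTS))
```

In this sketch only six weights are free regardless of the size of the training set, which is the sense in which such a model is low-parameter, and each neighbor's relevance and vote can be inspected individually to see why a prediction came out as it did.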