IQUIFIB   02644
INSTITUTO DE QUIMICA Y FISICOQUIMICA BIOLOGICAS "PROF. ALEJANDRO C. PALADINI"
Executing Unit - UE
Articles
Title:
Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information (Ms ID BIOINF 2008-1628)
Author(s):
C MARINO-BUSLJE; J SANTOS; JM DELFINO; M NIELSEN
Journal:
BIOINFORMATICS (OXFORD, ENGLAND)
References:
Year: 2008
ISSN:
1367-4803
Abstract:
Motivation: Mutual information (MI) theory is often applied to
predict positional correlations in a multiple sequence alignment
(MSA), making it possible to analyze positions that are structurally
or functionally important in a given fold or protein family. Accurate
identification of coevolving positions in protein sequences is difficult
due to the high background signal imposed by phylogeny and noise.
Several methods have been proposed using MI to identify coevolving
amino acids in protein families.
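As a minimal illustration of the quantity discussed above, the following sketch computes the plain (uncorrected) MI between two MSA columns from observed frequencies. The function name `mutual_information` and the representation of a column as a string of residues are assumptions made for this example; it applies no sequence weighting, low-count correction, or phylogeny correction.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Plain MI (in bits) between two alignment columns.

    Each column is a string with one residue per sequence.
    Hypothetical helper for illustration only: no weighting or
    pseudocounts are applied.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)                 # residue counts in column a
    count_b = Counter(col_b)                 # residue counts in column b
    count_ab = Counter(zip(col_a, col_b))    # joint residue-pair counts

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b) written in terms of raw counts
        ratio = (n_ab * n) / (count_a[a] * count_b[b])
        mi += p_ab * math.log2(ratio)
    return mi


if __name__ == "__main__":
    # Two columns that vary together versus two that vary independently.
    print(mutual_information("AACC", "DDEE"))
    print(mutual_information("AACC", "DEDE"))
```

With very few sequences, frequency estimates like these become unreliable, which is the situation the low-count corrections mentioned in the abstract are meant to address.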
Results: After evaluating two current methods, we demonstrate
how the use of sequence-weighting techniques to reduce sequence
redundancy and low-count corrections to account for the small number
of observations in limited-size sequence families can significantly
improve the predictive performance of MI. The evaluation is made on large
sets of in silico-generated alignments as well as on biological
sequence data. The methods included in the analysis are the
APC (average product correction) and RCW (row-column weighting)
methods. The best-performing method was APC including sequence weighting
and low-count corrections. The use of sequence permutations
to calculate an MI rescaling is shown to significantly
improve the prediction accuracy and allows for direct comparison of
information values across protein families. Finally, we demonstrate
that at least 400 sequences that are <62% identical are needed
in an MSA to achieve meaningful predictive performance.
With our contribution, we achieve a noteworthy improvement
over current procedures for determining coevolution and residue
contacts, and we believe that this will have a potential impact on the
understanding of protein structure, function and folding.
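The abstract names the average product correction (APC) without giving its formula. The sketch below assumes the commonly used definition, in which the background term for a pair of columns is the product of their mean MI values divided by the overall mean MI, and that term is subtracted from the raw MI. The function name `apc_correct` and the list-of-lists matrix layout are hypothetical, and the sequence weighting, low-count corrections, and permutation-based rescaling described above are not reproduced here.

```python
def apc_correct(mi):
    """Apply an average product correction to a symmetric MI matrix.

    mi is a square list of lists; diagonal entries are ignored.
    Assumed definition: corrected(i, j) = MI(i, j) - mean_i * mean_j / mean_all,
    where means are taken over off-diagonal entries only.
    """
    n = len(mi)
    if n < 2:
        raise ValueError("need at least two columns")

    # Mean MI of each column with every other column.
    col_mean = [
        sum(mi[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    # Mean MI over all off-diagonal pairs.
    overall = sum(
        mi[i][j] for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))

    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Background term; zero when the matrix carries no signal at all.
            apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
            corrected[i][j] = mi[i][j] - apc
    return corrected


if __name__ == "__main__":
    raw = [
        [0.0, 2.0, 0.0],
        [2.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
    ]
    for row in apc_correct(raw):
        print(row)
```

A uniformly high MI matrix is reduced to zero by this correction, which reflects its purpose: removing signal shared by all column pairs so that pair-specific covariation stands out.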