IIBBA   05544
INSTITUTO DE INVESTIGACIONES BIOQUIMICAS DE BUENOS AIRES
Executing Unit - UE
Conferences and scientific meetings
Title:
Gaps matter! Could protein multiple sequence alignment gaps predict protein contacts?
Author(s):
DIEGO JAVIER ZEA; CRISTINA MARINO BUSLJE
Location:
Bahía Blanca
Meeting:
Conference; VI Congreso Argentino de Bioinformática y Biología Computacional; 2015
Organizing institution:
Asociación Argentina de Bioinformática y Biología Computacional
Abstract:
Mutual Information (MI) is used to measure covariation between residues in a Multiple Sequence Alignment (MSA), and often to predict residue contacts in the 3D structure. In Buslje et al. [1], the Z-score is calculated as the number of standard deviations by which the observed MI value falls above the mean value obtained from a set of 100 randomized MSAs. The best performance (in terms of AUC for contact prediction) was achieved when the MSAs were randomized by sequence rather than by column; that is, the residues were permuted within each sequence while the gaps were kept in place. Buslje et al. pointed out that the sequence-based Z-score tests the hypothesis that the sequences are not homologous, whereas the appropriate null hypothesis would be that the sequences are homologous and correctly aligned but the columns are not correlated. The authors therefore argue that the sequence-based Z-score should be interpreted only as an additional prediction score.

In this work we show that the number of gaps between two positions of the MSA correlates with the sequence-based Z-score, and therefore that the gap count itself carries information on residue contacts. This gives insight into the importance of the signal that gaps may add to the MSA (for example, in the prediction of protein contacts) and warns users to think carefully about the best way to treat them.
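
The sequence-based randomization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names are hypothetical, the MI estimate simply skips sequence pairs in which either position is a gap (an assumption), the population standard deviation is used (an assumption), and the "number of gaps" for a pair of positions is taken here to be the number of sequences with a gap in at least one of the two columns, which is only one possible reading of the abstract.

```python
import math
import random
from collections import Counter

GAP = "-"


def column(msa, i):
    """Return column i of an MSA given as a list of equal-length strings."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Plain MI (in nats) between two columns.

    Assumption: pairs where either symbol is a gap are ignored.
    """
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != GAP and b != GAP]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    count_ab = Counter(pairs)
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def shuffle_within_sequence(seq, rng):
    """Permute the residues of one sequence while keeping gaps in place."""
    residues = [c for c in seq if c != GAP]
    rng.shuffle(residues)
    shuffled = iter(residues)
    return "".join(c if c == GAP else next(shuffled) for c in seq)


def sequence_based_zscore(msa, i, j, n_random=100, seed=0):
    """Z-score of the observed MI against sequence-shuffled MSAs."""
    rng = random.Random(seed)
    observed = mutual_information(column(msa, i), column(msa, j))
    null = []
    for _ in range(n_random):
        randomized = [shuffle_within_sequence(seq, rng) for seq in msa]
        null.append(
            mutual_information(column(randomized, i), column(randomized, j))
        )
    mean = sum(null) / len(null)
    # Population standard deviation of the null MI values (assumption).
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / len(null))
    return (observed - mean) / sd if sd > 0 else 0.0


def gap_count(msa, i, j):
    """Sequences with a gap in column i or column j (assumed definition)."""
    return sum(1 for seq in msa if seq[i] == GAP or seq[j] == GAP)


# Toy alignment, invented purely for illustration.
toy_msa = ["AC-DE", "A-CDE", "GC-DF", "GACD-"]
z = sequence_based_zscore(toy_msa, 0, 4)
gaps = gap_count(toy_msa, 1, 2)
```

Because the gaps stay fixed in every randomized alignment, the null distribution for a pair of columns depends on where the gaps fall in those columns, which is one way to see why a gap count and the sequence-based Z-score could be related.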