FELLOWSHIPS
POZZI Florencia Ileana
Conferences and scientific meetings
Title:
Prediction of gene silencing in Arabidopsis thaliana using decision trees and support vectors machines algorithms
Author(s):
POZZI, FLORENCIA I.; FELITTI, SILVINA A.
Meeting:
Conference; XI Congreso Argentino de Bioinformática y Biología Computacional; 2021
Abstract:
Background: Although most angiosperms require an endosperm balance number (EBN) for normal endosperm development, tetraploid Paspalum notatum is EBN-insensitive. A candidate gene (GG13) of Paspalum notatum, associated with endosperm development in EBN-insensitive crosses, was analyzed by gene silencing in Arabidopsis thaliana. Compared with the control (C) condition, the silenced (S) condition showed lower relative expression in RT-qPCR experiments and a more elongated shape (shape index 1.83 vs. 1.57) in phenotypic analyses. The variables evaluated were length (L), width (W) and shape index (L/W). The objectives of the work were: (1) to predict the classes S and C from 1896 phenotypic data points using the Decision Trees (DT) and Support Vector Machines (SVM) algorithms; and (2) to determine which variable carries the most weight in separating the classes S and C, in order to reduce the number of variables to be evaluated in future experiments on the GG13 gene, saving work time and effort. The predictive models were evaluated with RStudio.
Results: In the DT model, the variable with the greatest weight in separating classes S and C was L/W (50), while W and L had equal importance (25 each). In addition, 63% of the data lie above the L/W value of 1.595 (the value defined in the primary splits). For this model, the accuracy was 79% and the error 21%. The optimal value of the cp parameter was 0.01. For the SVM model, the total number of support vectors was 633, the accuracy was 72% and the error 28%. The optimal value of the cost parameter was 0.001.
Conclusions: From the analysis of the DT and SVM predictive models, we can say that the accuracy of both ranges between poor and good in terms of generalization power. The DT model outperformed the SVM model in both class-separation ability and generalizability. It was not possible to reduce the number of variables to be evaluated in future experiments for the gene under study, because the variable that best separates the classes is L/W, and determining it requires the values of both L and W (variables that showed the same weight).
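The role of the L/W split can be illustrated with a minimal sketch. This is not the authors' analysis (they fitted their models in RStudio); it is a single-threshold classifier written in Python, using the reported primary-split value of 1.595. The sample measurements are hypothetical, and the rule "L/W above the threshold predicts S" is an assumption based on the reported mean shape indices (1.83 for S vs. 1.57 for C). It also shows the fraction of correct predictions and its complement, the error, which sum to 1 as the reported percentages sum to 100%.

```python
# Hypothetical (length, width, true class) measurements; NOT data from the study.
SAMPLES = [
    (1.90, 1.00, "S"),
    (1.80, 1.00, "S"),
    (1.55, 1.00, "S"),  # silenced sample below the threshold: misclassified
    (1.50, 1.00, "C"),
    (1.60, 1.00, "C"),  # control sample above the threshold: misclassified
    (1.40, 1.00, "C"),
]

# Primary-split value reported in the abstract for the shape index L/W.
THRESHOLD = 1.595


def shape_index(length, width):
    """Shape index L/W: computing it needs both measurements."""
    return length / width


def predict(length, width, threshold=THRESHOLD):
    """Assumed rule: a more elongated shape (higher L/W) predicts class S."""
    return "S" if shape_index(length, width) > threshold else "C"


def accuracy(samples):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(predict(length, width) == label for length, width, label in samples)
    return correct / len(samples)


acc = accuracy(SAMPLES)
error = 1 - acc  # error is the complement of the fraction correct
print(f"accuracy={acc:.2f} error={error:.2f}")
```

Because `shape_index` takes both `length` and `width`, the sketch makes the final conclusion concrete: even a model that splits only on L/W still requires L and W to be measured for every sample.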