
Introducing exotic maize (Zea mays L.) germplasm into breeding programs may enhance genetic variability and lead to greater progress from selection. However, the pool of available exotic germplasm is large and diverse, which makes the choice of potential parents difficult. Two major heterotic-group classification methods are currently in wide use worldwide. The traditional method uses specific combining ability, together with line-pedigree information and/or field hybrid-yield data, to assign a maize line to a heterotic group (Hallauer and Miranda, Quantitative Genetics in Maize Breeding, 2nd ed. Iowa State Univ. Press, Ames, IA, 1988). The other method uses molecular markers to compute the genetic similarity (GS) or genetic distance (GD) between lines and assigns them to heterotic groups on that basis (Mohammadi and Prasanna, Crop Sci. 43:1235-1248, 2003). However, the association between these marker-based measures and hybrid performance remains inconsistent (dos Santos Dias et al., Genet. Mol. Res. 3:356-368, 2004). Machine learning is an emerging scientific discipline concerned with the design and development of algorithms that allow computers to change their behavior based on data, such as sensor data or databases (Witten and Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Morgan Kaufmann, San Francisco, 2005). In particular, supervised learning algorithms infer a function from training data. The training data consist of pairs of input objects (typically feature vectors) and desired outputs, i.e., the class labels (Witten and Frank, 2005). We conjecture that the distance-based methods currently available do not capture the non-linear relationship between parental molecular data and progeny performance, and that this limitation can be overcome by supervised learning algorithms. Among these, support vector machines (SVMs) have shown high generalization ability and have become very popular in recent years (Rifkin and Klautau, JMLR 5:101-114, 2004).
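To make the marker-based alternative concrete, here is a minimal sketch of one common way to compute GS and GD from dominant (presence/absence) markers: the Nei-Li (Dice) coefficient, GS = 2a/(2a + b + c), with GD = 1 - GS. A line is then assigned to the group with the smallest mean GD to its members. The choice of coefficient, the nearest-group rule, the group names and the marker profiles are illustrative assumptions, not the procedure or data of this study.

```python
from statistics import mean


def dice_similarity(x, y):
    """Nei-Li (Dice) genetic similarity between two binary marker profiles.

    x, y: sequences of 0/1 band scores (1 = band present) at the same loci.
    GS = 2a / (2a + b + c): a = bands present in both lines,
    b, c = bands present in only one of the two lines.
    """
    if len(x) != len(y):
        raise ValueError("profiles must score the same marker loci")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    if a + b + c == 0:
        return 1.0  # assumption: no bands in either line counts as identical
    return 2 * a / (2 * a + b + c)


def genetic_distance(x, y):
    """GD = 1 - GS."""
    return 1.0 - dice_similarity(x, y)


def assign_heterotic_group(line, groups):
    """Assign `line` to the group with the smallest mean GD to its members.

    groups: dict mapping a (hypothetical) group name to a list of profiles.
    """
    return min(groups, key=lambda g: mean(genetic_distance(line, m) for m in groups[g]))


# Hypothetical marker profiles for two heterotic groups and one new line.
groups = {
    "A": [[1, 1, 0, 0, 1, 0], [1, 1, 0, 1, 1, 0]],
    "B": [[0, 0, 1, 1, 0, 1], [0, 1, 1, 1, 0, 1]],
}
new_line = [1, 1, 0, 0, 1, 1]
print(assign_heterotic_group(new_line, groups))  # A
```

This linear, distance-only rule is exactly the kind of assignment the conjecture above questions: it cannot weight loci by their effect on progeny performance.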
However, SVMs are binary classifiers, so a combination scheme is needed to extend them to problems with more than two classes (Rifkin and Klautau, 2004). In this work we explore the performance, in heterotic group assignment, of the recently introduced class of ECOC-SVM (Error Correcting Output Coding-Support Vector Machine) classifiers, which are based on recursive error-correcting codes from communication theory (Tapia et al., LNCS 3541:108-117, 2005). As controls we used four native multiclass classifiers: Naive Bayes (John and Langley, 11th Conf. on Uncertainty in Artificial Intelligence, 338-345, 1995), Bayesian Network (Friedman et al., Mach. Learn. 29:131-163, 1997), Decision Tree J48 (Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA, 1993) and Logistic Model Trees, also referred to as Simple Logistic (Landwehr et al., Mach. Learn. 161-205, 2005). We also report the performance of an ensemble method using Naive Bayes and J48 as base classifiers (Witten and Frank, 2005).
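As an illustration of how error-correcting output coding combines binary classifiers into a multiclass one, the sketch below uses a fixed 4-class, 7-column exhaustive code (every pair of codewords differs in 4 bits, so any single wrong binary output is still decoded correctly) with Hamming-distance decoding. This is plain ECOC, not the recursive codes of Tapia et al.; the group names are hypothetical, and the binary outputs are supplied by hand where trained SVMs would be used in practice.

```python
# Hypothetical exhaustive ECOC code matrix for four heterotic groups:
# each row is a class codeword, each column defines one binary problem.
CODE_MATRIX = {
    "group_1": (1, 1, 1, 1, 1, 1, 1),
    "group_2": (0, 0, 0, 0, 1, 1, 1),
    "group_3": (0, 0, 1, 1, 0, 0, 1),
    "group_4": (0, 1, 0, 1, 0, 1, 0),
}


def dichotomy_labels(classes, column):
    """Relabel multiclass targets as 0/1 targets for one binary learner (e.g. one SVM)."""
    return [CODE_MATRIX[c][column] for c in classes]


def hamming(u, v):
    """Number of positions at which two bit vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def ecoc_decode(binary_outputs):
    """Return the class whose codeword is closest in Hamming distance to the outputs."""
    return min(CODE_MATRIX, key=lambda c: hamming(CODE_MATRIX[c], binary_outputs))


# Outputs of the 7 binary classifiers for a group_3 line, with the first one wrong.
outputs = (1, 0, 1, 1, 0, 0, 1)
print(ecoc_decode(outputs))  # group_3
print(dichotomy_labels(["group_1", "group_3", "group_4"], 2))  # [1, 1, 0]
```

In a full pipeline, each column's targets from `dichotomy_labels` would train one binary SVM, and the seven SVM predictions for a new line would be passed to `ecoc_decode`.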