RESEARCHERS
RODRIGUEZ Gustavo Ruben
Conferences and scientific meetings
Title:
New set of classes for fruit shape classification in tomato based on machine learning
Author(s):
VAZQUEZ, DANA V.; SPETALE, FLAVIO E; TAPIA, ELIZABETH; RODRÍGUEZ, GUSTAVO RUBÉN
Location:
Rosario
Meeting:
Workshop; vo Congreso Argentino de Bioinformática y Biología Computacional, 13va Conferencia Iberoamericana de Bioinformática, and 3ra reunión anual de la red RiaBio; 2023
Abstract:
Background: Tomato (Solanum lycopersicum L.) is the second most consumed vegetable worldwide. Fruit shape significantly affects yield, quality, consumer preference, and commercial use. Despite digital advances in precision agriculture, fruit shape is still determined predominantly by visual assessment, and there is no standardized approach. Classification criteria often vary among experts. Four exist for tomato: "Rodriguez2011," "Visa2014," "UPOV," and "IPGRI." They define eight, nine, ten, and eight classes, respectively, and show no consensus. This study aims to develop a machine-learning model for automated tomato shape classification and to establish a "gold standard".
Results: A total of 1424 images of longitudinally sectioned tomato fruits from the Solanaceae Genomic Network Repository were examined, and 41 numerical variables were obtained with the Tomato Analyzer software. The fruits were visually classified using the four known criteria. Additionally, a novel set of classes was introduced by merging the rectangular class of the Rodriguez2011 criterion into the ellipsoid class. The data set was split into training (80%) and test (20%) sets. Variables were standardized using the Z-score. Four highly correlated variables (correlation > 0.95) were removed. The key variables for each criterion were identified by Recursive Feature Elimination, and the 12 variables common to all criteria were retained. The supervised classification methods employed were multinomial logistic regression, random forest, and support vector machine. The models did not differ significantly in mean accuracy (p>0.05). However, significant differences were found among the classification criteria (p<0.01) for all models. The Wilcoxon-Mann-Whitney test showed that mean accuracy for UPOV and IPGRI did not differ significantly and was the lowest, that Rodriguez2011 and Visa2014 did not differ significantly and had intermediate values, and that the novel set of classes yielded the highest mean accuracy across all models, i.e., 85%.
Conclusions: The results demonstrate that the new set of classes enhances classification accuracy and provides a "gold standard" for future shape studies. The novel set comprises seven shape classes: flat, round, ellipsoid, heart, oxheart, obovoid, and long, and achieves 85% accuracy. This "gold standard" for fruit shape facilitates precise tomato cultivar description and consensus among researchers, aiding genetic understanding.
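
The preprocessing and modelling steps named in the abstract (80/20 split, Z-score standardization, removal of variables correlated above 0.95, Recursive Feature Elimination down to 12 variables, and three classifiers) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' code: the data are synthetic stand-ins generated with scikit-learn, the helper `uncorrelated_columns` and all hyperparameters are assumptions, and feature selection is run once on a single label set, whereas the study selected variables per criterion and kept those common to all.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def uncorrelated_columns(X, threshold=0.95):
    """Hypothetical helper: greedily keep columns whose absolute Pearson
    correlation with every already-kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic stand-in (assumption): 1424 samples, 41 variables, 7 classes.
X, y = make_classification(
    n_samples=1424, n_features=41, n_informative=15, n_redundant=5,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
# Make the last variable a near-copy of the first so the filter has work to do.
rng = np.random.default_rng(0)
X[:, -1] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=X.shape[0])

# 80% training / 20% test split, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Z-score standardization, fitted on the training set only.
scaler = StandardScaler().fit(X_train)
Z_train, Z_test = scaler.transform(X_train), scaler.transform(X_test)

# Drop one variable from each highly correlated pair (|r| > 0.95).
kept = uncorrelated_columns(Z_train, threshold=0.95)
Z_train, Z_test = Z_train[:, kept], Z_test[:, kept]

# Recursive Feature Elimination down to 12 variables
# (the base estimator used here is an assumption).
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=12)
selector.fit(Z_train, y_train)
S_train, S_test = selector.transform(Z_train), selector.transform(Z_test)

# The three classifier families named in the abstract, with assumed settings.
models = {
    "multinomial logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "support vector machine": SVC(kernel="rbf"),
}
accuracy = {
    name: model.fit(S_train, y_train).score(S_test, y_test)
    for name, model in models.items()
}
for name, acc in accuracy.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Fitting the scaler, the correlation filter, and the feature selector on the training portion only keeps the test set untouched until scoring. The accuracies printed for synthetic data say nothing about the values reported in the abstract.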