RESEARCHERS
ALESSO Carlos Agustin
Conferences and scientific meetings
Title:
Calibration of Canopeo App to estimate pasture biomass: comparing linear regression vs random forest and MARS models
Author(s):
JAUREGUI, J.J.; ALESSO, C.A.
Meeting:
Conference; 45º Congreso Argentino de Producción Animal; 2022
Organizing institution:
Asociación Argentina de Producción Animal
Abstract:
Introduction
Accurately determining pasture biomass is crucial for proper feed budgeting on beef and dairy farms, and can have a profound effect on the profitability of pasture-based livestock systems. Non-destructive methods are the preferred choice for farmers because many measurements can be made in little time. The Canopeo App has previously been shown to be highly accurate in determining the biomass of lucerne (Medicago sativa L.) and winter crops (Jáuregui et al., 2019), as well as of mixed pastures (Jáuregui et al., 2021). However, all previous work to calibrate Canopeo used linear regression (LR). Although LR is well known for its simplicity, nonlinear patterns in the data may lead to biased predictions (Hastie et al., 2008). In this study, we compared this traditional regression model with two more complex non-parametric models: random forest (RF) and multivariate adaptive regression splines (MARS). Our hypothesis was that these more flexible but more complex models would learn from nonlinear patterns and therefore give more accurate predictions than traditional regression models.
Materials and Methods
Between July 2019 and January 2022, 973 pasture biomass samples were collected from grazing beef and dairy farms located in Santa Fe and Córdoba provinces. Pasture types were pure lucerne (Medicago sativa L.), legume-based pastures (one or more legumes dominated the mixture) and grass-based pastures (one or more grasses dominated the mixture). Green canopy cover (GCC) values were retrieved with the Canopeo App by taking one image per sample with a cell phone before cutting. The cell phone was held ~1 m above the canopy and pictures were taken at a straight angle between 11 am and 2 pm. Crop biomass (BM) was determined by cutting a 0.5 m² sample to ground level. Samples were dried in a forced-air oven (65 °C) to constant weight.
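The abstract does not spell out how the dried sample weights were scaled to per-hectare biomass. As a minimal sketch of the usual arithmetic, assuming the dry weight of each 0.5 m² sample is recorded in grams (the function and parameter names below are hypothetical, not taken from the original analysis):

```python
def dry_matter_kg_per_ha(dry_weight_g: float, sample_area_m2: float = 0.5) -> float:
    """Scale the oven-dry weight of one cut sample to kg of dry matter per hectare.

    Assumption: the weight is in grams and the sample covers `sample_area_m2`
    square metres (0.5 m2 in the abstract).
    """
    if sample_area_m2 <= 0:
        raise ValueError("sample_area_m2 must be positive")
    grams_per_m2 = dry_weight_g / sample_area_m2
    # 1 g/m2 equals 10 kg/ha (10,000 m2 per hectare, 1,000 g per kilogram)
    return grams_per_m2 * 10.0


# Illustrative value only: 100 g of dry matter cut from a 0.5 m2 sample
print(dry_matter_kg_per_ha(100.0))
```

With the default 0.5 m² area this amounts to multiplying the dry weight in grams by 20, so 100 g corresponds to 2000 kg/ha.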
Both BM and GCC were explored visually to spot potential outliers, both as a whole and by species and season. Data points were split into training and testing sets in a 1:3 ratio; the former was used to fit the models and the latter to assess their performance. The training set was further split at random into 10 folds, which were used to tune model parameters by k-fold cross-validation. In all cases, data splitting was stratified using four breaks of the response variable. LR, RF and MARS models were built with GCC, species and season as predictors. LR models were also fitted with interaction (int) and quadratic (poly) terms. Once the models were trained, prediction performance was assessed with the following metrics: Lin's concordance correlation coefficient (CCC), coefficient of determination (R²), root mean squared error (RMSE), and mean prediction error (MPE). Data processing, visualization and modelling were performed in the R statistical language with the tidyverse and tidymodels packages.
Results and Discussion
The models explained between 56% and 65% of the total variability of BM, with slight differences in accuracy. The most accurate model was MARS, with an RMSE of ~520 kg/ha. However, a more interpretable model, LR with interaction and quadratic effects, gave comparable results in terms of CCC and MPE (Table 1).
Table 1. Statistical analysis of the data for each ear. Asterisks indicate significant differences between treatments.
Model            RMSE (kg/ha)  R²    CCC   MPE
MARS             520           0.63  0.77  -16
LR + int + poly  525           0.63  0.76  -19
LR + poly        543           0.60  0.74  -20
LR + int         549           0.59  0.74  -14
RF               555           0.65  0.68  -38
LR               569           0.56  0.71  -11
The data showed some nonlinear patterns when GCC readings were above 80%, mostly for lucerne and legume-based pastures in the Spring-Summer season. The MARS model captured this feature by fitting a hinge function at around GCC = 82.9, and no interactions were detected during the tuning stage.
These features were also captured in LR models by including quadratic and interaction terms, which improved the performance of this approach compared with the base LR model. The RF metrics showed poor performance (Figure 1). From a practical perspective, Canopeo GCC can be easily converted to BM by applying the MARS function:
Y = 1271 + 404 Z1 + 324 Z2 - 18.23 max(0, 82.9 - X) + 88.35 max(0, X - 82.9)
where: Y = estimated BM; X = GCC reading; Z1 = 1 for Spring-Summer, 0 otherwise; Z2 = 1 for legume-based pasture, 0 otherwise.
Figure 1. Measured vs modeled biomass (kg/ha) by LR and MARS models.
Conclusions
The MARS approach was the most accurate in determining BM, but only small differences were observed between models. The base LR method was the least accurate, so there is potential to use more complex nonlinear models to estimate pasture biomass with Canopeo. Including these models within the framework of the app could improve its usability and aid farmers with their feed budgeting.
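As a minimal sketch of how the MARS function reported in the Results could be applied to a single Canopeo reading, the Python snippet below evaluates it directly. The function and argument names are illustrative assumptions and not part of the app or of the original analysis, which was done in R. The coefficients are those given in the abstract, and the result is assumed to be in kg/ha, the unit used for RMSE and in Figure 1.

```python
def mars_biomass(gcc: float, spring_summer: bool, legume_based: bool) -> float:
    """Estimate pasture biomass from a Canopeo green canopy cover reading.

    gcc           -- GCC reading in percent (X in the abstract)
    spring_summer -- True for the Spring-Summer season (Z1 = 1)
    legume_based  -- True for a legume-based pasture (Z2 = 1)
    """
    if not 0.0 <= gcc <= 100.0:
        raise ValueError("gcc must be a percentage between 0 and 100")
    knot = 82.9  # hinge location reported for the MARS model
    z1 = 1.0 if spring_summer else 0.0
    z2 = 1.0 if legume_based else 0.0
    below_knot = max(0.0, knot - gcc)  # active only when GCC < 82.9
    above_knot = max(0.0, gcc - knot)  # active only when GCC > 82.9
    return 1271.0 + 404.0 * z1 + 324.0 * z2 - 18.23 * below_knot + 88.35 * above_knot


# Illustrative readings only, not measured data
for reading in (50.0, 82.9, 90.0):
    print(reading, round(mars_biomass(reading, spring_summer=True, legume_based=False)))
```

The two hinge terms make the estimate piecewise linear in GCC: below the knot each GCC point changes the estimate by 18.23, and above it by 88.35, which is how the model accommodates the steeper response at readings above about 80%.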