Introduction. Over the years, the analytical methods and data analysis tools commonly used in food quality and process control have had to be re-evaluated and adapted to new tasks. In this drive to gather more and better information, multivariate statistical analysis of fused data has become a powerful tool for enhancing the reliability of the results [1]. Because the key issue is how the information sources can be combined to provide a joint classification of the samples, three levels of data fusion (DF) have been reported [2]. The aim of this work was to develop multiple strategies to assess the three DF levels on two second-order arrays of different data complexity, in order to explore the correlation and analogy between both information sources for twofold classification purposes. The challenge thus consisted in finding the combination of data preprocessing, data fusion and data modeling that provides the best results.

Results and Discussion. The focus was on developing models able to distinguish among thirty-nine white wines of three different grape varieties with geographical indication (GI) from the four main wine-producing regions of Argentina [26]. To this end, fluorescence excitation-emission matrix spectroscopy (EEM) and capillary electrophoresis with diode array detection (CE-DAD) were applied as non-target analyses to acquire a fingerprint characterizing each wine. Multi-level data fusion strategies on three-way data were evaluated and compared, revealing their advantages and disadvantages in the classification context. Straightforward approaches, based on a series of data preprocessing and feature extraction steps, were developed for each level studied. The data analysis workflow developed in this study is schematized in Fig. 1.
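The difference between low-level DF (fusing the raw or preprocessed data) and mid-level DF (fusing features extracted from each source) can be illustrated with a minimal NumPy sketch. The array dimensions, the random placeholder data, the block scaling by Frobenius norm and the use of PCA scores as the extracted features are all assumptions made for illustration; the abstract does not specify the preprocessing or feature extraction actually used. The sketch also unfolds each three-way array into a two-way matrix (the PLS-DA route) and does not show fusion that preserves the three-way structure for NPLS-DA.

```python
import numpy as np

def unfold(x: np.ndarray) -> np.ndarray:
    """Unfold a three-way array (samples x mode2 x mode3) into samples x (mode2*mode3)."""
    return x.reshape(x.shape[0], -1)

def block_scale(x: np.ndarray) -> np.ndarray:
    """Mean-center the columns, then divide the block by its Frobenius norm
    so that each source contributes the same total weight to the fused matrix."""
    xc = x - x.mean(axis=0)
    return xc / np.linalg.norm(xc)

def pca_scores(x: np.ndarray, n_components: int) -> np.ndarray:
    """Return the first PCA scores (U * S) of the column-centered matrix via SVD."""
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def low_level_fusion(blocks):
    """Low-level DF: concatenate the unfolded, block-scaled data of every source."""
    return np.hstack([block_scale(unfold(b)) for b in blocks])

def mid_level_fusion(blocks, n_components=3):
    """Mid-level DF: extract features (here, PCA scores) per source, then concatenate them."""
    return np.hstack([pca_scores(unfold(b), n_components) for b in blocks])

rng = np.random.default_rng(0)
n_samples = 39  # number of wines in the study; all data below are random placeholders
eem = rng.random((n_samples, 20, 50))      # hypothetical: samples x excitation x emission
ce_dad = rng.random((n_samples, 100, 30))  # hypothetical: samples x migration time x wavelength

x_low = low_level_fusion([eem, ce_dad])
x_mid = mid_level_fusion([eem, ce_dad], n_components=3)
print(x_low.shape)  # (39, 4000): 20*50 EEM variables + 100*30 CE-DAD variables
print(x_mid.shape)  # (39, 6): 3 score columns per source
```

Block scaling is one common way to keep the larger CE-DAD block (3000 variables here) from dominating the smaller EEM block in a low-level fused matrix; mid-level fusion sidesteps the issue by reducing each source to a few features before concatenation.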
In general terms, it includes: 1) building separate classification models on the data obtained from each individual analytical technique, applying three different approaches; and 2) building multiplatform classification models by applying different DF strategies: low-level, mid-level and high-level DF (assessing different approaches). All the classification models obtained were then assessed and compared. Partial least squares discriminant analysis (PLS-DA) and its multi-way extension (NPLS-DA) were applied to the CE-DAD, EEM and fused data, structured as two-way and three-way arrays, respectively. The classification results of each model were evaluated through global indices such as the non-error rate (i.e., the average sensitivity) and the average precision. Different degrees of improvement were observed when the results from the fused matrices were compared with those obtained from a single data source.

Conclusions. The proposed multi-level fusion strategies constitute a significant improvement in DF analysis and offer a wide range of possibilities when second-order data of different nature are assessed. In addition, they provide a useful and reliable way of improving the analytical quality of classification results obtained from second-order data. The benefit of fusion is most evident in the prediction stage, when samples cannot be classified from the individual sources. Moreover, multi-level data fusion combined with multi-way modeling yielded the best classification models. Notably, the benefits of data fusion at different levels add to the second-order data advantage, producing a synergistic effect on the classification results.
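The two global indices named in the results, together with one possible high-level (decision-level) fusion rule, can be sketched as follows. The non-error rate is computed as the mean of the per-class sensitivities and the average precision as the mean of the per-class precisions. The fusion rule shown (averaging each source's class scores and taking the highest) is an assumption for illustration, not necessarily the rule the authors used; the scores themselves are hypothetical values standing in for, e.g., PLS-DA predicted class memberships, and the function names are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count samples per (true class, predicted class) pair; rows are true classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def non_error_rate(cm):
    """NER: average of the per-class sensitivities (correct / all samples of that class)."""
    sensitivities = np.diag(cm) / cm.sum(axis=1)
    return sensitivities.mean()

def average_precision(cm):
    """Average of the per-class precisions (correct / all samples predicted as that class).

    Assumption: a class that is never predicted contributes a precision of 0.
    """
    predicted = cm.sum(axis=0)
    precisions = np.divide(np.diag(cm), predicted,
                           out=np.zeros(len(cm)), where=predicted > 0)
    return precisions.mean()

def high_level_fusion(score_blocks):
    """Decision-level DF (one possible rule): average the class scores of the
    individual models, then assign each sample to the highest-scoring class."""
    return np.mean(score_blocks, axis=0).argmax(axis=1)

# Hypothetical class-membership scores for 6 test samples and 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2])
scores_eem = np.array([[0.9, 0.1, 0.0], [0.4, 0.5, 0.1], [0.2, 0.7, 0.1],
                       [0.1, 0.3, 0.6], [0.0, 0.2, 0.8], [0.3, 0.3, 0.4]])
scores_ce = np.array([[0.6, 0.3, 0.1], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1],
                      [0.2, 0.7, 0.1], [0.1, 0.1, 0.8], [0.45, 0.15, 0.4]])

predictions = {
    "EEM": scores_eem.argmax(axis=1),
    "CE-DAD": scores_ce.argmax(axis=1),
    "Fused (high level)": high_level_fusion([scores_eem, scores_ce]),
}
for name, y_pred in predictions.items():
    cm = confusion_matrix(y_true, y_pred, n_classes=3)
    print(f"{name}: NER={non_error_rate(cm):.3f}, "
          f"average precision={average_precision(cm):.3f}")
# EEM: NER=0.667, average precision=0.722
# CE-DAD: NER=0.833, average precision=0.889
# Fused (high level): NER=1.000, average precision=1.000
```

The toy scores are constructed so that each single source misclassifies different samples, which mimics (but does not reproduce) the observation that fusion helps most when samples cannot be classified from the individual sources. Because NER averages over classes rather than over samples, it is not inflated by a large class that happens to be classified well.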