IQUIR   05412
INSTITUTO DE QUIMICA ROSARIO
Research Unit (Unidad Ejecutora - UE)
Conferences and scientific meetings
Title:
In silico isolation of natural products: obtaining 1H NMR spectra of pure compounds from spectra of mixtures that contain them.
Author(s):
RAMALLO, IA; GATTI, PABLO; FURLAN, RLE; TAPIA, ELIZABETH; IBAÑEZ, GABRIELA; OLIVIERI, ALEJANDRO
Location:
Santiago de Chile
Meeting:
Conference; ISCB Latin America 2012; 2012
Organizing institution:
International Society for Computational Biology
Abstract:
Background: Natural products are a valuable source of molecular diversity with high therapeutic potential. The discovery of biologically active natural products usually involves fractionating bioactive natural extracts with different separation techniques. In general, analysis of the biological properties of the successive fractions guides the process toward isolation of the bioactive compounds. Liquid chromatography (LC) is one of the most widely used separation methods for the bioguided fractionation of plant extracts. Fractionating biologically active natural extracts is an arduous task that does not always lead to the identification of a new compound. These extracts are complex mixtures containing a large number of molecules, mostly uncharacterized, present over a wide and variable range of concentrations. This complexity sometimes prevents the isolation of interesting molecules on a preparative scale for structural elucidation. Consequently, the structural elucidation of bioactive compounds while they are still present in mixtures or fractions is an attractive goal for any chemist. One possible strategy for determining structures from mixtures is to combine a separation technique, such as LC, with a spectroscopic detection method that provides structural information about the molecules present, such as nuclear magnetic resonance (NMR). When mixtures are simple, assigning signals to structures is frequently done by visual inspection of the spectroscopic data. This becomes a real challenge as mixtures grow more complex; in that case, chemometric tools are needed to highlight and exploit information that would otherwise go unnoticed. Previous studies have applied multivariate methods to extract pure NMR spectra from NMR experiments on hundreds of fractions of varying composition produced by LC [7-9].
These routines operate efficiently when the spectra are recorded with on-flow LC-NMR equipment, which yields hundreds of spectra per experiment. The need for such equipment limits their use in routine preparative chromatography, where the number of fractions is significantly lower.
Results: Here, we present a novel routine, written in MATLAB R2007a, that reconstructs the pure spectrum of each component from a small number (n) of 1H NMR experiments on mixtures of natural products. The number of structures present, the pure spectra, and the concentration profiles are obtained without prior knowledge of the investigated system. The routine is run through a simple and intuitive graphical interface. The interface was developed to facilitate communication between the data and the user, who can make decisions at each stage of the pre-processing and resolution procedures and thereby contribute their chemical knowledge of the samples. In the present report, the versatility and performance of the toolbox are demonstrated through the analysis of various data sets, each consisting of real 1H NMR spectra of mixtures of natural products designed to simulate fractions from an incomplete chromatographic separation ("artificial fractions"). Depending on the compounds included, three types of sets were used: Set_1a, Set_1b, and Set_1c include mixtures of 3, 4, or 5 aromatic molecules, respectively; Set_2a, Set_2b, and Set_2c include mixtures of 3, 4, or 5 alkaloid structures, respectively; and Set_3a, Set_3b, and Set_3c include mixtures of 3, 4, or 5 flavonoid structures, respectively. Each of the nine sets included 16 artificial fractions containing different relative concentrations of the corresponding components.
The artificial fractions were prepared by mixing varying volumes of standard stock solutions (in deuterated chloroform for alkaloids and aromatics; in deuterated acetone for flavonoids) according to pre-designed composition profiles that simulate fractions from incomplete liquid chromatography separations. 1H NMR spectra were acquired at 300 MHz on a Bruker Avance II spectrometer (Bruker, Karlsruhe, Germany). The software runs under MATLAB version 7.4 (or higher) by MathWorks®, without requiring any third-party utilities. The files only need to be copied into a folder on the MATLAB path. The data analysis strategy is based on the correlations between peaks across the spectra. In 1H NMR, the intensity of the signal (peak) of each proton of a molecule is proportional to the molar concentration of that molecule. Therefore, signals belonging to the same molecule covary positively across the set of samples and can be identified by calculating correlation statistics between the intensities of all the signals observed in the whole series. The algorithm operates in two modules: pre-processing and resolution. Pre-processing of the raw data is a fundamental first step to guarantee the success of the subsequent resolution. The pre-processing module includes all the tools needed to condition the data: noise reduction (improving the signal-to-noise ratio), correction of baseline distortions, correction of peak shifts, normalization, elimination of peaks, and selection of the chemical shift region that the user wants to retain for the analysis. The resolution stage begins with an "intelligent fragmentation" that reduces the tens of thousands of points of each spectrum to hundreds of fragments (each containing only one peak), whose areas form the "m" new variables.
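Once every fragment has been reduced to an area, the correlation argument above can be illustrated with a minimal Python sketch. It is not the MATLAB routine itself: the concentration values, proton counts, and variable names are hypothetical, and the data are noise-free so that the effect is easy to see.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical molar concentrations of two compounds in six fractions.
conc_a = [1.0, 3.0, 5.0, 4.0, 2.0, 0.5]
conc_b = [0.2, 0.5, 2.0, 4.0, 5.0, 3.0]

# Fragment areas: concentration times the number of protons behind each signal.
area_a1 = [1 * c for c in conc_a]  # one-proton signal of compound A
area_a2 = [3 * c for c in conc_a]  # three-proton signal of compound A
area_b1 = [2 * c for c in conc_b]  # two-proton signal of compound B

r_same = pearson(area_a1, area_a2)  # two signals of the same compound
r_diff = pearson(area_a1, area_b1)  # signals of different compounds

print(f"same compound:       r = {r_same:.3f}")
print(f"different compounds: r = {r_diff:.3f}")
```

In this idealized case the two signals of compound A are perfectly correlated, while the correlation between signals of different compounds only reflects how similar the two concentration profiles happen to be. With real spectra, noise and peak overlap would lower the within-compound correlation.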
Using the eigenvalues of the correlation matrix of the "n x m" data array (classical principal component analysis (PCA) or cross-validated PCA), we estimate the number of chemical structures present in the mixture. Each new variable is characterized by its loadings in the factor analysis model, and classical and fuzzy clustering algorithms identify the groups of signals belonging to each structure. We then reconstruct the pure spectra and calculate the error of the estimation.
Conclusions: We have developed a new user-friendly graphical interface that resolves 1H NMR spectra of complex mixtures into pure spectra, avoiding the difficult and sometimes impossible complete physical separation of the component molecules. The software is intended to help chemists and biologists with the labor-intensive and time-consuming task of analyzing spectral information from a few liquid chromatography fractions in order to elucidate the structures of biologically active molecules from natural sources at an early stage of purification. In addition, the pre-processing module is a practical tool for optimizing 1H NMR signals.
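The component-counting and signal-grouping steps of the resolution stage can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' MATLAB code: the concentration profiles, proton counts, and noise level are hypothetical; the 5 % eigenvalue cutoff is an assumed criterion (the abstract does not state one); and a simple greedy correlation grouping stands in for the loadings-based classical and fuzzy clustering.

```python
import numpy as np

# Hypothetical concentration profiles of three compounds over 16 fractions,
# shaped like overlapping elution bands (arbitrary units).
conc = np.array([
    [0, 1, 3, 5, 4, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 2, 4, 5, 4, 2, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 5, 3, 2, 1, 0],
], dtype=float).T                                   # shape (n=16, k=3)

# Each of the m=9 fragments belongs to one compound (owner) and scales with
# the number of protons behind its signal. Both arrays are made up.
owner = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
protons = np.array([1, 2, 3, 3, 1, 2, 2, 3, 1], dtype=float)

rng = np.random.default_rng(0)
areas = conc[:, owner] * protons                    # (n, m) noise-free areas
areas += rng.normal(scale=0.02, size=areas.shape)   # small measurement noise

# Step 1: eigenvalues of the m x m correlation matrix, largest first.
corr = np.corrcoef(areas, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]

# Assumed cutoff: keep eigenvalues that explain more than 5 % of the total.
n_components = int(np.sum(eigvals > 0.05 * eigvals.sum()))

# Step 2 (simplified stand-in for clustering on loadings): greedily group
# the fragments that are strongly correlated with a seed fragment.
unassigned = list(range(corr.shape[0]))
groups = []
while unassigned:
    seed = unassigned[0]
    group = [j for j in unassigned if corr[seed, j] > 0.9]
    groups.append(group)
    unassigned = [j for j in unassigned if j not in group]

print("estimated number of components:", n_components)
print("fragment groups:", groups)
```

With these synthetic data the sketch should report three components and place the fragments of each compound in the same group. Real data would need the pre-processing described above and a more careful choice of cutoff, for example by cross-validation.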