Title:
A pipeline for pre-processing and assessing data quality in a Clear Cell Renal Cell Carcinoma (ccRCC) case study
Author(s):
NICOLÁS ZABALEGUI; GABRIEL RIQUELME; MALENA MANZI; MARÍA EUGENIA MONGE
Location:
Virtual conference
Meeting:
Conference; 17th Annual Conference of the Metabolomics Society Metabolomics2020 Online; 2021
Organizing institution:
The Metabolomics Society
Abstract:
Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics studies of complex biological samples may detect thousands of features (retention time, m/z pairs) at the initial stages of the data pre-processing workflow. However, raw data need to be pre-processed in a reproducible way to remove biologically non-relevant features and obtain clean, robust matrices suitable for subsequent data analysis.
Kidney cancer is accepted to be a metabolic disease. More than 50% of renal cell carcinoma (RCC) cases are diagnosed incidentally, with clear cell RCC (ccRCC) being the most common histological subtype. Since the disease is inherently resistant to chemotherapy and radiotherapy, and surgery is the most effective curative treatment only when the disease is detected at early stages, the discovery of early detection biomarkers is the most promising approach to reduce RCC mortality.
In this study, serum samples from a cohort (n=258) that included patients with ccRCC (stages I, II, III, IV) and controls were interrogated with a discovery-based metabolomics approach using UHPLC-QTOF-MS. LC-MS data were initially pre-processed with TidyMS, a versatile Python package for data curation in untargeted metabolomics workflows that was recently developed in our research group. Pooled QC sample extracts from the study cohort were re-analyzed by UHPLC-QTOF-MS after 39 months to evaluate sample stability over time, and a feature matrix matching process was then applied to retain only stable features. Finally, only those metabolic features that were highly correlated with serial dilutions of intrastudy QC samples were retained to account for non-linear instrumental responses.
These practices were conducted before performing multivariate statistical analysis of data collected in an ongoing ccRCC biomarker discovery case study, in order to obtain potentially identifiable discriminant features, improve confidence in the data analysis, and achieve a more accurate biological interpretation of the results.
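
The dilution-based filtering step described in the abstract can be illustrated with a minimal sketch. It is not the TidyMS API and not necessarily the authors' implementation: the function names, the feature identifiers, the intensity values, and the correlation threshold of 0.9 are all hypothetical, chosen only to show the idea of keeping features whose signal tracks the QC dilution series.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant signal: no usable linear trend
    return cov / sqrt(var_x * var_y)


def filter_by_dilution(intensities, dilution_factors, min_r=0.9):
    """Return the features whose intensity correlates with the dilution factor.

    intensities: dict mapping a feature id to a list of intensities,
                 one value per dilution level of the pooled QC sample.
    min_r:       hypothetical threshold; the abstract only says "highly correlated".
    """
    return [
        feature
        for feature, values in intensities.items()
        if pearson_r(dilution_factors, values) >= min_r
    ]


# Hypothetical dilution series and feature intensities (illustrative values only).
dilution_factors = [1.0, 0.5, 0.25, 0.125]
intensities = {
    "FT001": [1000.0, 510.0, 240.0, 130.0],  # scales with dilution
    "FT002": [800.0, 790.0, 805.0, 795.0],   # flat across dilutions
    "FT003": [100.0, 400.0, 50.0, 300.0],    # erratic response
}

retained = filter_by_dilution(intensities, dilution_factors)
print(retained)
```

In this toy example only the feature whose intensity decreases in proportion to the dilution factor passes the filter. A real workflow would typically use replicate injections per dilution level and a threshold justified by the data.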