Skip to contents

In Construction

Introduction

So, you (or your amazing lab mate) have finally finished the data acquisition, and now you have a dataset in hand. What’s next? Unfortunately, the work isn’t over yet. Before diving into any analysis, it’s crucial to understand the dataset itself. This is the first step in any data analysis workflow, ensuring that the data is of good quality and is well-prepared for preprocessing and any downstream analysis you plan to perform.

In this vignette, we present the dataset used throughout the different vignettes of this website. It’s far from a perfect dataset, which actually mirrors the reality of most datasets you’ll encounter in research.

Some issues will indeed be specific to this described dataset. However, the purpose of this vignette is to encourage you to think critically about your data and guide you through steps that can help you avoid spending hours on an analysis, only to realize later that some samples or features should have been removed or flagged earlier on.

Dataset Description

In this workflow, two datasets are used:

  1. An LC-MS-based (MS1 level only) untargeted metabolomics dataset to quantify small polar metabolites in human plasma samples.
  2. An additional LC-MS/MS dataset of selected samples from the former study for the identification and annotation of significant features.

The samples were randomly selected from a larger study aimed at identifying metabolites with varying abundances between individuals suffering from cardiovascular disease (CVD) and healthy controls (CTR). The subset analyzed here includes data for three CVD patients, three CTR individuals, and four quality control (QC) samples. The QC samples, representing a pooled serum sample from a large cohort, were measured repeatedly throughout the experiment to monitor signal stability.

The data and metadata for this workflow are available on the MetaboLights database under the ID: MTBLS8735.

The detailed materials and methods used for the sample analysis are also available in the MetaboLights entry. This is particularly important for understanding the analysis and the parameters used. It should be noted that the samples were analyzed using ultra-high-performance liquid chromatography (UHPLC) coupled to a Q-TOF mass spectrometer (TripleTOF 5600+), and chromatographic separation was achieved using hydrophilic interaction liquid chromatography (HILIC).

  • Consider moving visualizations from the end-to-end vignette to here for a clearer understanding of the dataset.
  • Provide more in-depth visualizations to explore and understand the dataset quality.
  • Compare pool lc-ms and pool lc-ms/ms and show that we have better separation on the second run.