2.3 NMR data handling and (pre-)processing

NMR is another analytical technique commonly used in metabolomics research. The pre-processing steps for NMR data normally include Fourier transformation, apodization, zero filling, phase and baseline correction and finally referencing and alignment of spectra. Other steps commonly used are removing the areas without any metabolites such as the water region (from 4.7 to 4.9 ppm), as they generally contain no useful information. There are several R packages that can carry out the above tasks (see Table 4). The PepsNMR and speaq are two examples of such R-based packages. The 1D NMR spectra can then be segmented into spectral regions (also known as bins or buckets) subjected directly to statistical data analysis after a normalisation step. The size of the bins could be fixed or variable (adopted or intelligent binning) based on NMR peaks or even each data point from each peak (full data point resolution) used for data analysis. The NMRProcFlow [50] package provides a graphical and interactive interface for 1D NMR spectral processing and analysis. Additionally, it provides various spectral alignment methods with the ability to use the corresponding experimental-factor levels in a visual and interactive environment, bridging the gap between experimental design and subsequent statistical analyses. Alternatively peak picking (based on the regions of interest, ROI) can be performed and individual compounds can be identified and integrated prior to statistical analysis. Targeted profiling aims to identify and quantify specific compounds in a sample. The packages that use such approach (ROI) are rDolphin, and rNMR. The bucketed/integrated spectra are normalised to minimise the biological and technical variation. The most common methods are normalisation to a constant sum (e.g. total sum of integral/bin intensities), probabilistic quotient normalisation [51] and dry weight tissue or protein content.

NMR metabolite annotation uses either chemical shifts and multiplicity matching from existing database, such as Human Metabolome Database [52–55] (HMDB), literature experimental search or uses simulated reference library compounds [56] to match or to fit the existing biological spectra. 1D NMR data often is not sufficient for a confident assignment of the metabolite peaks [57] therefore complementary 2D spectral data acquisition are often required to confirm the assignment [58]. The only package that explicitly deals with 2D NMR is rNMR that takes a targeted approach where the user defines regions of interest to be quantified and compared. DOLPHIN, originally written in MATLAB [59], uses both 1D and 2D NMR data for targeted profiling that is also available as an R version called rDolphin. We are not aware of other R packages that handle 2D NMR data processing. Several general multiway statistical tools such as PARAFAC [60], Tucker3 [61] and MCR have been described [62] that are able to analyse 1D and 2D NMR data, see the section on statistical analysis for a list of packages available for these techniques. BATMAN uses a Bayesian model and some template information such as chemical shifts, J-couplings, multiplicity and intensity ratios derived from spectral database to automatically quantify metabolites in a targeted manner [63].