Package: MsBackendHmdb
Authors: Laurent Gatto [aut] (https://orcid.org/0000-0002-1520-2268), Johannes Rainer [aut, cre] (https://orcid.org/0000-0002-6977-7147), Sebastian Gibb [aut] (https://orcid.org/0000-0001-7406-4443)
Last modified: 2020-09-16 12:49:58
Compiled: Wed Sep 16 12:53:10 2020
The Spectra package provides a central infrastructure for the handling of Mass Spectrometry (MS) data. The package supports interchangeable use of different backends to import MS data from a variety of sources (such as mzML files). The MsBackendHmdb package enables, with the MsBackendHmdbXml object, the import of MS/MS spectrum data from xml files from the Human Metabolome Database (HMDB). This vignette illustrates how to use the MsBackendHmdb package to make HMDB data available in Spectra.
Spectral data from HMDB can be downloaded in xml format, one xml file per spectrum. In our short example we load 4 such xml files which are provided with this package. Below we first load all required packages and define the file names of the MS/MS spectra xml files.
library(Spectra)
library(MsBackendHmdb)
fls <- dir(system.file("xml", package = "MsBackendHmdb"),
           full.names = TRUE, pattern = "xml$")
MS data can be accessed and analyzed through Spectra objects. Below we create a Spectra object with the data from the above xml files. To this end we provide the file names and specify MsBackendHmdbXml() as the source backend, which handles the data import, and MsBackendDataFrame() as the backend that stores and handles the data.
sps <- Spectra(fls, source = MsBackendHmdbXml(), backend = MsBackendDataFrame())
## Start data import from 4 files ... done
We now have full access to all imported spectra variables, which we list below.
spectraVariables(sps)
## [1] "msLevel" "rtime"
## [3] "acquisitionNum" "scanIndex"
## [5] "dataStorage" "dataOrigin"
## [7] "centroided" "smoothed"
## [9] "polarity" "precScanNum"
## [11] "precursorMz" "precursorIntensity"
## [13] "precursorCharge" "collisionEnergy"
## [15] "isolationWindowLowerMz" "isolationWindowTargetMz"
## [17] "isolationWindowUpperMz" "spectrum_id"
## [19] "compound_id" "predicted"
## [21] "splash" "instrument_type"
Besides default spectra variables, such as msLevel, rtime and precursorMz (most of which are, however, not defined in the xml files from HMDB), additional spectra variables are available, such as spectrum_id (the ID of the spectrum in the HMDB database), compound_id (the metabolite identifier), splash and instrument_type. Below we list the instrument type for the 4 spectra.
sps$instrument_type
## [1] "Quattro_QQQ" "Quattro_QQQ" "LC-ESI-QQ" NA
The last spectrum was predicted and the instrument type is thus set to NA.
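Missing instrument types can be handled with ordinary R vector operations. The sketch below works on a plain character vector holding the four values printed above, so it runs without the packages. The resulting logical index should also be usable to subset sps with `[`, although that step is an assumption and is not shown in this vignette. Note that NA marks the predicted spectrum in this example only; it is not established here that this holds for all HMDB files.

```r
# Instrument types copied from the output above.
instrument_type <- c("Quattro_QQQ", "Quattro_QQQ", "LC-ESI-QQ", NA)

# Logical index of the spectra that report an instrument type.
has_instrument <- !is.na(instrument_type)
which(has_instrument)

# Number of spectra per instrument type, counting missing values too.
table(instrument_type, useNA = "ifany")
```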
We can also access the m/z and intensity values of each spectrum.
mz(sps)
## NumericList of length 4
## [[1]] 109.2 124.2 124.5 170.16 170.52
## [[2]] 83.1 96.12 97.14 109.14 124.08 125.1 170.16
## [[3]] 44.1 57.9 61.4 71.2 73.8 78.3 78.8 ... 142.9 144.1 157.6 158 175.2 193.2
## [[4]] 111.0815386 249.2587746 273.2587746 ... 367.3006394 383.3319396
intensity(sps)
## NumericList of length 4
## [[1]] 3.4069997949 47.4945730526 3.0943658252 100 13.2396931601
## [[2]] 6.6850282585 4.3812986792 3.02214394 16.7082567782 100 4.5651408768 40.6434125315
## [[3]] 0.051124 0.006597 0.012094 0.001649 ... 0.035732 0.384806 100 0.008796
## [[4]] 0.5509733486 0.7330055955 0.5171464311 ... 2.435447577 52.13575541
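Because m/z and intensity values are returned as parallel lists with one element per spectrum, per-spectrum summaries can be computed by iterating over both lists together. The sketch below determines the m/z of the most intense peak (the base peak) for the first two spectra. Plain R lists stand in for the NumericList objects, with the values copied from the output above; converting the real results with as.list() first is an assumption made here for illustration.

```r
# Peak data of the first two spectra, copied from the output above.
mzs <- list(
    c(109.2, 124.2, 124.5, 170.16, 170.52),
    c(83.1, 96.12, 97.14, 109.14, 124.08, 125.1, 170.16))
ints <- list(
    c(3.4069997949, 47.4945730526, 3.0943658252, 100, 13.2396931601),
    c(6.6850282585, 4.3812986792, 3.02214394, 16.7082567782, 100,
      4.5651408768, 40.6434125315))

# For each spectrum, pick the m/z value of the most intense peak.
base_peak_mz <- mapply(function(m, i) m[which.max(i)], mzs, ints)
base_peak_mz
```

Intensities in these files appear to be scaled so that the largest peak of each spectrum is 100, which is why which.max() is sufficient to locate the base peak.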
It is also possible to import all MS/MS spectra from HMDB into a Spectra object. For this we first have to download all xml files from the downloads page of HMDB. HMDB provides all files in a single archive; unzipping it results in a folder with a very large number of small files, one file per spectrum. Note that this folder also contains NMR and other types of spectra. The variable path below is supposed to point to the folder containing all xml files. We first list all xml files in that folder, using the pattern "ms_ms_spectrum" to get only the file names of MS/MS spectra.
path <- "~/data/hmdb_all_spectra/"
fls <- dir(path, pattern = "ms_ms_spectrum", full.names = TRUE)
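The effect of the file-name filter can be checked on a throwaway folder before pointing it at the full download. The sketch below creates a few empty files in a temporary directory and lists them with the same dir() call. The file names are hypothetical and only mimic the mix of spectrum types described above; the names in the real archive may differ.

```r
# Temporary folder with hypothetical file names (made up for illustration).
demo_path <- tempfile("hmdb_demo")
dir.create(demo_path)
demo_files <- c("HMDB0000001_ms_ms_spectrum_2_experimental.xml",
                "HMDB0000001_nmr_one_d_spectrum_1022.xml",
                "HMDB0000002_ms_ms_spectrum_15_predicted.xml",
                "HMDB0000002_c_ms_spectrum_30.xml")
invisible(file.create(file.path(demo_path, demo_files)))

# `pattern` is a regular expression matched against the file names, so
# only files containing "ms_ms_spectrum" are returned.
demo_fls <- dir(demo_path, pattern = "ms_ms_spectrum", full.names = TRUE)
basename(demo_fls)

# Remove the temporary folder again.
unlink(demo_path, recursive = TRUE)
```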
With that we can now import the data and create a Spectra object representing the collection of all HMDB MS/MS spectra. Setting nonStop = TRUE prevents the call from stopping whenever it encounters a problematic xml file (such as an xml file without peaks). Note that the import of about 400,000 MS/MS spectra can take a long time (in the range of one to several hours).
sps_hmdb <- Spectra(fls, source = MsBackendHmdbXml(), nonStop = TRUE, backend = MsBackendDataFrame())
## R version 4.0.2 Patched (2020-09-10 r79182)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.1 LTS
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-openmp/libopenblasp-r0.3.8.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] MsBackendHmdb_0.2.0 Spectra_0.99.4 ProtGenerics_1.21.0
## [4] BiocParallel_1.23.2 S4Vectors_0.27.12 BiocGenerics_0.35.4
## [7] BiocStyle_2.17.0
##
## loaded via a namespace (and not attached):
## [1] cpp11_0.2.1 xml2_1.3.2 knitr_1.29
## [4] magrittr_1.5 MASS_7.3-53 MsCoreUtils_1.1.5
## [7] IRanges_2.23.10 R6_2.4.1 ragg_0.3.1
## [10] rlang_0.4.7 stringr_1.4.0 tools_4.0.2
## [13] xfun_0.17 htmltools_0.5.0 systemfonts_0.3.1
## [16] yaml_2.2.1 assertthat_0.2.1 rprojroot_1.3-2
## [19] digest_0.6.25 pkgdown_1.6.1.9000 crayon_1.3.4
## [22] bookdown_0.20 BiocManager_1.30.10 fs_1.5.0
## [25] memoise_1.1.0 evaluate_0.14 rmarkdown_2.3
## [28] stringi_1.5.3 compiler_4.0.2 desc_1.2.0
## [31] backports_1.1.9