vignettes/MsBackendWeizMass.Rmd
Package: MsBackendWeizMass
Authors: Johannes Rainer [cre] (https://orcid.org/0000-0002-6977-7147), Nir Shachaf [ctb]
Compiled: Thu Aug 4 12:02:44 2022
The Spectra package provides a central infrastructure for the handling of Mass Spectrometry (MS) data. The package supports interchangeable use of different backends to import MS data from a variety of sources (such as mzML files). The MsBackendWeizMass package allows importing and handling MS/MS spectrum data from WeizMass spectral library databases (Shahaf et al. 2016). This enables integration of the high-quality WeizMass MS/MS spectral library into Spectra-based annotation workflows (Rainer et al. 2022).
The package can be installed with the BiocManager package. To install BiocManager use install.packages("BiocManager") and, after that, BiocManager::install("RforMassSpectrometry/MsBackendWeizMass") to install this package.
To use the MsBackendWeizMass package, access to a WeizMass database (i.e. a WeizMass MySQL database) is required. Connection information needs to be requested from the original authors of the WeizMass database (Shahaf et al. 2016). In this section we use a tiny SQLite-based test database that is included in this package and has the same database layout as the WeizMass v2 database.
Below we load all required libraries, get the file name of the SQLite database and connect to it.
library(Spectra)
library(MsBackendWeizMass)
library(RSQLite)
db <- system.file("sqlite", "weizmassv2.sqlite", package = "MsBackendWeizMass")
con <- dbConnect(SQLite(), db)
A Spectra object representing the data from the WeizMass database can be created with the Spectra function by providing the connection to the database and specifying the backend to be used (MsBackendWeizMass).
sps <- Spectra(con, source = MsBackendWeizMass())
sps
## MSn data (Spectra) with 2 spectra in a MsBackendWeizMass backend:
## msLevel precursorMz polarity
## <integer> <numeric> <integer>
## 1 NA 595.166 1
## 2 NA 593.150 0
## ... 45 more variables/columns.
## Use 'spectraVariables' to list all of them.
The spectraVariables function can be used to get all available spectra variables from the database.
spectraVariables(sps)
## [1] "msLevel" "rtime"
## [3] "acquisitionNum" "scanIndex"
## [5] "dataStorage" "dataOrigin"
## [7] "centroided" "smoothed"
## [9] "polarity" "precScanNum"
## [11] "precursorMz" "precursorIntensity"
## [13] "precursorCharge" "collisionEnergy"
## [15] "isolationWindowLowerMz" "isolationWindowTargetMz"
## [17] "isolationWindowUpperMz" "precursor_mz_text"
## [19] "spectrumId" "compound_id"
## [21] "ION" "adduct"
## [23] "EXTRA_IONS" "EXTRA_MZ"
## [25] "rtime_ci" "UV"
## [27] "CCS" "DATE"
## [29] "formula" "exactmass"
## [31] "SOURCE" "LIBRARY"
## [33] "smiles" "inchikey"
## [35] "CHEMICAL_CLASS" "CURATED_CHEMICAL_CLASS"
## [37] "ORGANISM_TYPE" "CHEM_LOCATION"
## [39] "instrument" "CHROMATOGRAPHY"
## [41] "ISOMER_OF" "MSI"
## [43] "common_name" "iupac_name"
## [45] "relative_intensity" "peak_annotation"
Individual spectra variables can be accessed using a dedicated function (such as rtime, msLevel, etc.), if available, or using the $ operator. The chemical formulas for the compounds of the spectra can for example be retrieved using $formula:
sps$formula
## [1] "C27H30O15" "C27H30O15"
In addition, it is possible to retrieve multiple spectra variables using the spectraData function:
spectraData(sps, c("rtime", "formula", "adduct"))
## DataFrame with 2 rows and 3 columns
## rtime formula adduct
## <numeric> <character> <character>
## 1 7.06 C27H30O15 [M]+
## 2 7.09 C27H30O15 [M-H]-
MS/MS peak data can be retrieved using the peaksData function, which returns one matrix with the peak values per spectrum. Below we extract the MS peaks for the first spectrum.
peaksData(sps)[[1L]]
## mz intensity
## [1,] 325.0707 119
## [2,] 337.0707 60
## [3,] 355.0812 75
## [4,] 379.0812 134
## [5,] 380.0891 35
## [6,] 391.0812 59
## [7,] 403.0812 63
## [8,] 409.0918 130
## [9,] 421.0918 81
## [10,] 427.1024 130
## [11,] 428.1102 39
## [12,] 439.1024 104
## [13,] 457.1129 391
## [14,] 458.1207 104
## [15,] 475.1235 115
## [16,] 476.1313 35
## [17,] 481.1129 122
## [18,] 482.1207 34
## [19,] 499.1235 88
## [20,] 505.1129 35
## [21,] 511.1235 113
## [22,] 523.1235 102
## [23,] 529.1341 78
## [24,] 541.1341 126
## [25,] 542.1419 39
## [26,] 559.1446 169
## [27,] 560.1524 62
## [28,] 577.1552 364
## [29,] 578.1630 126
## [30,] 579.1646 35
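Because each element returned by peaksData is an ordinary numeric matrix, standard base R matrix operations apply to it. As a minimal sketch, the snippet below rebuilds the first five peaks from the output above by hand and then picks the most intense of them and filters by intensity. The object name pks and the intensity cut-off of 70 are chosen for illustration only.

```r
# First five peaks of the first spectrum, copied from the output above.
# `pks` is a hypothetical stand-in for `peaksData(sps)[[1L]]`.
pks <- cbind(
    mz = c(325.0707, 337.0707, 355.0812, 379.0812, 380.0891),
    intensity = c(119, 60, 75, 134, 35)
)

# Row with the highest intensity among these five peaks.
top <- pks[which.max(pks[, "intensity"]), ]
top[["mz"]]

# Keep only peaks with an intensity of at least 70 (arbitrary cut-off).
# `drop = FALSE` ensures the result stays a matrix even for a single row.
strong <- pks[pks[, "intensity"] >= 70, , drop = FALSE]
strong
```

With a real Spectra object the same operations could be applied to each element of the list returned by peaksData, for example through lapply.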
The WeizMass database also provides additional peak information that can be listed using the peaksVariables function:
peaksVariables(sps)
## [1] "mz" "intensity" "relative_intensity"
## [4] "peak_annotation"
We can thus also query the annotations for the individual peaks by additionally requesting the column "peak_annotation" in the peaksData call.
peaksData(sps, columns = c("mz", "intensity", "peak_annotation"))[[1L]]
## mz intensity peak_annotation
## [1,] "325.0707" "119" "C18H12O6"
## [2,] "337.0707" " 60" "C19H12O6"
## [3,] "355.0812" " 75" "C19H14O7"
## [4,] "379.0812" "134" "C21H14O7"
## [5,] "380.0891" " 35" "C21H15O7"
## [6,] "391.0812" " 59" "C22H14O7"
## [7,] "403.0812" " 63" "C23H14O7"
## [8,] "409.0918" "130" "C22H16O8"
## [9,] "421.0918" " 81" "C23H16O8"
## [10,] "427.1024" "130" "C22H18O9"
## [11,] "428.1102" " 39" "C22H19O9"
## [12,] "439.1024" "104" "C23H18O9"
## [13,] "457.1129" "391" "C23H20O10"
## [14,] "458.1207" "104" "C23H21O10"
## [15,] "475.1235" "115" "C23H22O11"
## [16,] "476.1313" " 35" "C23H23O11"
## [17,] "481.1129" "122" "C25H20O10"
## [18,] "482.1207" " 34" "C25H21O10"
## [19,] "499.1235" " 88" "C25H22O11"
## [20,] "505.1129" " 35" "C27H20O10"
## [21,] "511.1235" "113" "C26H22O11"
## [22,] "523.1235" "102" "C27H22O11"
## [23,] "529.1341" " 78" "C26H24O12"
## [24,] "541.1341" "126" "C27H24O12"
## [25,] "542.1419" " 39" "C27H25O12"
## [26,] "559.1446" "169" "C27H26O13"
## [27,] "560.1524" " 62" "C27H27O13"
## [28,] "577.1552" "364" "C27H28O14"
## [29,] "578.1630" "126" "C27H29O14"
## [30,] "579.1646" " 35" ""
Note however that, since peaksData always returns a matrix and a matrix can hold only a single data type, the m/z and intensity values are then also reported as character instead of numeric values. It is therefore advisable to query m/z and intensity values separately from peak annotations.
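This coercion is standard R matrix behaviour rather than something specific to the backend: combining numeric and character columns in one matrix yields a character matrix. The following base R sketch illustrates this with three peaks copied from the output above; the object names mixed and numeric_peaks are illustrative only.

```r
# Three peaks and their annotations, copied from the output above.
mz <- c(325.0707, 337.0707, 355.0812)
intensity <- c(119, 60, 75)
peak_annotation <- c("C18H12O6", "C19H12O6", "C19H14O7")

# Mixing numeric and character columns coerces everything to character.
mixed <- cbind(mz, intensity, peak_annotation)
typeof(mixed)

# Keeping the numeric values in their own matrix preserves their type;
# the annotations stay in a separate character vector.
numeric_peaks <- cbind(mz, intensity)
typeof(numeric_peaks)

# If a character matrix was retrieved anyway, individual columns can be
# converted back explicitly.
mz_restored <- as.numeric(mixed[, "mz"])
mz_restored
```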
Peak annotations can alternatively be extracted directly from a Spectra object using the $ operator.
sps$peak_annotation
## CharacterList of length 2
## [[1]] C18H12O6 C19H12O6 C19H14O7 C21H14O7 ... C27H27O13 C27H28O14 C27H29O14
## [[2]] C17H14O5 C17H12O6 C18H14O6 C18H15O6 ... C25H26O13 C26H26O13 C27H28O14
Note also that precursor m/z values are stored as character values in the database, but are converted to numeric by the backend during data retrieval. For stored values that cannot be converted to a numeric value, NA is reported.
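This matches how base R's as.numeric treats text that does not represent a number. The sketch below only illustrates that general behaviour; it is not the backend's actual code, and the third value is a made-up example of a non-numeric entry.

```r
# Two precursor m/z values in the form shown in the output above, plus one
# hypothetical entry that does not represent a number.
precursor_mz_text <- c("595.166", "593.150", "not available")

# as.numeric() returns NA (and emits a warning) for text it cannot parse;
# the warning is suppressed here because the NA is expected.
precursor_mz <- suppressWarnings(as.numeric(precursor_mz_text))
precursor_mz

# Identify which stored values could not be converted.
failed <- precursor_mz_text[is.na(precursor_mz)]
failed
```

Comparing the character variable precursor_mz_text (listed among the spectra variables above) with the numeric precursorMz in the same way can help to spot database entries that ended up as NA.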
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RSQLite_2.2.15 MsBackendWeizMass_0.1.1 Spectra_1.7.1
## [4] ProtGenerics_1.27.2 BiocParallel_1.30.3 S4Vectors_0.34.0
## [7] BiocGenerics_0.42.0 BiocStyle_2.24.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.9 bslib_0.4.0 compiler_4.2.0
## [4] BiocManager_1.30.18 jquerylib_0.1.4 tools_4.2.0
## [7] bit_4.0.4 digest_0.6.29 clue_0.3-61
## [10] jsonlite_1.8.0 evaluate_0.15 memoise_2.0.1
## [13] pkgconfig_2.0.3 rlang_1.0.4 DBI_1.1.3
## [16] cli_3.3.0 yaml_2.3.5 parallel_4.2.0
## [19] pkgdown_2.0.6.9000 xfun_0.31 fastmap_1.1.0
## [22] cluster_2.1.3 stringr_1.4.0 knitr_1.39
## [25] vctrs_0.4.1 desc_1.4.1 fs_1.5.2
## [28] sass_0.4.2 systemfonts_1.0.4 IRanges_2.30.0
## [31] MsCoreUtils_1.8.0 bit64_4.0.5 rprojroot_2.0.3
## [34] R6_2.5.1 textshaping_0.3.6 rmarkdown_2.14
## [37] bookdown_0.27 blob_1.2.3 purrr_0.3.4
## [40] magrittr_2.0.3 codetools_0.2-18 htmltools_0.5.3
## [43] MASS_7.3-58 ragg_1.2.2 stringi_1.7.8
## [46] cachem_1.0.6