Description and usage of MsBackendWeizMass

Package: MsBackendWeizMass
Authors: Johannes Rainer [cre] (https://orcid.org/0000-0002-6977-7147), Nir Shachaf [ctb]
Compiled: Thu Aug 4 12:02:44 2022

Introduction

The Spectra package provides a central infrastructure for the handling of Mass Spectrometry (MS) data. The package supports interchangeable use of different backends to import MS data from a variety of sources (such as mzML files). The MsBackendWeizMass package allows importing and handling MS/MS spectrum data from WeizMass spectral library databases (Shahaf et al. 2016). This enables integration of the high-quality WeizMass MS/MS spectral library into Spectra-based annotation workflows (Rainer et al. 2022).

Installation

The package can be installed with the BiocManager package. To install BiocManager, use install.packages("BiocManager"); after that, run BiocManager::install("RforMassSpectrometry/MsBackendWeizMass") to install this package.

Accessing MS/MS data from a WeizMass database

To use the MsBackendWeizMass package, access to a WeizMass database (i.e. a WeizMass MySQL database) is required. Connection information needs to be requested from the original authors of the WeizMass database (Shahaf et al. 2016). In this section we use a tiny SQLite-based test database that is included in this package and has the same database layout as the WeizMass v2 database.

Below we load all required libraries and get the file name of the SQLite database.

library(Spectra)
library(MsBackendWeizMass)
library(RSQLite)
db <- system.file("sqlite", "weizmassv2.sqlite", package = "MsBackendWeizMass")
con <- dbConnect(SQLite(), db)

A Spectra object representing the data from the WeizMass database can be created with the Spectra function providing the connection to the database as well as specifying the backend to be used (MsBackendWeizMass).

sps <- Spectra(con, source = MsBackendWeizMass())
sps

## MSn data (Spectra) with 2 spectra in a MsBackendWeizMass backend:
##     msLevel precursorMz  polarity
##   <integer>   <numeric> <integer>
## 1        NA     595.166         1
## 2        NA     593.150         0
##  ... 45 more variables/columns.
##  Use  'spectraVariables' to list all of them.

The spectraVariables function can be used to get all available spectra variables from the database.

spectraVariables(sps)

##  [1] "msLevel"                 "rtime"                  
##  [3] "acquisitionNum"          "scanIndex"              
##  [5] "dataStorage"             "dataOrigin"             
##  [7] "centroided"              "smoothed"               
##  [9] "polarity"                "precScanNum"            
## [11] "precursorMz"             "precursorIntensity"     
## [13] "precursorCharge"         "collisionEnergy"        
## [15] "isolationWindowLowerMz"  "isolationWindowTargetMz"
## [17] "isolationWindowUpperMz"  "precursor_mz_text"      
## [19] "spectrumId"              "compound_id"            
## [21] "ION"                     "adduct"                 
## [23] "EXTRA_IONS"              "EXTRA_MZ"               
## [25] "rtime_ci"                "UV"                     
## [27] "CCS"                     "DATE"                   
## [29] "formula"                 "exactmass"              
## [31] "SOURCE"                  "LIBRARY"                
## [33] "smiles"                  "inchikey"               
## [35] "CHEMICAL_CLASS"          "CURATED_CHEMICAL_CLASS" 
## [37] "ORGANISM_TYPE"           "CHEM_LOCATION"          
## [39] "instrument"              "CHROMATOGRAPHY"         
## [41] "ISOMER_OF"               "MSI"                    
## [43] "common_name"             "iupac_name"             
## [45] "relative_intensity"      "peak_annotation"

Individual spectra variables can be accessed using a dedicated function (such as rtime or msLevel), if available, or using the $ operator. The chemical formulas of the compounds of the spectra can, for example, be retrieved using $formula:

sps$formula

## [1] "C27H30O15" "C27H30O15"

In addition it is possible to retrieve multiple spectra variables using the spectraData function:

spectraData(sps, c("rtime", "formula", "adduct"))

## DataFrame with 2 rows and 3 columns
##       rtime     formula      adduct
##   <numeric> <character> <character>
## 1      7.06   C27H30O15        [M]+
## 2      7.09   C27H30O15      [M-H]-

MS/MS peak data can be retrieved using the peaksData function, which returns a list with one matrix of peak values per spectrum. Below we extract the MS peaks of the first spectrum.

peaksData(sps)[[1L]]

##             mz intensity
##  [1,] 325.0707       119
##  [2,] 337.0707        60
##  [3,] 355.0812        75
##  [4,] 379.0812       134
##  [5,] 380.0891        35
##  [6,] 391.0812        59
##  [7,] 403.0812        63
##  [8,] 409.0918       130
##  [9,] 421.0918        81
## [10,] 427.1024       130
## [11,] 428.1102        39
## [12,] 439.1024       104
## [13,] 457.1129       391
## [14,] 458.1207       104
## [15,] 475.1235       115
## [16,] 476.1313        35
## [17,] 481.1129       122
## [18,] 482.1207        34
## [19,] 499.1235        88
## [20,] 505.1129        35
## [21,] 511.1235       113
## [22,] 523.1235       102
## [23,] 529.1341        78
## [24,] 541.1341       126
## [25,] 542.1419        39
## [26,] 559.1446       169
## [27,] 560.1524        62
## [28,] 577.1552       364
## [29,] 578.1630       126
## [30,] 579.1646        35
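Because the peaks of a spectrum are an ordinary matrix, standard matrix operations apply to them. The following is a minimal sketch that is not taken from the package: it builds a small stand-in matrix from three of the peaks printed above (the object names pks, base_peak and keep are hypothetical) so that it runs without a database connection, then looks up the most intense peak and the peaks reaching at least half of its intensity.

```r
## Stand-in for peaksData(sps)[[1L]]: three of the peaks listed above,
## re-typed by hand so that the example runs without a database.
pks <- cbind(mz = c(325.0707, 457.1129, 577.1552),
             intensity = c(119, 391, 364))

## Row with the highest intensity (the base peak of this subset).
base_peak <- pks[which.max(pks[, "intensity"]), ]
base_peak

## Peaks with an intensity of at least 50% of the base peak.
keep <- pks[, "intensity"] >= 0.5 * base_peak[["intensity"]]
pks[keep, , drop = FALSE]
```

With a real Spectra object, the stand-in matrix would simply be replaced by one element of the peaksData result.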

The WeizMass database also provides additional peak information that can be listed using the peaksVariables function:

peaksVariables(sps)

## [1] "mz"                 "intensity"          "relative_intensity"
## [4] "peak_annotation"

We can thus also query the annotations of the individual peaks by additionally requesting the column "peak_annotation" in the peaksData call.

peaksData(sps, c("mz", "intensity", "peak_annotation"))[[1L]]

##       mz         intensity peak_annotation
##  [1,] "325.0707" "119"     "C18H12O6"     
##  [2,] "337.0707" " 60"     "C19H12O6"     
##  [3,] "355.0812" " 75"     "C19H14O7"     
##  [4,] "379.0812" "134"     "C21H14O7"     
##  [5,] "380.0891" " 35"     "C21H15O7"     
##  [6,] "391.0812" " 59"     "C22H14O7"     
##  [7,] "403.0812" " 63"     "C23H14O7"     
##  [8,] "409.0918" "130"     "C22H16O8"     
##  [9,] "421.0918" " 81"     "C23H16O8"     
## [10,] "427.1024" "130"     "C22H18O9"     
## [11,] "428.1102" " 39"     "C22H19O9"     
## [12,] "439.1024" "104"     "C23H18O9"     
## [13,] "457.1129" "391"     "C23H20O10"    
## [14,] "458.1207" "104"     "C23H21O10"    
## [15,] "475.1235" "115"     "C23H22O11"    
## [16,] "476.1313" " 35"     "C23H23O11"    
## [17,] "481.1129" "122"     "C25H20O10"    
## [18,] "482.1207" " 34"     "C25H21O10"    
## [19,] "499.1235" " 88"     "C25H22O11"    
## [20,] "505.1129" " 35"     "C27H20O10"    
## [21,] "511.1235" "113"     "C26H22O11"    
## [22,] "523.1235" "102"     "C27H22O11"    
## [23,] "529.1341" " 78"     "C26H24O12"    
## [24,] "541.1341" "126"     "C27H24O12"    
## [25,] "542.1419" " 39"     "C27H25O12"    
## [26,] "559.1446" "169"     "C27H26O13"    
## [27,] "560.1524" " 62"     "C27H27O13"    
## [28,] "577.1552" "364"     "C27H28O14"    
## [29,] "578.1630" "126"     "C27H29O14"    
## [30,] "579.1646" " 35"     ""

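Note however that peaksData returns the peaks of each spectrum as a matrix, and a matrix can hold only a single data type. When a character column such as "peak_annotation" is requested, the m/z and intensity values are therefore also reported as character instead of numeric values. It is thus advisable to query m/z and intensity values separately from peak annotations.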
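The coercion itself is general R behaviour and can be reproduced without the package. The sketch below uses plain base R with a few hand-typed values from the output above (the object names mixed, num and df are hypothetical); it illustrates the type issue only and is not how the backend assembles its result.

```r
## A matrix can hold only one data type: binding numeric and character
## columns coerces everything to character.
mz <- c(325.0707, 337.0707)
intensity <- c(119, 60)
peak_annotation <- c("C18H12O6", "C19H12O6")

mixed <- cbind(mz, intensity, peak_annotation)
typeof(mixed)

## Keeping numeric values and annotations apart preserves the types ...
num <- cbind(mz, intensity)
typeof(num)

## ... and a data.frame can combine them again, with one type per column.
df <- data.frame(num, peak_annotation, stringsAsFactors = FALSE)
sapply(df, class)

## Character values from a mixed matrix can also be converted back.
as.numeric(mixed[, "mz"])
```

Holding the annotations in a data.frame column next to the numeric matrix columns is one possible design; querying the two separately, as suggested above, works equally well.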

Peak annotations can alternatively be extracted directly from a Spectra object using the $ operator.

sps$peak_annotation

## CharacterList of length 2
## [[1]] C18H12O6 C19H12O6 C19H14O7 C21H14O7 ... C27H27O13 C27H28O14 C27H29O14 
## [[2]] C17H14O5 C17H12O6 C18H14O6 C18H15O6 ... C25H26O13 C26H26O13 C27H28O14

Note also that precursor m/z values are stored as character values in the database, but are converted to numeric by the backend during data retrieval. For stored values that cannot be converted to a numeric value, NA is reported.
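The effect of such a conversion can be illustrated in base R. The sketch below mimics the described behaviour with as.numeric and is an assumption about the general mechanism, not the backend's actual code; the vector stored and its free-text entry are made up for illustration.

```r
## Hypothetical stored values: two numeric strings and one free-text entry.
stored <- c("595.166", "593.150", "not available")

## as.numeric() yields NA (with a warning) for entries it cannot parse;
## suppressWarnings() silences that warning here.
precursor_mz <- suppressWarnings(as.numeric(stored))
precursor_mz

## Stored entries that could not be converted.
stored[is.na(precursor_mz)]
```

Checking for NA values in this way helps to spot spectra whose precursor m/z was not stored in a numeric form.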

Session information

sessionInfo()

## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] RSQLite_2.2.15          MsBackendWeizMass_0.1.1 Spectra_1.7.1          
## [4] ProtGenerics_1.27.2     BiocParallel_1.30.3     S4Vectors_0.34.0       
## [7] BiocGenerics_0.42.0     BiocStyle_2.24.0       
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9          bslib_0.4.0         compiler_4.2.0     
##  [4] BiocManager_1.30.18 jquerylib_0.1.4     tools_4.2.0        
##  [7] bit_4.0.4           digest_0.6.29       clue_0.3-61        
## [10] jsonlite_1.8.0      evaluate_0.15       memoise_2.0.1      
## [13] pkgconfig_2.0.3     rlang_1.0.4         DBI_1.1.3          
## [16] cli_3.3.0           yaml_2.3.5          parallel_4.2.0     
## [19] pkgdown_2.0.6.9000  xfun_0.31           fastmap_1.1.0      
## [22] cluster_2.1.3       stringr_1.4.0       knitr_1.39         
## [25] vctrs_0.4.1         desc_1.4.1          fs_1.5.2           
## [28] sass_0.4.2          systemfonts_1.0.4   IRanges_2.30.0     
## [31] MsCoreUtils_1.8.0   bit64_4.0.5         rprojroot_2.0.3    
## [34] R6_2.5.1            textshaping_0.3.6   rmarkdown_2.14     
## [37] bookdown_0.27       blob_1.2.3          purrr_0.3.4        
## [40] magrittr_2.0.3      codetools_0.2-18    htmltools_0.5.3    
## [43] MASS_7.3-58         ragg_1.2.2          stringi_1.7.8      
## [46] cachem_1.0.6

Rainer, Johannes, Andrea Vicini, Liesa Salzer, Jan Stanstrup, Josep M. Badia, Steffen Neumann, Michael A. Stravs, et al. 2022. “A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R.” Metabolites 12 (2): 173. https://doi.org/10.3390/metabo12020173.

Shahaf, Nir, Ilana Rogachev, Uwe Heinig, Sagit Meir, Sergey Malitsky, Maor Battat, Hilary Wyner, Shuning Zheng, Ron Wehrens, and Asaph Aharoni. 2016. “The WEIZMASS Spectral Library for High-Confidence Metabolite Identification.” Nature Communications 7 (1): 12423. https://doi.org/10.1038/ncomms12423.