The MsBackendWeizMass provides access to WeizMass mass spectrometry libraries by directly accessing its MySQL/MariaDB database. In addition, the backend supports adding new spectra variables to the object and locally changing spectra variables (without changing the original values in the database).

Note that MsBackendWeizMass requires access to a WeizMass MySQL/MariaDB database.

Also, some of the fields in the WeizMass database are not directly compatible with Spectra, because their values are stored as text rather than as numbers. The precursor m/z values, for example, are stored as character in the database but are converted to numeric when the data is accessed. Thus, NA is reported for spectra with non-numeric values in that field.
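As a minimal base-R illustration of this conversion, here is a sketch that assumes plain `as.numeric()` semantics (the package's actual conversion code may differ). Text that cannot be parsed as a number becomes NA. The input values are hypothetical:

```r
## Hypothetical precursor m/z values stored as text, as in the WeizMass
## database; the last entry cannot be interpreted as a single number.
precursor_mz_text <- c("595.1658", "593.1496", "595.17/593.15")

## Convert to numeric; entries that are not valid numbers become NA.
## suppressWarnings() hides R's "NAs introduced by coercion" warning.
precursor_mz <- suppressWarnings(as.numeric(precursor_mz_text))
precursor_mz
```

In the examples below, the spectra variable `precursor_mz_text` appears to hold the original text values next to the numeric `precursorMz`.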

Usage

MsBackendWeizMass()

# S4 method for MsBackendWeizMass
backendInitialize(object, dbcon, ...)

# S4 method for MsBackendWeizMass
peaksVariables(object)

# S4 method for MsBackendWeizMass
peaksData(object, columns = c("mz", "intensity"))

# S4 method for MsBackendWeizMass
dataStorage(object)

# S4 method for MsBackendWeizMass
intensity(object) <- value

# S4 method for MsBackendWeizMass
mz(object) <- value

# S4 method for MsBackendWeizMass
reset(object)

# S4 method for MsBackendWeizMass
spectraData(object, columns = spectraVariables(object))

# S4 method for MsBackendWeizMass
spectraNames(object)

# S4 method for MsBackendWeizMass
spectraNames(object) <- value

# S4 method for MsBackendWeizMass
tic(object, initial = TRUE)

# S4 method for MsBackendWeizMass
x[i, j, ..., drop = FALSE]

# S4 method for MsBackendWeizMass
x$name <- value

# S4 method for MsBackendWeizMass
precScanNum(object)

Arguments

object

Object extending MsBackendWeizMass.

dbcon

For backendInitialize,MsBackendWeizMass: SQL database connection to the WeizMass database.

...

Additional arguments.

columns

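For spectraData: optional character with the column names (spectra variables) that should be included in the returned DataFrame. By default, all columns are returned. For peaksData: character with the peaks variables to be returned as columns of each peak matrix; defaults to c("mz", "intensity").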

value

Replacement value for <- methods. See the individual method descriptions for the expected data type.

initial

For tic: logical(1) whether the initially reported total ion current should be returned (initial = TRUE), or whether the total ion current should be (re)calculated on the actual data (initial = FALSE).

x

Object extending MsBackendWeizMass.

i

For [: integer, logical or character to subset the object.

j

For [: not supported.

drop

For [: not considered.

name

For $ and $<-: the name of the spectra variable to access or replace. See the individual method descriptions for the expected data type.

spectraVariables

For selectSpectraVariables: character with the names of the spectra variables to which the backend should be subsetted.

Value

See documentation of respective function.

Supported Backend functions

The following functions are supported by the MsBackendWeizMass.

  • [: subset the backend. Only subsetting by element (row/i) is allowed.

  • $, $<-: access or set/add a single spectra variable (column) in the backend.

  • acquisitionNum: returns the acquisition number of each spectrum. Returns an integer of length equal to the number of spectra (with NA_integer_ if not available).

  • backendInitialize: initialises the backend by retrieving the IDs of all spectra in the database. Parameter dbcon with the connection to the WeizMass MySQL database is required.

  • dataOrigin: gets a character of length equal to the number of spectra in object with the data origin of each spectrum. This could e.g. be the mzML file from which the data was read.

  • dataStorage: returns "<WeizMass>" for all spectra.

  • centroided, centroided<-: gets or sets the centroiding information of the spectra. centroided returns a logical vector of length equal to the number of spectra with TRUE if a spectrum is centroided, FALSE if it is in profile mode and NA if it is undefined. See also isCentroided for estimating from the spectrum data whether the spectrum is centroided. value for centroided<- is either a single logical or a logical of length equal to the number of spectra in object.

  • collisionEnergy, collisionEnergy<-: gets or sets the collision energy for all spectra in object. collisionEnergy returns a numeric with length equal to the number of spectra (NA_real_ if not present/defined), collisionEnergy<- takes a numeric of length equal to the number of spectra in object. Note that the collision energy descriptions from WeizMass are provided as spectra variable "collisionEnergyText".

  • intensity: gets the intensity values from the spectra. Returns a NumericList() of numeric vectors (intensity values for each spectrum). The length of the list is equal to the number of spectra in object.

  • ionCount: returns a numeric with the sum of intensities for each spectrum. If the spectrum is empty (see isEmpty), NA_real_ is returned.

  • isCentroided: a heuristic approach assessing whether the spectra in object are in profile or centroided mode. The function takes the peaks with intensities above the qtl-th quantile, calculates the differences between their adjacent m/z values and returns TRUE if the first quartile of these differences is greater than k. (See Spectra:::.isCentroided for the code.)

  • isEmpty: checks whether a spectrum in object is empty (i.e. does not contain any peaks). Returns a logical vector of length equal number of spectra.

  • isolationWindowLowerMz, isolationWindowLowerMz<-: gets or sets the lower m/z boundary of the isolation window.

  • isolationWindowTargetMz, isolationWindowTargetMz<-: gets or sets the target m/z of the isolation window.

  • isolationWindowUpperMz, isolationWindowUpperMz<-: gets or sets the upper m/z boundary of the isolation window.

  • isReadOnly: returns a logical(1) indicating whether the backend is read-only or also allows data to be written/updated.

  • length: returns the number of spectra in the object.

  • lengths: gets the number of peaks (m/z-intensity values) per spectrum. Returns an integer vector (length equal to the number of spectra). For empty spectra, 0 is returned.

  • msLevel: gets the spectra's MS level. Returns an integer vector (of length equal to the number of spectra) with the MS level for each spectrum (or NA_integer_ if not available).

  • mz: gets the mass-to-charge ratios (m/z) from the spectra. Returns a NumericList() of length equal to the number of spectra, each element a numeric vector with the m/z values of one spectrum.

  • peaksData: returns a list with the spectra's peak data. The length of the list is equal to the number of spectra in object. Each element of the list is a matrix with columns defined by parameter columns, which defaults to columns = c("mz", "intensity"); any of peaksVariables(object) is supported. Note that if columns contains "peak_annotation", the whole matrix is of type character (i.e. even the m/z and intensity values are provided as text). See the examples below for details. For an empty spectrum, a matrix with 0 rows is returned.

  • peaksVariables: returns a character with the available peaks variables (i.e. data available for each individual mass peak). These can be used in peaksData to retrieve the specified values.

  • polarity, polarity<-: gets or sets the polarity for each spectrum. polarity returns an integer vector (length equal to the number of spectra), with 0 and 1 representing negative and positive polarities, respectively. polarity<- expects an integer vector of length 1 or equal to the number of spectra.

  • precursorCharge, precursorIntensity, precursorMz, precScanNum, precAcquisitionNum: get the charge (integer), intensity (numeric), m/z (numeric), scan index (integer) and acquisition number (integer) of the precursor for MS level 2 and above spectra from the object. Returns a vector of length equal to the number of spectra in object. NA is reported for MS1 spectra or if no precursor information is available.

  • reset: restores the backend to its original state, i.e. deletes all locally modified data and reinitializes the backend to the full data available in the database.

  • rtime, rtime<-: gets or sets the retention times for each spectrum (in seconds). rtime returns a numeric vector (length equal to the number of spectra) with the retention time for each spectrum. rtime<- expects a numeric vector with length equal to the number of spectra.

  • scanIndex: returns an integer vector with the scan index for each spectrum. This represents the relative index of the spectrum within each file. Note that this can be different from the acquisitionNum of the spectrum, which is the index of the spectrum as reported in the mzML file.

  • selectSpectraVariables: reduces the information within the backend to the selected spectra variables.

  • smoothed, smoothed<-: gets or sets whether a spectrum is smoothed. smoothed returns a logical vector of length equal to the number of spectra. smoothed<- takes a logical vector of length 1 or equal to the number of spectra in object.

  • spectraData: gets general spectrum metadata (annotation, also called header). spectraData returns a DataFrame. Note that replacing the spectra data with spectraData<- is not supported.

  • spectraNames: returns a character vector with the names of the spectra in object.

  • spectraVariables: returns a character vector with the spectra variables (columns, fields or attributes) available in object. This should return all spectra variables present in object, including "mz" and "intensity" (which are by default not returned by the spectraVariables,Spectra method).

  • tic: gets the total ion current/count (sum of signal of a spectrum) for all spectra in object. By default, the value reported in the original raw data file is returned. For an empty spectrum, NA_real_ is returned.
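The isCentroided heuristic described above can be sketched in base R as follows. This is a simplified re-implementation for illustration only, not the code of Spectra:::.isCentroided; the function name `is_centroided_sketch`, the defaults `k = 0.025` and `qtl = 0.9`, and returning NA when fewer than two peaks remain are assumptions.

```r
## Hypothetical sketch of the centroiding heuristic for one spectrum,
## given as a two-column matrix with columns "mz" and "intensity".
## The defaults for k and qtl are assumptions for this illustration.
is_centroided_sketch <- function(peaks, k = 0.025, qtl = 0.9) {
    ## Keep the m/z values of peaks above the qtl-th intensity quantile.
    thr <- quantile(peaks[, "intensity"], qtl, names = FALSE)
    mz_top <- sort(peaks[peaks[, "intensity"] > thr, "mz"])
    ## Too few peaks left to estimate the spacing between them.
    if (length(mz_top) < 2L)
        return(NA)
    ## Centroided if the first quartile of adjacent m/z differences exceeds k.
    quantile(diff(mz_top), 0.25, names = FALSE) > k
}

## Widely spaced peaks (10 m/z apart) versus densely sampled profile
## data (0.001 m/z apart).
centroided_pk <- cbind(mz = 100 + (0:49) * 10, intensity = 1:50)
profile_pk <- cbind(mz = 100 + (0:49) * 0.001, intensity = 1:50)
is_centroided_sketch(centroided_pk)  # TRUE
is_centroided_sketch(profile_pk)     # FALSE
```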
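peaksData returns a character matrix when "peak_annotation" is requested because an R matrix holds values of a single type. A minimal base-R sketch, using hypothetical values taken loosely from the first spectrum in the examples, shows the coercion and how the numeric columns can be converted back:

```r
## Hypothetical peak data of one spectrum: numeric m/z and intensity
## values plus a character peak annotation.
pk <- cbind(mz = c(325.0707, 337.0707, 579.1646),
            intensity = c(119, 60, 35),
            peak_annotation = c("C18H12O6", "C19H12O6", ""))

## Combining numeric and character columns coerces the whole matrix
## to character.
typeof(pk)  # "character"

## Convert the m/z and intensity columns back to numeric when needed.
mz_num <- as.numeric(pk[, "mz"])
intensity_num <- as.numeric(pk[, "intensity"])
```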
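For ionCount and tic(object, initial = FALSE), the documented result is the sum of the intensities of each spectrum, with NA_real_ for empty spectra. The calculation can be sketched in base R on a hypothetical list of intensity vectors; the backend's actual implementation may differ.

```r
## Hypothetical intensity values of three spectra; the third one is empty.
ints <- list(c(119, 60, 75), c(26, 22), numeric(0))

## Sum of intensities per spectrum, NA_real_ for empty spectra.
tic_recalc <- vapply(ints, function(x) if (length(x)) sum(x) else NA_real_,
                     numeric(1))
tic_recalc  # 254 48 NA
```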

Not supported Backend functions

The following functions are not supported by the MsBackendWeizMass since the original data cannot be changed.

backendMerge, export, filterDataStorage, filterPrecursorScan, peaksData<-, filterAcquisitionNum, intensity<-, mz<-, precScanNum, spectraData<-, spectraNames<-.

Retrieving compound annotations for spectra

Compound annotations are provided as spectra variables of the backend. Alternatively, the compounds function can be called on a Spectra object that uses a MsBackendWeizMass backend to retrieve the compound annotations for its spectra.

References

Shahaf N, Rogachev I, Heinig U, Meir S, Malitsky S, Battat M, et al. (2016). The WEIZMASS spectral library for high-confidence metabolite identification. Nature Communications 7:12423. doi:10.1038/ncomms12423.

Author

Johannes Rainer

Examples


## Create a connection to a database with WeizMass data - in the present
## example we connect to a tiny SQLite database bundled in this package.
library(RSQLite)
con <- dbConnect(SQLite(), system.file("sqlite", "weizmassv2.sqlite",
    package = "MsBackendWeizMass"))

## Given that we have the connection to a WeizMass database we can
## initialize the backend:
be <- backendInitialize(MsBackendWeizMass(), dbcon = con)
be
#> MsBackendWeizMass with 2 spectra
#>     msLevel precursorMz  polarity
#>   <integer>   <numeric> <integer>
#> 1        NA     595.166         1
#> 2        NA     593.150         0
#>  ... 45 more variables/columns.
#>  Use  'spectraVariables' to list all of them.

## List available peak variables
peaksVariables(be)
#> [1] "mz"                 "intensity"          "relative_intensity"
#> [4] "peak_annotation"   

## Get peaks data; by default only m/z and intensity values are returned
peaksData(be)
#> [[1]]
#>             mz intensity
#>  [1,] 325.0707       119
#>  [2,] 337.0707        60
#>  [3,] 355.0812        75
#>  [4,] 379.0812       134
#>  [5,] 380.0891        35
#>  [6,] 391.0812        59
#>  [7,] 403.0812        63
#>  [8,] 409.0918       130
#>  [9,] 421.0918        81
#> [10,] 427.1024       130
#> [11,] 428.1102        39
#> [12,] 439.1024       104
#> [13,] 457.1129       391
#> [14,] 458.1207       104
#> [15,] 475.1235       115
#> [16,] 476.1313        35
#> [17,] 481.1129       122
#> [18,] 482.1207        34
#> [19,] 499.1235        88
#> [20,] 505.1129        35
#> [21,] 511.1235       113
#> [22,] 523.1235       102
#> [23,] 529.1341        78
#> [24,] 541.1341       126
#> [25,] 542.1419        39
#> [26,] 559.1446       169
#> [27,] 560.1524        62
#> [28,] 577.1552       364
#> [29,] 578.1630       126
#> [30,] 579.1646        35
#> 
#> [[2]]
#>             mz intensity
#>  [1,] 297.0768        26
#>  [2,] 311.0561        22
#>  [3,] 325.0718        46
#>  [4,] 326.0796        12
#>  [5,] 335.0561        12
#>  [6,] 353.0667       660
#>  [7,] 354.0745       179
#>  [8,] 355.0741        57
#>  [9,] 365.0667        38
#> [10,] 383.0772       432
#> [11,] 384.0851       104
#> [12,] 397.0929        13
#> [13,] 413.0878        55
#> [14,] 414.0956        14
#> [15,] 425.0878        33
#> [16,] 426.0887        13
#> [17,] 437.0878        15
#> [18,] 455.0984        53
#> [19,] 456.1062        19
#> [20,] 473.1089       715
#> [21,] 474.1168       182
#> [22,] 475.1116        58
#> [23,] 485.1089        28
#> [24,] 503.1195       224
#> [25,] 504.1245        54
#> [26,] 515.1195        26
#> [27,] 533.1301        21
#> [28,] 545.1301        13
#> [29,] 575.1406        61
#> [30,] 576.1395        21
#> 

## Get peaks data including peak annotations; note that now for each
## spectrum a character matrix is returned!
res <- peaksData(be, columns = c("mz", "intensity", "peak_annotation"))
res[[1L]]
#>       mz         intensity peak_annotation
#>  [1,] "325.0707" "119"     "C18H12O6"     
#>  [2,] "337.0707" " 60"     "C19H12O6"     
#>  [3,] "355.0812" " 75"     "C19H14O7"     
#>  [4,] "379.0812" "134"     "C21H14O7"     
#>  [5,] "380.0891" " 35"     "C21H15O7"     
#>  [6,] "391.0812" " 59"     "C22H14O7"     
#>  [7,] "403.0812" " 63"     "C23H14O7"     
#>  [8,] "409.0918" "130"     "C22H16O8"     
#>  [9,] "421.0918" " 81"     "C23H16O8"     
#> [10,] "427.1024" "130"     "C22H18O9"     
#> [11,] "428.1102" " 39"     "C22H19O9"     
#> [12,] "439.1024" "104"     "C23H18O9"     
#> [13,] "457.1129" "391"     "C23H20O10"    
#> [14,] "458.1207" "104"     "C23H21O10"    
#> [15,] "475.1235" "115"     "C23H22O11"    
#> [16,] "476.1313" " 35"     "C23H23O11"    
#> [17,] "481.1129" "122"     "C25H20O10"    
#> [18,] "482.1207" " 34"     "C25H21O10"    
#> [19,] "499.1235" " 88"     "C25H22O11"    
#> [20,] "505.1129" " 35"     "C27H20O10"    
#> [21,] "511.1235" "113"     "C26H22O11"    
#> [22,] "523.1235" "102"     "C27H22O11"    
#> [23,] "529.1341" " 78"     "C26H24O12"    
#> [24,] "541.1341" "126"     "C27H24O12"    
#> [25,] "542.1419" " 39"     "C27H25O12"    
#> [26,] "559.1446" "169"     "C27H26O13"    
#> [27,] "560.1524" " 62"     "C27H27O13"    
#> [28,] "577.1552" "364"     "C27H28O14"    
#> [29,] "578.1630" "126"     "C27H29O14"    
#> [30,] "579.1646" " 35"     ""             

## Get the m/z values for all spectra
mz(be)
#> NumericList of length 2
#> [[1]] 325.070665 337.070665 355.081229 ... 577.155182 578.163007 579.16462
#> [[2]] 297.076847 311.056112 325.071762 ... 545.130064 575.140629 576.13949

## Annotations for the individual peaks can be retrieved with
be$peak_annotation
#> CharacterList of length 2
#> [[1]] C18H12O6 C19H12O6 C19H14O7 C21H14O7 ... C27H27O13 C27H28O14 C27H29O14 
#> [[2]] C17H14O5 C17H12O6 C18H14O6 C18H15O6 ... C25H26O13 C26H26O13 C27H28O14 

## List available spectra variables
spectraVariables(be)
#>  [1] "msLevel"                 "rtime"                  
#>  [3] "acquisitionNum"          "scanIndex"              
#>  [5] "mz"                      "intensity"              
#>  [7] "dataStorage"             "dataOrigin"             
#>  [9] "centroided"              "smoothed"               
#> [11] "polarity"                "precScanNum"            
#> [13] "precursorMz"             "precursorIntensity"     
#> [15] "precursorCharge"         "collisionEnergy"        
#> [17] "isolationWindowLowerMz"  "isolationWindowTargetMz"
#> [19] "isolationWindowUpperMz"  "precursor_mz_text"      
#> [21] "spectrumId"              "compound_id"            
#> [23] "ION"                     "adduct"                 
#> [25] "EXTRA_IONS"              "EXTRA_MZ"               
#> [27] "rtime_ci"                "UV"                     
#> [29] "CCS"                     "DATE"                   
#> [31] "formula"                 "exactmass"              
#> [33] "SOURCE"                  "LIBRARY"                
#> [35] "smiles"                  "inchikey"               
#> [37] "CHEMICAL_CLASS"          "CURATED_CHEMICAL_CLASS" 
#> [39] "ORGANISM_TYPE"           "CHEM_LOCATION"          
#> [41] "instrument"              "CHROMATOGRAPHY"         
#> [43] "ISOMER_OF"               "MSI"                    
#> [45] "common_name"             "iupac_name"             
#> [47] "relative_intensity"      "peak_annotation"        

## Access MS level
msLevel(be)
#> [1] NA NA
be$msLevel
#> [1] NA NA

## Access m/z values
be$mz
#> NumericList of length 2
#> [[1]] 325.070665 337.070665 355.081229 ... 577.155182 578.163007 579.16462
#> [[2]] 297.076847 311.056112 325.071762 ... 545.130064 575.140629 576.13949

## Access the full spectra data (including m/z and intensity values)
spectraData(be)
#> DataFrame with 2 rows and 48 columns
#>     msLevel     rtime acquisitionNum scanIndex                          mz
#>   <integer> <numeric>      <integer> <integer>               <NumericList>
#> 1        NA      7.06             NA        NA 325.071,337.071,355.081,...
#> 2        NA      7.09             NA        NA 297.077,311.056,325.072,...
#>         intensity dataStorage   dataOrigin centroided  smoothed  polarity
#>     <NumericList> <character>  <character>  <logical> <logical> <integer>
#> 1 119, 60, 75,...  <WeizMass> IL_270512_49         NA        NA         1
#> 2    26,22,46,...  <WeizMass> IL_060312_49         NA        NA         0
#>   precScanNum precursorMz precursorIntensity precursorCharge collisionEnergy
#>     <integer>   <numeric>          <numeric>       <integer>       <numeric>
#> 1          NA     595.166                 NA              NA              NA
#> 2          NA     593.150                 NA              NA              NA
#>   isolationWindowLowerMz isolationWindowTargetMz isolationWindowUpperMz
#>                <numeric>               <numeric>              <numeric>
#> 1                     NA                      NA                     NA
#> 2                     NA                      NA                     NA
#>   precursor_mz_text spectrumId compound_id         ION      adduct  EXTRA_IONS
#>         <character>  <integer> <character> <character> <character> <character>
#> 1          595.1658      32290   NP-000002         MSE        [M]+          NA
#> 2          593.1496      32291   NP-000002         MSE      [M-H]-          NA
#>      EXTRA_MZ    rtime_ci          UV         CCS       DATE     formula
#>   <character> <character> <character> <character>  <numeric> <character>
#> 1          NA   7.01;7.12          NA          NA 1643760000   C27H30O15
#> 2          NA   7.05;7.16          NA          NA 1643760000   C27H30O15
#>   exactmass      SOURCE     LIBRARY                 smiles
#>   <numeric> <character> <character>            <character>
#> 1   594.158       AALAB    WEIZMASS c1(c(c(c2c(c1C1OC(C(..
#> 2   594.158       AALAB    WEIZMASS c1(c(c(c2c(c1C1OC(C(..
#>                 inchikey         CHEMICAL_CLASS CURATED_CHEMICAL_CLASS
#>              <character>            <character>            <character>
#> 1 FIAAVMJLAGNUKW-UHFFF.. Phenylpropanoids and..                     NA
#> 2 FIAAVMJLAGNUKW-UHFFF.. Phenylpropanoids and..                     NA
#>   ORGANISM_TYPE   CHEM_LOCATION  instrument CHROMATOGRAPHY   ISOMER_OF
#>     <character>     <character> <character>    <character> <character>
#> 1         Plant AD111784-26_F11 Synapt_TOF1   RP_C18_40min          NA
#> 2         Plant AD111784-26_F11 Synapt_TOF1   RP_C18_40min          NA
#>         MSI             common_name                iupac_name
#>   <integer>         <CharacterList>           <CharacterList>
#> 1         1 ,5,7,4-Trihydroxyflav.. NA,5,7-dihydroxy-2-(4-h..
#> 2         1 ,5,7,4-Trihydroxyflav.. NA,5,7-dihydroxy-2-(4-h..
#>   relative_intensity                peak_annotation
#>        <NumericList>                <CharacterList>
#> 1       NA,NA,NA,... C18H12O6,C19H12O6,C19H14O7,...
#> 2       NA,NA,NA,... C17H14O5,C17H12O6,C18H14O6,...

## Add a new spectra variable
be$new_variable <- "b"
be$new_variable
#> [1] "b" "b"

## Subset the backend
be_sub <- be[c(2, 1)]

spectraNames(be)
#> [1] "32290" "32291"
spectraNames(be_sub)
#> [1] "32291" "32290"