R/MsBackendWeizMass-functions.R
, R/MsBackendWeizMass.R
MsBackendWeizMass.Rd
The MsBackendWeizMass
provides access to WeizMass mass spectrometry
libraries by directly accessing their MySQL/MariaDB database. In addition, the
backend supports adding new spectra variables to the object and changing
spectra variables locally (without changing the original values in the
database).
Note that MsBackendWeizMass
requires access to a WeizMass MySQL/MariaDB
database.
Also, some of the fields in the WeizMass database are not directly compatible
with Spectra
, as the data is stored as text instead of numeric. The
precursor m/z values are for example stored as character in the database, but
are converted to numeric during the data access. Thus, for spectra with
non-numeric values stored in that field an NA
is reported.
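The conversion described above can be illustrated with a short base R sketch. It is not the package's own code: the first two values are taken from the example output further down this page, and the third, non-numeric value is made up for illustration.

```r
## Precursor m/z values as they might be stored in the database (as text).
## The third entry is a made-up, non-numeric value.
precursor_mz_text <- c("595.1658", "593.1496", "unknown")

## as.numeric() returns NA (with a warning) for text that is not a number;
## the warning is suppressed here because NA is the expected result.
precursor_mz <- suppressWarnings(as.numeric(precursor_mz_text))
precursor_mz
is.na(precursor_mz)
```

Spectra with such values can afterwards be identified with is.na() on the numeric result.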
MsBackendWeizMass()
# S4 method for MsBackendWeizMass
backendInitialize(object, dbcon, ...)
# S4 method for MsBackendWeizMass
peaksVariables(object)
# S4 method for MsBackendWeizMass
peaksData(object, columns = c("mz", "intensity"))
# S4 method for MsBackendWeizMass
dataStorage(object)
# S4 method for MsBackendWeizMass
intensity(object) <- value
# S4 method for MsBackendWeizMass
mz(object) <- value
# S4 method for MsBackendWeizMass
reset(object)
# S4 method for MsBackendWeizMass
spectraData(object, columns = spectraVariables(object))
# S4 method for MsBackendWeizMass
spectraNames(object)
# S4 method for MsBackendWeizMass
spectraNames(object) <- value
# S4 method for MsBackendWeizMass
tic(object, initial = TRUE)
# S4 method for MsBackendWeizMass
x[i, j, ..., drop = FALSE]
# S4 method for MsBackendWeizMass
x$name <- value
# S4 method for MsBackendWeizMass
precScanNum(object)
object
: Object extending MsBackendWeizMass.
dbcon
: For backendInitialize,MsBackendWeizMass: SQL database connection to the WeizMass database.
...
: Additional arguments.
columns
: For the spectraData accessor: optional character with the column names (spectra variables) that should be included in the returned DataFrame. By default, all columns are returned.
value
: Replacement value for <- methods. See the individual method description for the expected data type.
initial
: For tic: logical(1) whether the initially reported total ion current should be reported, or whether the total ion current should be (re)calculated on the actual data (initial = FALSE).
x
: Object extending MsBackendWeizMass.
i
: For [: integer, logical or character to subset the object.
j
: For [: not supported.
drop
: For [: not considered.
name
: Name of the variable to replace for <- methods. See the individual method description for the expected data type.
spectraVariables
: For selectSpectraVariables: character with the names of the spectra variables to which the backend should be subsetted.
See documentation of respective function.
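To make the difference between the reported and the (re)calculated total ion current more concrete, the following base R sketch recalculates it from peak data. It only mirrors the behaviour described on this page (sum of the intensities, NA_real_ for an empty spectrum) and is not the package's implementation; the peak matrices are made up for illustration.

```r
## Two made-up spectra as m/z-intensity matrices; the second one is empty.
peaks <- list(
    cbind(mz = c(325.0707, 337.0707, 355.0812), intensity = c(119, 60, 75)),
    cbind(mz = numeric(), intensity = numeric())
)

## Sum the intensities of each spectrum; report NA_real_ for empty spectra.
tic_recalculated <- vapply(peaks, function(p) {
    if (nrow(p) == 0L) NA_real_ else sum(p[, "intensity"])
}, numeric(1))
tic_recalculated
```

With a backend object, the recalculation would be requested with initial = FALSE instead.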
The following functions are supported by the MsBackendWeizMass
.
[
: subset the backend. Only subsetting by element (row/i) is
allowed.
$
, $<-
: access or set/add a single spectrum variable (column) in the
backend.
acquisitionNum
: returns the acquisition number of each
spectrum. Returns an integer
of length equal to the number of
spectra (with NA_integer_
if not available).
backendInitialize
: initialises the backend by retrieving the IDs of all
spectra in the database. Parameter dbcon
with the connection to the
WeizMass MySQL database is required.
dataOrigin
: gets a character
of length equal to the number of spectra
in object
with the data origin of each spectrum. This could e.g. be
the mzML file from which the data was read.
dataStorage
: returns "<WeizMass>"
for all spectra.
centroided
, centroided<-
: gets or sets the centroiding
information of the spectra. centroided
returns a logical
vector of length equal to the number of spectra with TRUE
if a
spectrum is centroided, FALSE
if it is in profile mode and NA
if it is undefined. See also isCentroided
for estimating from
the spectrum data whether the spectrum is centroided. value
for centroided<-
is either a single logical
or a logical
of
length equal to the number of spectra in object
.
collisionEnergy
, collisionEnergy<-
: gets or sets the
collision energy for all spectra in object
. collisionEnergy
returns a numeric
with length equal to the number of spectra
(NA_real_
if not present/defined), collisionEnergy<-
takes a
numeric
of length equal to the number of spectra in object
. Note that
the collision energy descriptions from WeizMass are provided as spectra
variable "collisionEnergyText"
.
intensity
: gets the intensity values from the spectra. Returns
a NumericList()
of numeric
vectors (intensity values for each
spectrum). The length of the list
is equal to the number of
spectra
in object
.
ionCount
: returns a numeric
with the sum of intensities for
each spectrum. If the spectrum is empty (see isEmpty
),
NA_real_
is returned.
isCentroided
: a heuristic approach assessing if the spectra in
object
are in profile or centroided mode. The function takes
the qtl
th quantile top peaks, then calculates the differences
between adjacent m/z values and returns TRUE
if the first
quartile is greater than k
. (See Spectra:::.isCentroided
for
the code.)
isEmpty
: checks whether a spectrum in object
is empty
(i.e. does not contain any peaks). Returns a logical
vector of
length equal to the number of spectra.
isolationWindowLowerMz
, isolationWindowLowerMz<-
: gets or sets the
lower m/z boundary of the isolation window.
isolationWindowTargetMz
, isolationWindowTargetMz<-
: gets or sets the
target m/z of the isolation window.
isolationWindowUpperMz
, isolationWindowUpperMz<-
: gets or sets the
upper m/z boundary of the isolation window.
isReadOnly
: returns a logical(1)
indicating whether the backend is read-only
or also allows writing/updating data.
length
: returns the number of spectra in the object.
lengths
: gets the number of peaks (m/z-intensity values) per
spectrum. Returns an integer
vector (length equal to the
number of spectra). For empty spectra, 0
is returned.
msLevel
: gets the spectra's MS level. Returns an integer
vector (of length equal to the number of spectra) with the MS
level for each spectrum (or NA_integer_
if not available).
mz
: gets the mass-to-charge ratios (m/z) from the
spectra. Returns a NumericList()
of length equal to the number of
spectra, each element a numeric
vector with the m/z values of
one spectrum.
peaksData
returns a list
with the spectra's peak data. The length of
the list is equal to the number of spectra in object
. Each element of
the list is a matrix
with columns defined by parameter columns
which
defaults to columns = c("mz", "intensity")
but any of
peaksVariables(object)
can be used.
Note that if columns
contains "peak_annotation"
, the whole matrix will
be of type character
(i.e. even the m/z and intensity values will be
provided as text). See examples below for details. For an empty spectrum,
a matrix
with 0 rows is returned.
peaksVariables
returns a character
with the provided peaks variables
(i.e. data available for each individual mass peak). These can be used in
peaksData
to retrieve the specified values.
polarity
, polarity<-
: gets or sets the polarity for each
spectrum. polarity
returns an integer
vector (length equal
to the number of spectra), with 0
and 1
representing negative
and positive polarities, respectively. polarity<-
expects an
integer vector of length 1 or equal to the number of spectra.
precursorCharge
, precursorIntensity
, precursorMz
,
precScanNum
, precAcquisitionNum
: get the charge (integer
),
intensity (numeric
), m/z (numeric
), scan index (integer
)
and acquisition number (integer
) of the precursor for MS level
2 and above spectra from the object. Returns a vector of length equal to
the number of spectra in object
. NA
is reported for MS1
spectra or if no precursor information is available.
reset
: restores the backend to its original state, i.e. deletes all
locally modified data and reinitializes the backend to the full data
available in the database.
rtime
, rtime<-
: gets or sets the retention times for each
spectrum (in seconds). rtime
returns a numeric
vector (length equal to
the number of spectra) with the retention time for each spectrum.
rtime<-
expects a numeric vector with length equal to the
number of spectra.
scanIndex
: returns an integer
vector with the scan index
for each spectrum. This represents the relative index of the
spectrum within each file. Note that this can be different to the
acquisitionNum
of the spectrum which is the index of the
spectrum as reported in the mzML file.
selectSpectraVariables
: reduces the information within the backend to
the selected spectra variables.
smoothed
, smoothed<-
: gets or sets whether a spectrum is
smoothed. smoothed
returns a logical
vector of length equal
to the number of spectra. smoothed<-
takes a logical
vector
of length 1 or equal to the number of spectra in object
.
spectraData
: gets general spectrum metadata (annotation, also called
header). spectraData
returns a DataFrame
. Note that replacing the
spectra data with spectraData<-
is not supported.
spectraNames
: returns a character
vector with the names of
the spectra in object
.
spectraVariables
: returns a character
vector with the
available spectra variables (columns, fields or attributes)
available in object
. This should return all spectra variables which
are present in object
, also "mz"
and "intensity"
(which are by
default not returned by the spectraVariables,Spectra
method).
tic
: gets the total ion current/count (sum of signal of a
spectrum) for all spectra in object
. By default, the value
reported in the original raw data file is returned. For an empty
spectrum, NA_real_
is returned.
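The heuristic described for isCentroided above can be sketched in base R as follows. This is written from the description on this page and not copied from Spectra:::.isCentroided; the function name is_centroided_sketch, the default values for k and qtl, and the test data are assumptions made for illustration.

```r
## Sketch of the isCentroided heuristic. `peaks` is a numeric matrix with
## columns "mz" and "intensity", ordered by m/z. The defaults for `k` and
## `qtl` are assumed values.
is_centroided_sketch <- function(peaks, k = 0.025, qtl = 0.9) {
    ## Keep only the peaks with an intensity above the qtl-th quantile.
    cutoff <- quantile(peaks[, "intensity"], probs = qtl, names = FALSE)
    top_mz <- peaks[peaks[, "intensity"] > cutoff, "mz"]
    if (length(top_mz) < 2L)
        return(NA)
    ## First quartile of the m/z differences between adjacent top peaks.
    quantile(diff(top_mz), probs = 0.25, names = FALSE) > k
}

## Made-up centroided-like data: few, well separated peaks.
centroided_like <- cbind(
    mz = seq(100, 500, by = 20),
    intensity = c(5, 80, 12, 300, 45, 7, 150, 60, 9, 220, 33,
                  410, 18, 95, 26, 510, 70, 11, 130, 40, 260)
)

## Made-up profile-like data: one densely sampled Gaussian peak.
profile_mz <- seq(100, 101, by = 0.005)
profile_like <- cbind(
    mz = profile_mz,
    intensity = dnorm(profile_mz, mean = 100.5, sd = 0.05)
)

is_centroided_sketch(centroided_like)
is_centroided_sketch(profile_like)
```

In the profile-like data the most intense points sit next to each other on the m/z axis, so their spacing stays below k, whereas the most intense centroided peaks are far apart.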
The following functions are not supported by the MsBackendWeizMass
since
the original data cannot be changed.
backendMerge
, export
, filterDataStorage
, filterPrecursorScan
,
peaksData<-
, filterAcquisitionNum
, intensity<-
, mz<-
, precScanNum
,
spectraData<-
, spectraNames<-
.
While compound annotations are also provided via the spectraVariables
of
the backend, it would also be possible to use the compounds
function on
a Spectra
object (that uses a MsBackendWeizMass
backend) to retrieve
compound annotations for the specific spectra.
Shahaf N., Rogachev I., Heinig U., Meir S., Malitsky S., Battat M., et al. (2016). The WEIZMASS spectra library for high-confidence metabolite identification. Nature Communications 7:12423. doi:10.1038/ncomms12423.
## Create a connection to a database with WeizMass data - in the present
## example we connect to a tiny SQLite database bundled in this package.
library(RSQLite)
con <- dbConnect(SQLite(), system.file("sqlite", "weizmassv2.sqlite",
package = "MsBackendWeizMass"))
## Given that we have the connection to a WeizMass database we can
## initialize the backend:
be <- backendInitialize(MsBackendWeizMass(), dbcon = con)
be
#> MsBackendWeizMass with 2 spectra
#> msLevel precursorMz polarity
#> <integer> <numeric> <integer>
#> 1 NA 595.166 1
#> 2 NA 593.150 0
#> ... 45 more variables/columns.
#> Use 'spectraVariables' to list all of them.
## List available peak variables
peaksVariables(be)
#> [1] "mz" "intensity" "relative_intensity"
#> [4] "peak_annotation"
## Get peaks data; by default only m/z and intensity values are returned
peaksData(be)
#> [[1]]
#> mz intensity
#> [1,] 325.0707 119
#> [2,] 337.0707 60
#> [3,] 355.0812 75
#> [4,] 379.0812 134
#> [5,] 380.0891 35
#> [6,] 391.0812 59
#> [7,] 403.0812 63
#> [8,] 409.0918 130
#> [9,] 421.0918 81
#> [10,] 427.1024 130
#> [11,] 428.1102 39
#> [12,] 439.1024 104
#> [13,] 457.1129 391
#> [14,] 458.1207 104
#> [15,] 475.1235 115
#> [16,] 476.1313 35
#> [17,] 481.1129 122
#> [18,] 482.1207 34
#> [19,] 499.1235 88
#> [20,] 505.1129 35
#> [21,] 511.1235 113
#> [22,] 523.1235 102
#> [23,] 529.1341 78
#> [24,] 541.1341 126
#> [25,] 542.1419 39
#> [26,] 559.1446 169
#> [27,] 560.1524 62
#> [28,] 577.1552 364
#> [29,] 578.1630 126
#> [30,] 579.1646 35
#>
#> [[2]]
#> mz intensity
#> [1,] 297.0768 26
#> [2,] 311.0561 22
#> [3,] 325.0718 46
#> [4,] 326.0796 12
#> [5,] 335.0561 12
#> [6,] 353.0667 660
#> [7,] 354.0745 179
#> [8,] 355.0741 57
#> [9,] 365.0667 38
#> [10,] 383.0772 432
#> [11,] 384.0851 104
#> [12,] 397.0929 13
#> [13,] 413.0878 55
#> [14,] 414.0956 14
#> [15,] 425.0878 33
#> [16,] 426.0887 13
#> [17,] 437.0878 15
#> [18,] 455.0984 53
#> [19,] 456.1062 19
#> [20,] 473.1089 715
#> [21,] 474.1168 182
#> [22,] 475.1116 58
#> [23,] 485.1089 28
#> [24,] 503.1195 224
#> [25,] 504.1245 54
#> [26,] 515.1195 26
#> [27,] 533.1301 21
#> [28,] 545.1301 13
#> [29,] 575.1406 61
#> [30,] 576.1395 21
#>
## Get peaks data including peak annotations; note that now for each
## spectrum a character matrix is returned!
res <- peaksData(be, columns = c("mz", "intensity", "peak_annotation"))
res[[1L]]
#> mz intensity peak_annotation
#> [1,] "325.0707" "119" "C18H12O6"
#> [2,] "337.0707" " 60" "C19H12O6"
#> [3,] "355.0812" " 75" "C19H14O7"
#> [4,] "379.0812" "134" "C21H14O7"
#> [5,] "380.0891" " 35" "C21H15O7"
#> [6,] "391.0812" " 59" "C22H14O7"
#> [7,] "403.0812" " 63" "C23H14O7"
#> [8,] "409.0918" "130" "C22H16O8"
#> [9,] "421.0918" " 81" "C23H16O8"
#> [10,] "427.1024" "130" "C22H18O9"
#> [11,] "428.1102" " 39" "C22H19O9"
#> [12,] "439.1024" "104" "C23H18O9"
#> [13,] "457.1129" "391" "C23H20O10"
#> [14,] "458.1207" "104" "C23H21O10"
#> [15,] "475.1235" "115" "C23H22O11"
#> [16,] "476.1313" " 35" "C23H23O11"
#> [17,] "481.1129" "122" "C25H20O10"
#> [18,] "482.1207" " 34" "C25H21O10"
#> [19,] "499.1235" " 88" "C25H22O11"
#> [20,] "505.1129" " 35" "C27H20O10"
#> [21,] "511.1235" "113" "C26H22O11"
#> [22,] "523.1235" "102" "C27H22O11"
#> [23,] "529.1341" " 78" "C26H24O12"
#> [24,] "541.1341" "126" "C27H24O12"
#> [25,] "542.1419" " 39" "C27H25O12"
#> [26,] "559.1446" "169" "C27H26O13"
#> [27,] "560.1524" " 62" "C27H27O13"
#> [28,] "577.1552" "364" "C27H28O14"
#> [29,] "578.1630" "126" "C27H29O14"
#> [30,] "579.1646" " 35" ""
## Get the m/z values for all spectra
mz(be)
#> NumericList of length 2
#> [[1]] 325.070665 337.070665 355.081229 ... 577.155182 578.163007 579.16462
#> [[2]] 297.076847 311.056112 325.071762 ... 545.130064 575.140629 576.13949
## Annotations for the individual peaks can be retrieved with
be$peak_annotation
#> CharacterList of length 2
#> [[1]] C18H12O6 C19H12O6 C19H14O7 C21H14O7 ... C27H27O13 C27H28O14 C27H29O14
#> [[2]] C17H14O5 C17H12O6 C18H14O6 C18H15O6 ... C25H26O13 C26H26O13 C27H28O14
## List available spectra variables
spectraVariables(be)
#> [1] "msLevel" "rtime"
#> [3] "acquisitionNum" "scanIndex"
#> [5] "mz" "intensity"
#> [7] "dataStorage" "dataOrigin"
#> [9] "centroided" "smoothed"
#> [11] "polarity" "precScanNum"
#> [13] "precursorMz" "precursorIntensity"
#> [15] "precursorCharge" "collisionEnergy"
#> [17] "isolationWindowLowerMz" "isolationWindowTargetMz"
#> [19] "isolationWindowUpperMz" "precursor_mz_text"
#> [21] "spectrumId" "compound_id"
#> [23] "ION" "adduct"
#> [25] "EXTRA_IONS" "EXTRA_MZ"
#> [27] "rtime_ci" "UV"
#> [29] "CCS" "DATE"
#> [31] "formula" "exactmass"
#> [33] "SOURCE" "LIBRARY"
#> [35] "smiles" "inchikey"
#> [37] "CHEMICAL_CLASS" "CURATED_CHEMICAL_CLASS"
#> [39] "ORGANISM_TYPE" "CHEM_LOCATION"
#> [41] "instrument" "CHROMATOGRAPHY"
#> [43] "ISOMER_OF" "MSI"
#> [45] "common_name" "iupac_name"
#> [47] "relative_intensity" "peak_annotation"
## Access MS level
msLevel(be)
#> [1] NA NA
be$msLevel
#> [1] NA NA
## Access m/z values
be$mz
#> NumericList of length 2
#> [[1]] 325.070665 337.070665 355.081229 ... 577.155182 578.163007 579.16462
#> [[2]] 297.076847 311.056112 325.071762 ... 545.130064 575.140629 576.13949
## Access the full spectra data (including m/z and intensity values)
spectraData(be)
#> DataFrame with 2 rows and 48 columns
#> msLevel rtime acquisitionNum scanIndex mz
#> <integer> <numeric> <integer> <integer> <NumericList>
#> 1 NA 7.06 NA NA 325.071,337.071,355.081,...
#> 2 NA 7.09 NA NA 297.077,311.056,325.072,...
#> intensity dataStorage dataOrigin centroided smoothed polarity
#> <NumericList> <character> <character> <logical> <logical> <integer>
#> 1 119, 60, 75,... <WeizMass> IL_270512_49 NA NA 1
#> 2 26,22,46,... <WeizMass> IL_060312_49 NA NA 0
#> precScanNum precursorMz precursorIntensity precursorCharge collisionEnergy
#> <integer> <numeric> <numeric> <integer> <numeric>
#> 1 NA 595.166 NA NA NA
#> 2 NA 593.150 NA NA NA
#> isolationWindowLowerMz isolationWindowTargetMz isolationWindowUpperMz
#> <numeric> <numeric> <numeric>
#> 1 NA NA NA
#> 2 NA NA NA
#> precursor_mz_text spectrumId compound_id ION adduct EXTRA_IONS
#> <character> <integer> <character> <character> <character> <character>
#> 1 595.1658 32290 NP-000002 MSE [M]+ NA
#> 2 593.1496 32291 NP-000002 MSE [M-H]- NA
#> EXTRA_MZ rtime_ci UV CCS DATE formula
#> <character> <character> <character> <character> <numeric> <character>
#> 1 NA 7.01;7.12 NA NA 1643760000 C27H30O15
#> 2 NA 7.05;7.16 NA NA 1643760000 C27H30O15
#> exactmass SOURCE LIBRARY smiles
#> <numeric> <character> <character> <character>
#> 1 594.158 AALAB WEIZMASS c1(c(c(c2c(c1C1OC(C(..
#> 2 594.158 AALAB WEIZMASS c1(c(c(c2c(c1C1OC(C(..
#> inchikey CHEMICAL_CLASS CURATED_CHEMICAL_CLASS
#> <character> <character> <character>
#> 1 FIAAVMJLAGNUKW-UHFFF.. Phenylpropanoids and.. NA
#> 2 FIAAVMJLAGNUKW-UHFFF.. Phenylpropanoids and.. NA
#> ORGANISM_TYPE CHEM_LOCATION instrument CHROMATOGRAPHY ISOMER_OF
#> <character> <character> <character> <character> <character>
#> 1 Plant AD111784-26_F11 Synapt_TOF1 RP_C18_40min NA
#> 2 Plant AD111784-26_F11 Synapt_TOF1 RP_C18_40min NA
#> MSI common_name iupac_name
#> <integer> <CharacterList> <CharacterList>
#> 1 1 ,5,7,4-Trihydroxyflav.. NA,5,7-dihydroxy-2-(4-h..
#> 2 1 ,5,7,4-Trihydroxyflav.. NA,5,7-dihydroxy-2-(4-h..
#> relative_intensity peak_annotation
#> <NumericList> <CharacterList>
#> 1 NA,NA,NA,... C18H12O6,C19H12O6,C19H14O7,...
#> 2 NA,NA,NA,... C17H14O5,C17H12O6,C18H14O6,...
## Add a new spectra variable
be$new_variable <- "b"
be$new_variable
#> [1] "b" "b"
## Subset the backend
be_sub <- be[c(2, 1)]
spectraNames(be)
#> [1] "32290" "32291"
spectraNames(be_sub)
#> [1] "32291" "32290"
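Because peaksData returns a character matrix when "peak_annotation" is requested, the m/z and intensity values have to be converted back before they can be used numerically. The base R sketch below shows one possible way to do this; the matrix peaks_chr only mimics the first rows of res[[1L]] shown above, and the data.frame layout is an assumption, not something the package provides.

```r
## A character matrix mimicking the first rows of `res[[1L]]` shown above.
peaks_chr <- cbind(
    mz = c("325.0707", "337.0707", "355.0812"),
    intensity = c("119", " 60", " 75"),
    peak_annotation = c("C18H12O6", "C19H12O6", "C19H14O7")
)

## Convert the numeric columns back and keep the annotation as character.
## as.numeric() ignores the leading blanks in values such as " 60".
peaks_df <- data.frame(
    mz = as.numeric(peaks_chr[, "mz"]),
    intensity = as.numeric(peaks_chr[, "intensity"]),
    peak_annotation = peaks_chr[, "peak_annotation"],
    stringsAsFactors = FALSE
)
str(peaks_df)
```

A data.frame is used here because, unlike a matrix, it can hold numeric and character columns side by side.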