The MsBackendPy
allows to access MS data stored as matchms.Spectrum
or spectrum_utils.spectrum.MsmsSpectrum
objects from the
matchms respectively
spectrum_utils Python
library directly from R. The MS data (peaks data or spectra variables) are
translated on-the-fly when accessed. Thus, the MsBackendPy
allows a
seamless integration of Python MS data structures into Spectra::Spectra()
based analysis workflows.
The MsBackendPy
object is considered read-only, i.e. it does not provide
functionality to replace the peaks data from R. However, it is possible to
directly change the data in the referenced Python variable.
Usage
# S4 method for class 'MsBackendPy'
backendInitialize(
object,
pythonVariableName = character(),
spectraVariableMapping = defaultSpectraVariableMapping(),
pythonLibrary = c("matchms", "spectrum_utils"),
...,
data
)
# S4 method for class 'MsBackendPy'
length(x)
# S4 method for class 'MsBackendPy'
spectraVariables(object)
# S4 method for class 'MsBackendPy'
spectraData(object, columns = spectraVariables(object), drop = FALSE)
# S4 method for class 'MsBackendPy'
peaksData(object, columns = c("mz", "intensity"), drop = FALSE)
# S4 method for class 'MsBackendPy'
x$name
# S4 method for class 'MsBackendPy'
spectraVariableMapping(object) <- value
# S4 method for class 'Spectra'
spectraVariableMapping(object) <- value
reindex(object)
Arguments
- object
A
MsBackendPy
object.- pythonVariableName
For
backendInitialize()
:character(1)
with the name of the variable/Python attribute that contains the list ofmatchms.Spectrum
objects with the MS data.- spectraVariableMapping
For
backendInitialize()
: namedcharacter
with the mapping between spectra variable names and (matchms.Spectrum
) metadata names. SeedefaultSpectraVariableMapping()
for more information and details.- pythonLibrary
For
backendInitialize()
:character(1)
specifying the Python library used to represent the MS data in Python. Can be eitherpythonLibrary = "matchms"
(the default) orpythonLibrary = "spectrum_utils"
.- ...
Additional parameters.
- data
For
backendInitialize()
:DataFrame
with the full MS data (peaks data and spectra data). Currently not supported.- x
A
MsBackendPy
object- columns
For
spectraData()
:character
with the names of columns (spectra variables) to retrieve. Defaults tospectraVariables(object)
. ForpeaksData()
:character
with the names of the peaks variables to retrieve.- drop
For
spectraData()
andpeaksData()
:logical(1)
whether, when a single column is requested, the data should be returned as avector
instead of adata.frame
ormatrix
.- name
For
$
:character(1)
with the name of the variable to retrieve.- value
Replacement value(s).
Details
The MsBackendPy
keeps only a reference to the MS data in Python (i.e. the
name of the variable in Python) as well as an index pointing to the
individual spectra in Python but no other data. Any data requested from
the MsBackendPy
is accessed and translated on-the-fly from the Python
variable. The MsBackendPy
is thus an interface to the MS data, but not
a data container. All changes to the MS data in the Python variable
(performed e.g. in Python) immediately affect any MsBackendPy
instances
pointing to this variable.
Special care must be taken if the MS data structure in Python is subset or
its order is changed (e.g. by another process). In that case it might be
needed to re-index the backend using the reindex()
function:
object <- reindex(object)
. This will update (replace) the index to the
individual spectra in Python which is stored within the backend.
Note
As mentioned in the details section the MS data is completely stored in
Python and the backend only references to this data through the name of
the variable in Python. Thus, each time MS data is requested from the
backend, it is retrieved in its current state.
If for example data was transformed or metadata added or removed in the
Python object, it immediately affects the Spectra
/backend.
MsBackendPy
methods
The MsBackendPy
supports all methods defined by the Spectra::MsBackend()
interface for access to MS data. Details on the invidual functions can also
be found in the main documentation in the Spectra package (i.e. for
Spectra::MsBackend()
). Here we provide information for functions with
specific properties of the backend.
backendInitialize()
: initializes the backend with information from the referenced Python variable (attribute). The name of this attribute, ideally stored in the associated Python session, is expected to be provided with thepythonVariableName
parameter. The optionalspectraVariableMapping
parameter allows to provide additional, or alternative, mapping ofSpectra
's spectra variables to metadata in thematchms.Spectrum
objects. SeedefaultSpectraVariableMapping()
(the default) for more information. ParameterpythonLibrary
must be used to specify the Python library representing the MS data in Python. It can be eitherpythonLibrary = "matchms"
(the default) orpythonLibrary = "spectrum_utils"
. The function returns an initialized instance ofMsBackendPy
.peaksData()
: extracts the peaks data matrices from the backend. Python code is applied to the data structure in Python to extract the m/z and intensity values as a list of (numpy) arrays. These are then translated into an Rlist
of two-columnnumeric
matrices. Because Python does not allow to name columns of an array, an additional loop in R is required to set the column names to"mz"
and"intensity"
.spectraData()
: extracts the spectra data from the backend. Which spectra variables are translated and retrieved from the Python objects depends on the backend'sspectraVariableMapping()
. All metadata names defined are retrieved and added to the returnedDataFrame
(with eventually missing core spectra variables filled withNA
).spectraVariables()
: retrieves available spectra variables, which include the names of all metadata attributes in thematchms.Spectrum
objects and the core spectra variablesSpectra::coreSpectraVariables()
.spectraVariableMapping<-
: replaces thespectraVariableMapping
of the backend (seesetSpectraVariableMapping()
for details and description of the expected format).
Additional helper and utility functions
reindex()
: update the internal index to match1:length(object)
. This function is useful if the original data referenced by the backend was subset or re-ordered by a different process (or a function in Python).
Examples
## Loading an example MGF file provided by the SpectriPy package.
## As an alternative, the data could also be imported directly in Python
## using:
## import matchms
## from matchms.importing import load_from_mgf
## s_p = list(load_from_mgf(r.fl))
library(Spectra)
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: generics
#>
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#>
#> as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#> setequal, union
#>
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#> mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#> rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#> unsplit, which.max, which.min
#>
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#>
#> findMatches
#> The following objects are masked from ‘package:base’:
#>
#> I, expand.grid, unname
#> Loading required package: BiocParallel
library(MsBackendMgf)
fl <- system.file("extdata", "mgf", "test.mgf", package = "SpectriPy")
s <- Spectra(fl, source = MsBackendMgf())
#> Start data import from 1 files ...
#> done
s
#> MSn data (Spectra) with 100 spectra in a MsBackendMgf backend:
#> msLevel rtime scanIndex
#> <integer> <numeric> <integer>
#> 1 2 NA NA
#> 2 2 NA NA
#> 3 2 NA NA
#> 4 2 NA NA
#> 5 2 NA NA
#> ... ... ... ...
#> 96 2 NA NA
#> 97 2 NA NA
#> 98 2 NA NA
#> 99 2 NA NA
#> 100 NA NA NA
#> ... 20 more variables/columns.
## Translating the MS data to Python and assigning it to a variable
## named "s_p" in the (*reticulate*'s) `py` Python environment. Assigning
## the variable to the Python environment has performance advantages, as
## any Python code applied to the MS data does not require any data
## conversions.
py_set_attr(py, "s_p", rspec_to_pyspec(s))
## Create a `MsBackendPy` representing an interface to the data in the
## "s_p" variable in Python:
be <- backendInitialize(MsBackendPy(), "s_p")
be
#> MsBackendPy with 100 spectra
#> Data stored in the "s_p" variable in Python
## Create a Spectra object which this backend:
s_2 <- Spectra(be)
s_2
#> MSn data (Spectra) with 100 spectra in a MsBackendPy backend:
#> Data stored in the "s_p" variable in Python
## Available spectra variables: these include, next to the *core* spectra
## variables, also the names of all metadata stored in the `matchms.Spectrum`
## objects.
spectraVariables(s_2)
#> [1] "msLevel" "rtime"
#> [3] "acquisitionNum" "scanIndex"
#> [5] "dataStorage" "dataOrigin"
#> [7] "centroided" "smoothed"
#> [9] "polarity" "precScanNum"
#> [11] "precursorMz" "precursorIntensity"
#> [13] "precursorCharge" "collisionEnergy"
#> [15] "isolationWindowLowerMz" "isolationWindowTargetMz"
#> [17] "isolationWindowUpperMz"
## Get the full peaks data:
peaksData(s_2)
#> List of length 100
## Get the peaks from the first spectrum
peaksData(s_2)[[1L]]
#> mz intensity
#> [1,] 213.0546 754969.6
#> [2,] 241.0495 1058878.8
#> [3,] 259.0601 20211204.0
## Get the full spectra data:
spectraData(s_2)
#> DataFrame with 100 rows and 17 columns
#> msLevel rtime acquisitionNum scanIndex dataStorage dataOrigin
#> <integer> <numeric> <integer> <integer> <character> <character>
#> 1 2 NA NA NA s_p NA
#> 2 2 NA NA NA s_p NA
#> 3 2 NA NA NA s_p NA
#> 4 2 NA NA NA s_p NA
#> 5 2 NA NA NA s_p NA
#> ... ... ... ... ... ... ...
#> 96 2 NA NA NA s_p NA
#> 97 2 NA NA NA s_p NA
#> 98 2 NA NA NA s_p NA
#> 99 2 NA NA NA s_p NA
#> 100 NA NA NA NA s_p NA
#> centroided smoothed polarity precScanNum precursorMz precursorIntensity
#> <logical> <logical> <integer> <integer> <numeric> <numeric>
#> 1 NA NA NA NA 259.060 NA
#> 2 NA NA NA NA 837.532 NA
#> 3 NA NA NA NA 250.180 NA
#> 4 NA NA NA NA 250.180 NA
#> 5 NA NA NA NA 250.180 NA
#> ... ... ... ... ... ... ...
#> 96 NA NA NA NA 301.144 NA
#> 97 NA NA NA NA 407.135 NA
#> 98 NA NA NA NA 304.192 NA
#> 99 NA NA NA NA 361.202 NA
#> 100 NA NA NA NA NA NA
#> precursorCharge collisionEnergy isolationWindowLowerMz
#> <integer> <numeric> <numeric>
#> 1 1 NA NA
#> 2 1 NA NA
#> 3 1 NA NA
#> 4 1 NA NA
#> 5 1 NA NA
#> ... ... ... ...
#> 96 1 NA NA
#> 97 1 NA NA
#> 98 1 NA NA
#> 99 1 NA NA
#> 100 1 NA NA
#> isolationWindowTargetMz isolationWindowUpperMz
#> <numeric> <numeric>
#> 1 NA NA
#> 2 NA NA
#> 3 NA NA
#> 4 NA NA
#> 5 NA NA
#> ... ... ...
#> 96 NA NA
#> 97 NA NA
#> 98 NA NA
#> 99 NA NA
#> 100 NA NA
## Get the m/z values
mz(s_2)
#> NumericList of length 100
#> [[1]] 213.0546 241.0495 259.0601
#> [[2]] 116.0712 116.1074 127.0759 142.1231 ... 398.2568 679.4443 680.4436
#> [[3]] 232.1689 250.18 251.1831 252.1852
#> [[4]] 145.0638 187.1102 232.1687 233.1721 250.1795 251.1824 252.1862
#> [[5]] 232.1689 250.18 251.1831 252.1852
#> [[6]] 145.0638 187.1102 232.1687 233.1721 250.1795 251.1824 252.1862
#> [[7]] 232.1689 250.18 251.1831 252.1852
#> [[8]] 145.0638 187.1102 232.1687 233.1721 250.1795 251.1824 252.1862
#> [[9]] 135.0432 138.0632 163.0375 195.088
#> [[10]] 55.0537 56.0487 58.0281 58.0644 ... 71.0493 72.0792 82.0652 100.0763
#> ...
#> <90 more elements>
## Plot the first spectrum
plotSpectra(s_2[1L])
########
## Using the spectrum_utils Python library
## Below we convert the data to a list of `MsmsSpectrum` object from the
## spectrum_utils library.
py_set_attr(py, "su_p", rspec_to_pyspec(s,
spectraVariableMapping("spectrum_utils"), "spectrum_utils"))
## Create a MsBackendPy representing this data. Importantly, we need to
## specify the Python library using the `pythonLibrary` parameter and
## ideally also set the `spectraVariableMapping` to the one specific for
## that library.
be <- backendInitialize(MsBackendPy(), "su_p",
spectraVariableMapping = spectraVariableMapping("spectrum_utils"),
pythonLibrary = "spectrum_utils")
be
#> MsBackendPy with 100 spectra
#> Data stored in the "su_p" variable in Python
## Get the peaks data for the first 3 spectra
peaksData(be[1:3])
#> [[1]]
#> mz intensity
#> [1,] 213.0546 754969.6
#> [2,] 241.0495 1058878.8
#> [3,] 259.0601 20211204.0
#>
#> [[2]]
#> mz intensity
#> [1,] 116.0712 15660
#> [2,] 116.1074 28164
#> [3,] 127.0759 8616
#> [4,] 142.1231 4516
#> [5,] 158.1184 855092
#> [6,] 159.1218 68248
#> [7,] 380.2469 6148
#> [8,] 398.2568 8996
#> [9,] 679.4443 10596
#> [10,] 680.4436 5176
#>
#> [[3]]
#> mz intensity
#> [1,] 232.1689 20176
#> [2,] 250.1800 2073248
#> [3,] 251.1831 282228
#> [4,] 252.1852 21640
#>
## Get the full spectraData
spectraData(be)
#> Warning: NAs introduced by coercion
#> DataFrame with 100 rows and 19 columns
#> msLevel rtime acquisitionNum scanIndex mz
#> <integer> <numeric> <integer> <integer> <NumericList>
#> 1 NA NA NA NA 213.055,241.049,259.060
#> 2 NA NA NA NA 116.071,116.107,127.076,...
#> 3 NA NA NA NA 232.169,250.180,251.183,...
#> 4 NA NA NA NA 145.064,187.110,232.169,...
#> 5 NA NA NA NA 232.169,250.180,251.183,...
#> ... ... ... ... ... ...
#> 96 NA NA NA NA 108.023,109.028,122.037,...
#> 97 NA NA NA NA 277.099,289.147,291.149,...
#> 98 NA NA NA NA 168.139,168.177,168.230,...
#> 99 NA NA NA NA 106.765,116.930,125.058,...
#> 100 NA NA NA NA 100,101,122,...
#> intensity dataStorage dataOrigin centroided smoothed
#> <NumericList> <character> <character> <logical> <logical>
#> 1 754970, 1058879,20211204 su_p NA NA NA
#> 2 15660,28164, 8616,... su_p NA NA NA
#> 3 20176,2073248, 282228,... su_p NA NA NA
#> 4 864, 2548,10716,... su_p NA NA NA
#> 5 20176,2073248, 282228,... su_p NA NA NA
#> ... ... ... ... ... ...
#> 96 224, 295,1801,... su_p NA NA NA
#> 97 21,26,27,... su_p NA NA NA
#> 98 24842, 882, 182,... su_p NA NA NA
#> 99 39, 51,108,... su_p NA NA NA
#> 100 47,153, 39,... su_p NA NA NA
#> polarity precScanNum precursorMz precursorIntensity precursorCharge
#> <integer> <integer> <numeric> <numeric> <integer>
#> 1 NA NA 259.060 NA 1
#> 2 NA NA 837.532 NA 1
#> 3 NA NA 250.180 NA 1
#> 4 NA NA 250.180 NA 1
#> 5 NA NA 250.180 NA 1
#> ... ... ... ... ... ...
#> 96 NA NA 301.144 NA 1
#> 97 NA NA 407.135 NA 1
#> 98 NA NA 304.192 NA 1
#> 99 NA NA 361.202 NA 1
#> 100 NA NA NA NA 1
#> collisionEnergy isolationWindowLowerMz isolationWindowTargetMz
#> <numeric> <numeric> <numeric>
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> ... ... ... ...
#> 96 NA NA NA
#> 97 NA NA NA
#> 98 NA NA NA
#> 99 NA NA NA
#> 100 NA NA NA
#> isolationWindowUpperMz
#> <numeric>
#> 1 NA
#> 2 NA
#> 3 NA
#> 4 NA
#> 5 NA
#> ... ...
#> 96 NA
#> 97 NA
#> 98 NA
#> 99 NA
#> 100 NA
## Extract the precursor m/z
be$precursorMz
#> [1] 259.0595 837.5318 250.1802 250.1802 250.1802 250.1802 250.1802 250.1802
#> [9] 195.0877 100.0757 154.0777 190.1352 175.1190 287.0574 83.0604 83.0604
#> [17] 195.0877 NA NA 780.5500 770.5700 179.1179 195.0877 195.0877
#> [25] 195.0877 425.1871 285.1809 224.0728 202.0854 161.0220 296.1412 239.1503
#> [33] 990.9768 239.0673 163.0314 303.1339 NA NA NA NA
#> [41] NA NA NA NA NA NA NA NA
#> [49] NA NA 296.1467 205.1911 317.2122 216.0000 207.0000 125.0000
#> [57] 164.0000 256.0000 138.0000 256.0000 436.9460 866.6100 254.2478 335.0512
#> [65] 257.1285 239.0815 239.0815 239.0815 301.1445 301.1445 302.3054 175.1190
#> [73] 175.1190 NA 497.3272 175.1189 175.1189 230.0958 350.1598 NA
#> [81] NA 611.1612 273.1386 385.2122 449.1078 447.0933 591.1719 339.2067
#> [89] 878.5001 NA 102.0600 465.4100 459.2748 NA 301.1445 301.1445
#> [97] 407.1347 304.1918 361.2020 NA