ChromBackend classes for Chromatograms
Source: vignettes/creating-backend-classes.Rmd
Package: Chromatograms
Authors: Laurent Gatto [aut] (https://orcid.org/0000-0002-1520-2268), Johannes Rainer
[aut, cre] (https://orcid.org/0000-0002-6977-7147), Philippine
Louail [aut] (https://orcid.org/0009-0007-5429-6846)
Compiled: Wed Mar 27 08:08:52 2024
Similar to the Spectra package, the Chromatograms package separates the user-facing functionality to process and analyze chromatographic mass spectrometry (MS) data from the code for storage and representation of the data. The latter functionality is provided by implementations of the ChromBackend class, further on called backends. This vignette describes the ChromBackend class and illustrates with a simple example how a backend extending this class could be implemented.
Contributions to this vignette (content or correction of typos) or requests for additional details and information are highly welcome, ideally via pull requests or issues on the package’s github repository.
What is a ChromBackend?
The purpose of a backend class extending the virtual ChromBackend is to provide the chromatographic MS data to the Chromatograms object, which is used by the user to interact with and analyze the data. The ChromBackend defines the API that new backends need to provide so that they can be used with Chromatograms. This API defines a set of methods to access the data. For many functions default implementations exist, and a dedicated implementation for a new backend is only needed if the default is not suitable (e.g. if the data is stored in a way that allows a more efficient access). In addition, a core set of variables (data fields), the so-called core chromatogram variables, is defined to describe the chromatographic data. Each backend needs to provide these, but can also define additional data fields. Before implementing a new backend it is highly recommended to carefully read the following Conventions and definitions section.
General conventions for chromatographic MS data of a Chromatograms object are:

- A Chromatograms object is designed to contain multiple chromatographic data (not data from a single chromatogram).
- Missing values (NA) for retention time values are not supported.
- The core chromatogram variables can be listed with the coreChromVariables() function.
- dataStorage and dataOrigin are two special variables that define for each chromatogram where the data is (currently) stored and from where the data derived, respectively. Both are expected to be of type character. Missing values for dataStorage are not allowed.
- ChromBackend implementations can also represent purely read-only data resources. In this case only data accessor methods need to be implemented but not data replacement methods (i.e. <- methods that would allow to add or set variables). Read-only backends should implement the isReadOnly() method, which should then return TRUE. Note that backends for purely read-only resources could also implement a caching mechanism to (temporarily) store changes to the data locally within the object (and hence in memory). See information on the MsBackendCached in the Spectra package for more details.

For parallel processing, Chromatograms splits the backend based on a defined factor and processes each chunk in parallel (or in serial if a SerialParam is used).
The splitting factor can be defined for Chromatograms by setting the parameter processingChunkSize. Alternatively, through the backendParallelFactor() method the backend can also suggest a factor that should/could be used for splitting and parallel processing. The default implementation for backendParallelFactor() is to return an empty factor (factor()), hence not suggesting any preferred splitting.
Besides parallel processing, for on-disk backends (i.e., backends that don’t keep all of the data in memory), this chunk-wise processing can also reduce the memory demand for operations, because only the peak data of the current chunk needs to be realized in memory.
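The chunk-wise processing described above can be sketched in base R. This is only an illustration of the idea of splitting by a factor, not the code used by Chromatograms; the data, the variable names and the chunk size of 2 are assumptions made for the example.

```r
## Hypothetical in-memory data: intensity values of five chromatograms
intensity <- list(c(1, 5, 3), c(2, 8), c(4, 4, 9, 1), c(7), c(3, 6))

## A splitting factor assigning each chromatogram to a chunk; here chunks
## of at most two chromatograms (assumed chunk size)
chunk_size <- 2L
f <- factor(ceiling(seq_along(intensity) / chunk_size))

## Process each chunk separately (serially here); only the data of the
## current chunk is handled at any time
res <- lapply(split(seq_along(intensity), f), function(idx) {
    vapply(intensity[idx], max, numeric(1))
})

## Combine the per-chunk results in the original order
max_int <- unlist(res, use.names = FALSE)
max_int
```

Replacing lapply() with a parallel apply function would process the chunks in parallel; for an on-disk backend the body of the function would be the place to load the data of the current chunk.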
The ChromBackend class defines core methods that have to be implemented by an MS backend as well as optional methods for which a default implementation is already available. These functions are described in sections Required methods and Optional methods, respectively.
To create a new backend a class extending the virtual ChromBackend needs to be implemented. In the example below we thus create a simple class with a data.frame for general properties (chromatogram variables) and two slots for the retention time and intensity values, representing the actual chromatographic MS data. We store these values as lists, each list element representing the values for one chromatogram, since the number of values (peaks) can differ between chromatograms. We also define a simple constructor function that returns an empty instance of our new class.
library(Chromatograms)
#' Definition of the backend class extending ChromBackend
setClass("ChromBackendTest",
         contains = "ChromBackend",
         slots = c(
             chromVars = "data.frame",
             rtime = "list",
             intensity = "list"
         ),
         prototype = prototype(
             chromVars = data.frame(),
             rtime = list(),
             intensity = list()
         ))
#' Simple constructor function
ChromBackendTest <- function() {
    new("ChromBackendTest")
}
The 3 slots @chromVars
, @rtime
and
@intensity
will be used to store our MS data: each row in
chromVars
will contain data for one chromatogram with the
columns being the different chromatogram variables
(i.e. additional properties of a chromatogram such as its m/z value or
MS level) and each element in @rtime
and
@intensity
a numeric
vector with the retention
times and intensity values representing thus the peaks data of
the respective chromatogram. This is only one of the possibly many ways
chromatographic data might be represented.
We should ideally also add some basic validity function that ensures
the data to be correct (valid). The function below simply checks that
the number of rows of the @chromVars
slot matches the
length of the @rtime
and @intensity
slots.
#' Basic validation function
setValidity("ChromBackendTest", function(object) {
    if (length(object@rtime) != length(object@intensity) ||
        length(object@rtime) != nrow(object@chromVars))
        return(paste0("length of 'rtime' and 'intensity' has to match the ",
                      "number of rows of 'chromVars'"))
    NULL
})
## Class "ChromBackendTest" [in ".GlobalEnv"]
##
## Slots:
##
## Name: chromVars rtime intensity version
## Class: data.frame list list character
##
## Extends: "ChromBackend"
We can now create an instance of our new class with the
ChromBackendTest()
function.
#' Create an empty instance of ChromBackendTest
be <- ChromBackendTest()
be
## An object of class "ChromBackendTest"
## Slot "chromVars":
## data frame with 0 columns and 0 rows
##
## Slot "rtime":
## list()
##
## Slot "intensity":
## list()
##
## Slot "version":
## [1] "0.1"
A show() method would allow general information on our object to be displayed in a more convenient way. Below we add an implementation of the show() method.
#' implementation of show for ChromBackendTest
setMethod("show", "ChromBackendTest", function(object) {
cd <- object@chromVars
cat(class(object), "with", nrow(cd), "chromatograms\n")
})
be
## ChromBackendTest with 0 chromatograms
Methods listed in this section must be implemented
for a new class extending ChromBackend
. Methods should
ideally also be implemented in the order they are listed here. Also, it
is strongly advised to write dedicated unit tests for each newly
implemented method or function already during the
development.
dataStorage()
The dataStorage
chromatogram variable provides
information how or where the data is stored. The
dataStorage()
method should therefore return a
character
vector of length equal to the number of
chromatograms that are represented by the object. The values for
dataStorage
can be any character value, except
NA
. For our example backend we define a simple
dataStorage()
method that simply returns the column
"dataStorage"
from the @chromVars
(as a
character
).
#' dataStorage method to provide information *where* data is stored
setMethod("dataStorage", "ChromBackendTest", function(object) {
as.character(object@chromVars$dataStorage)
})
Calling dataStorage()
on our example backend will thus
return an empty character
(since the object created above
does not contain any data).
dataStorage(be)
## character(0)
length()
length()
is expected to return an integer
of length 1 with the total number of chromatograms that are represented
by the backend. For our example backend we simply return the number of
rows of the data.frame
stored in the
@chromVars
slot.
#' length to provide information on the number of chromatograms
setMethod("length", "ChromBackendTest", function(x) {
nrow(x@chromVars)
})
length(be)
## [1] 0
backendInitialize()
The backendInitialize() method is expected to be called after creating an instance of the backend class and should prepare (initialize) the backend with data. This method can take any parameters needed by the backend to get loaded/initialized with data (which can be file names from which to load the data, a database connection or object(s) containing the data). During backendInitialize() it is also suggested to set the special chromatogram variables dataStorage and dataOrigin.
Below we define a backendInitialize() method that takes as arguments a data.frame with chromatogram variables and two lists with the retention time and intensity values for each chromatogram.
#' backendInitialize method to fill the backend with data.
setMethod(
    "backendInitialize", "ChromBackendTest",
    function(object, chromVars, rtime, intensity) {
        if (!is.data.frame(chromVars))
            stop("'chromVars' needs to be a 'data.frame' with the general ",
                 "chromatogram variables")
        ## Defining dataStorage and dataOrigin, if not available
        if (is.null(chromVars$dataStorage))
            chromVars$dataStorage <- "<memory>"
        if (is.null(chromVars$dataOrigin))
            chromVars$dataOrigin <- "<user provided>"
        object@chromVars <- chromVars
        object@rtime <- rtime
        object@intensity <- intensity
        validObject(object)
        object
    })
In addition to adding the data to the object, the function also defines the dataStorage and dataOrigin chromatogram variables. The purpose of these two variables is to provide some information on where the data is currently stored (in memory, as in our example) and from where the data originates.
We can now create an instance of our backend class and fill it with
data. We thus first define our MS data and pass this to the
backendInitialize()
method.
#' A data.frame with chromatogram variables.
cvars <- data.frame(msLevel = c(1L, 1L, 1L),
mz = c(112.2, 123.3, 134.4))
#' retention time values for each chromatogram.
rts <- list(c(12.4, 12.8, 13.2, 14.6),
c(45.1, 46.2),
c(64.4, 64.8, 65.2))
#' intensity values for each chromatogram.
ints <- list(c(123.3, 153.6, 2354.3, 243.4),
c(100, 80.1),
c(12.3, 135.2, 100))
#' Create and initialize the backend
be <- backendInitialize(ChromBackendTest(),
chromVars = cvars, rtime = rts, intensity = ints)
be
## ChromBackendTest with 3 chromatograms
While this method works and is compliant with the ChromBackend API (because there is no requirement on the input parameters for the backendInitialize() method), it would be good practice for backends to support an additional parameter data that would allow passing the complete MS data (including retention time and intensity values) to the function as a DataFrame. This would simplify the implementation of some replacement methods and would in addition also allow changing the backend of a Chromatograms object to our new backend using the setBackend() function. Also, it is highly recommended to check the validity of the input data within the initialize method. The advantage of performing these validity checks in backendInitialize() over adding them with setValidity() is that potentially computationally expensive operations/checks would only be performed once instead of each time values within the object are changed (e.g. by subsetting or similar), which would be the case with validation functionality registered with setValidity().
We thus re-implement the backendInitialize() method, now also supporting the data parameter mentioned above, and add additional validity checks. These validity checks verify that only numeric values are provided with rtime and intensity and that the number of retention time and intensity values matches for each chromatogram. We also use the validChromData() function that checks that the provided core chromatogram variables have the correct data type.
#' Reimplementation of backendInitialize with a `data` parameter and
#' additional input validation
setMethod(
    "backendInitialize", "ChromBackendTest",
    function(object, chromVars, rtime, intensity, data) {
        ## Extract relevant information from a parameter `data` if provided
        if (!missing(data)) {
            ## drop = FALSE keeps a tabular result also for a single column
            chromVars <- as.data.frame(
                data[, !colnames(data) %in% c("rtime", "intensity"),
                     drop = FALSE])
            if (any(colnames(data) == "rtime"))
                rtime <- data$rtime
            if (any(colnames(data) == "intensity"))
                intensity <- data$intensity
        }
        ## Check that provided variables have the correct data type
        validChromData(chromVars)
        n <- nrow(chromVars)
        ## Validate rtime and intensity
        if (missing(rtime))
            rtime <- vector("list", n)
        if (missing(intensity))
            intensity <- vector("list", n)
        if (length(rtime) != length(intensity) || length(rtime) != n)
            stop("lengths of 'rtime' and 'intensity' need to match the ",
                 "number of chromatograms (i.e., nrow of 'chromVars')")
        if (any(lengths(rtime) != lengths(intensity)))
            stop("the number of data values in 'rtime' and 'intensity' have ",
                 "to match")
        if (!all(vapply(rtime, is.numeric, logical(1))))
            stop("'rtime' has to be a list of numeric values")
        if (!all(vapply(intensity, is.numeric, logical(1))))
            stop("'intensity' has to be a list of numeric values")
        ## If rtime or intensity is of type NumericList convert to list
        if (inherits(rtime, "NumericList"))
            rtime <- as.list(rtime)
        if (inherits(intensity, "NumericList"))
            intensity <- as.list(intensity)
        ## Setting dataStorage and dataOrigin
        chromVars$dataStorage <- rep("<memory>", n)
        if (is.null(chromVars$dataOrigin))
            chromVars$dataOrigin <- rep("<user provided>", n)
        ## Fill object with data
        object@chromVars <- as.data.frame(chromVars)
        object@rtime <- rtime
        object@intensity <- intensity
        validObject(object)
        object
    })
This extended backendInitialize()
implementation would
now also assure data validity and integrity. Below we use this function
again to create our backend instance.
#' Create and initialize the backend
be <- backendInitialize(ChromBackendTest(),
chromVars = cvars, rtime = rts,
intensity = ints)
be
## ChromBackendTest with 3 chromatograms
The backendInitialize()
method that we implemented for
our backend class expects the user to provide the full MS data. It would
alternatively also be possible to implement a method that takes data
file names as input from which the function can then import the data.
The purpose of the backendInitialize() method is to initialize and prepare the data in a way that it can be accessed by a Chromatograms object. Whether the data is actually loaded into memory or simply referenced and loaded upon request does not matter as long as the backend is able to provide the data through its accessor methods when requested by the Chromatograms object.
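As a rough illustration of the file-based alternative mentioned above, the sketch below imports chromatograms from CSV files using only base R. The helper read_chrom_csv() and the assumed file layout (one file per chromatogram with the columns rtime and intensity) are hypothetical and not part of the Chromatograms package.

```r
## Hypothetical helper: import chromatograms from CSV files with the two
## columns "rtime" and "intensity" (this file layout is an assumption)
read_chrom_csv <- function(files) {
    peaks <- lapply(files, read.csv)
    list(
        chromVars = data.frame(dataOrigin = files, stringsAsFactors = FALSE),
        rtime = lapply(peaks, function(z) as.numeric(z$rtime)),
        intensity = lapply(peaks, function(z) as.numeric(z$intensity))
    )
}

## Write two small example files to a temporary location
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(data.frame(rtime = c(12.4, 12.8), intensity = c(123.3, 153.6)),
          f1, row.names = FALSE)
write.csv(data.frame(rtime = c(45.1, 46.2, 47.0), intensity = c(100, 80.1, 5)),
          f2, row.names = FALSE)

## Import the files; one list element per chromatogram
imported <- read_chrom_csv(c(f1, f2))
lengths(imported$rtime)
```

The three elements of the returned list match the chromVars, rtime and intensity parameters of the backendInitialize() method defined above, so a file-based initialize method could pass them on after import. The file names are stored in dataOrigin to record where the data came from.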
chromVariables()
The chromVariables()
method should return a
character
vector with the names of all available
chromatogram variables of the backend. While a backend class should
support defining and providing their own variables, each
ChromBackend
class must provide also the
core chromatogram variables (in the correct data type). These
can be listed by the coreChromVariables()
function:
#' List core chromatogram variables along with data types.
coreChromVariables()
## chromIndex collisionEnergy dataOrigin dataStorage intensity
## "integer" "numeric" "character" "character" "NumericList"
## msLevel mz mzMin mzMax precursorMz
## "integer" "numeric" "numeric" "numeric" "numeric"
## precursorMzMin precursorMzMax productMz productMzMin productMzMax
## "numeric" "numeric" "numeric" "numeric" "numeric"
## rtime
## "NumericList"
A typical chromVariables() method for a ChromBackend class will thus be implemented similarly to the one for our ChromBackendTest test backend: it will return the union of the core chromatogram variables and the names of all available chromatogram variables within the backend object.
#' Accessor for available chromatogram variables
setMethod("chromVariables", "ChromBackendTest", function(object) {
union(names(coreChromVariables()), colnames(object@chromVars))
})
chromVariables(be)
## [1] "chromIndex" "collisionEnergy" "dataOrigin" "dataStorage"
## [5] "intensity" "msLevel" "mz" "mzMin"
## [9] "mzMax" "precursorMz" "precursorMzMin" "precursorMzMax"
## [13] "productMz" "productMzMin" "productMzMax" "rtime"
chromData()
The chromData() method should return the full chromatogram data within a backend as a DataFrame object (defined in the S4Vectors package). A parameter columns should allow to define the names of the variables that should be returned. Each row in this data frame should represent one chromatogram, each column a chromatogram variable. Columns "rtime" and "intensity" (if requested) each have to contain a NumericList with the retention time and intensity values of the chromatograms. The DataFrame must provide values (even if they are NA) for all requested chromatogram variables of the backend (including the core chromatogram variables). The fillCoreChromVariables() function from the Chromatograms package allows to complete (fill) a provided data.frame with any missing core chromatogram variables (columns):
#' Get the data.frame with the available chrom variables
be@chromVars
## msLevel mz dataStorage dataOrigin
## 1 1 112.2 <memory> <user provided>
## 2 1 123.3 <memory> <user provided>
## 3 1 134.4 <memory> <user provided>
#' Complete this data.frame with missing core variables
fillCoreChromVariables(be@chromVars)
## msLevel mz dataStorage dataOrigin chromIndex collisionEnergy
## 1 1 112.2 <memory> <user provided> NA NA
## 2 1 123.3 <memory> <user provided> NA NA
## 3 1 134.4 <memory> <user provided> NA NA
## dataOrigin dataStorage msLevel mz mzMin mzMax precursorMz precursorMzMin
## 1 <NA> <NA> NA NA NA NA NA NA
## 2 <NA> <NA> NA NA NA NA NA NA
## 3 <NA> <NA> NA NA NA NA NA NA
## precursorMzMax productMz productMzMin productMzMax
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
We can thus use this function to add eventually missing core
chromatogram variables in the chromData
implementation for
our backend:
#' function to extract the full chrom data; we would need to import the
#' `DataFrame()` function from the S4Vectors package and the `NumericList`
#' from the IRanges package.
setMethod(
"chromData", "ChromBackendTest",
function(object, columns = chromVariables(object)) {
if (!all(columns %in% chromVariables(object)))
stop("Some of the requested variables are not available")
res <- S4Vectors::DataFrame(object@chromVars)
## Add rtime and intensity values to the result; would need to
## import the `NumericList()` function from the IRanges package
res$rtime <- IRanges::NumericList(object@rtime, compress = FALSE)
res$intensity <- IRanges::NumericList(
object@intensity, compress = FALSE)
## Fill with eventually missing core variables
res <- fillCoreChromVariables(res)
res[, columns, drop = FALSE]
})
We can now use chromData()
to either extract the full
chromatogram data from the backend, or only the data for selected
variables.
#' Extract the full data
chromData(be)
## DataFrame with 3 rows and 16 columns
## chromIndex collisionEnergy dataOrigin dataStorage
## <integer> <numeric> <character> <character>
## 1 NA NA <user provided> <memory>
## 2 NA NA <user provided> <memory>
## 3 NA NA <user provided> <memory>
## intensity msLevel mz mzMin mzMax precursorMz
## <NumericList> <integer> <numeric> <numeric> <numeric> <numeric>
## 1 123.3, 153.6,2354.3,... 1 112.2 NA NA NA
## 2 100.0, 80.1 1 123.3 NA NA NA
## 3 12.3,135.2,100.0 1 134.4 NA NA NA
## precursorMzMin precursorMzMax productMz productMzMin productMzMax
## <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 NA NA NA NA NA
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## rtime
## <NumericList>
## 1 12.4,12.8,13.2,...
## 2 45.1,46.2
## 3 64.4,64.8,65.2
#' Extract only selected variables
chromData(be, columns = c("rtime", "mz", "msLevel"))
## DataFrame with 3 rows and 3 columns
## rtime mz msLevel
## <NumericList> <numeric> <integer>
## 1 12.4,12.8,13.2,... 112.2 1
## 2 45.1,46.2 123.3 1
## 3 64.4,64.8,65.2 134.4 1
#' Extract core chromatogram variables without values in the backend
chromData(be, columns = c("collisionEnergy", "mzMin"))
## DataFrame with 3 rows and 2 columns
## collisionEnergy mzMin
## <numeric> <numeric>
## 1 NA NA
## 2 NA NA
## 3 NA NA
peaksData()
The peaksData() method extracts the chromatographic data (peaks), i.e., the chromatograms’ retention time and intensity values. This data is returned as a list of arrays, with one array per chromatogram with columns being the peaks variables (retention time and intensity values) and rows the individual data pairs. Each backend must provide retention times and intensity values with this method, but additional peaks variables (columns) are also supported.
Below we implement the peaksData() method for our backend. Due to the way we stored the retention time and intensity values within our object we need to loop over the respective lists (in @rtime and @intensity) and combine the values of each chromatogram to an array (matrix). Since our backend does not support any additional peaks variables we allow columns to be only c("rtime", "intensity"), and also only in that specific order.
#' method to extract the full chromatographic data as list of arrays
setMethod(
    "peaksData", "ChromBackendTest",
    function(object, columns = c("rtime", "intensity")) {
        ## Only "rtime" and "intensity", in that order, are supported
        if (!identical(columns, c("rtime", "intensity")))
            stop("'columns' supports only \"rtime\" and \"intensity\"")
        mapply(rtime = object@rtime, intensity = object@intensity,
               FUN = cbind, SIMPLIFY = FALSE, USE.NAMES = FALSE)
    })
And with this method we can now extract the peaks data from our backend.
#' Extract the *peaks* data (i.e. intensity and retention times)
peaksData(be)
## [[1]]
## rtime intensity
## [1,] 12.4 123.3
## [2,] 12.8 153.6
## [3,] 13.2 2354.3
## [4,] 14.6 243.4
##
## [[2]]
## rtime intensity
## [1,] 45.1 100.0
## [2,] 46.2 80.1
##
## [[3]]
## rtime intensity
## [1,] 64.4 12.3
## [2,] 64.8 135.2
## [3,] 65.2 100.0
Since the peaksData() method is the main function used by a Chromatograms object to retrieve data from the backend (and further process the values), this method should be implemented in an efficient way. Due to the way we store the data within our example backend we need to loop over the @rtime and @intensity slots. A different implementation that stores the peaks data already as a list of arrays would be more efficient for this operation (but possibly slower for some other operations, such as extracting peaks variables separately with the rtime() or intensity() functions).
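The trade-off described above can be illustrated with a small base R sketch that stores the peaks data directly as a list of two-column matrices. This is a hypothetical alternative storage layout for illustration, not the one used by the example backend of this vignette.

```r
## Peaks data stored directly as a list of two-column matrices (one matrix
## per chromatogram); the values are the example data used above
peaks <- list(
    cbind(rtime = c(12.4, 12.8, 13.2, 14.6),
          intensity = c(123.3, 153.6, 2354.3, 243.4)),
    cbind(rtime = c(45.1, 46.2), intensity = c(100, 80.1)),
    cbind(rtime = c(64.4, 64.8, 65.2), intensity = c(12.3, 135.2, 100))
)

## Returning the peaks data requires no further processing ...
peaks_data <- peaks

## ... while extracting a single peaks variable needs a loop over the list
rts <- lapply(peaks, function(z) z[, "rtime"])
rts[[2]]
```

Which layout is preferable depends on which accessor is expected to be called most often.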
[
The [ method allows to subset ChromBackend objects. This operation is expected to reduce a ChromBackend object to the selected chromatograms without changing values for the subset chromatograms. The method should support subsetting by indices or logical vectors and should also support duplicating elements (i.e., when duplicated indices are used) as well as subsetting in arbitrary order. An error should be thrown if indices are out of bounds, but the method should also support returning an empty backend with [integer()]. The MsCoreUtils::i2index function can be used to check and convert the provided parameter i (defining the subset) to an integer vector.
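The subsetting requirements listed above can be made concrete with a small base R helper. The function to_index() below is a hypothetical sketch written for this illustration; it is not the implementation of MsCoreUtils::i2index and its parameter n is an assumed name.

```r
## Hypothetical helper (not the MsCoreUtils implementation): convert a
## subsetting index to an integer vector and check its validity
to_index <- function(i, n) {
    if (is.logical(i)) {
        if (length(i) != n)
            stop("logical index has to match the number of elements")
        return(which(i))
    }
    if (!is.numeric(i))
        stop("index has to be numeric or logical")
    i <- as.integer(i)
    if (any(is.na(i)) || any(abs(i) > n))
        stop("index out of bounds")
    ## Resolve negative indices; positive ones (including duplicates and
    ## arbitrary order) are kept as they are
    seq_len(n)[i]
}

to_index(c(3, 1, 1), n = 3)             # arbitrary order and duplicates
to_index(c(TRUE, FALSE, TRUE), n = 3)   # logical index
to_index(integer(), n = 3)              # empty subset
```

An index such as 4 for a backend with 3 chromatograms results in an error, which matches the out-of-bounds requirement stated above.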
Below we implement a possible [ method for our test backend class. We ignore the parameter j from the definition of the [ generic, since we treat our data as one-dimensional (with each chromatogram being one element).
#' Main subset method.
setMethod("[", "ChromBackendTest", function(x, i, j, ..., drop = FALSE) {
    i <- MsCoreUtils::i2index(i, length = length(x))
    ## drop = FALSE keeps a data.frame also if only one column is present
    x@chromVars <- x@chromVars[i, , drop = FALSE]
    x@rtime <- x@rtime[i]
    x@intensity <- x@intensity[i]
    x
})
We can now subset our backend to the last two chromatograms.
a <- be[2:3]
chromData(a)
## DataFrame with 2 rows and 16 columns
## chromIndex collisionEnergy dataOrigin dataStorage intensity
## <integer> <numeric> <character> <character> <NumericList>
## 1 NA NA <user provided> <memory> 100.0, 80.1
## 2 NA NA <user provided> <memory> 12.3,135.2,100.0
## msLevel mz mzMin mzMax precursorMz precursorMzMin
## <integer> <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 1 123.3 NA NA NA NA
## 2 1 134.4 NA NA NA NA
## precursorMzMax productMz productMzMin productMzMax rtime
## <numeric> <numeric> <numeric> <numeric> <NumericList>
## 1 NA NA NA NA 45.1,46.2
## 2 NA NA NA NA 64.4,64.8,65.2
Or extracting the second chromatogram multiple times.
a <- be[c(2, 2, 2)]
chromData(a)
## DataFrame with 3 rows and 16 columns
## chromIndex collisionEnergy dataOrigin dataStorage intensity
## <integer> <numeric> <character> <character> <NumericList>
## 2 NA NA <user provided> <memory> 100.0, 80.1
## 2.1 NA NA <user provided> <memory> 100.0, 80.1
## 2.2 NA NA <user provided> <memory> 100.0, 80.1
## msLevel mz mzMin mzMax precursorMz precursorMzMin
## <integer> <numeric> <numeric> <numeric> <numeric> <numeric>
## 2 1 123.3 NA NA NA NA
## 2.1 1 123.3 NA NA NA NA
## 2.2 1 123.3 NA NA NA NA
## precursorMzMax productMz productMzMin productMzMax rtime
## <numeric> <numeric> <numeric> <numeric> <NumericList>
## 2 NA NA NA NA 45.1,46.2
## 2.1 NA NA NA NA 45.1,46.2
## 2.2 NA NA NA NA 45.1,46.2
$
The $ method is expected to extract a single chromatogram variable from a backend. Parameter name should allow to name the chromatogram variable to return. Each ChromBackend must support extracting the core chromatogram variables with this method (even if no data might be available for that variable). In our example implementation below we make use of the chromData() method, but more efficient implementations might be possible as well (that would not require to first subset/create a DataFrame with the full data and to then subset that again to an individual column). Also, the $ method should check if the requested chromatogram variable is available and should throw an error otherwise.
#' Access a single chromatogram variable
setMethod("$", "ChromBackendTest", function(x, name) {
chromData(x, columns = name)[, 1L]
})
With this we can now extract the MS levels
be$msLevel
## [1] 1 1 1
or a core chromatogram variable without values in our example backend.
be$precursorMz
## [1] NA NA NA
or also the intensity values
be$intensity
## NumericList of length 3
## [[1]] 123.3 153.6 2354.3 243.4
## [[2]] 100 80.1
## [[3]] 12.3 135.2 100
backendMerge()
The backendMerge() method merges (combines) ChromBackend objects (of the same type!) into a single instance. For our test backend we thus need to combine the values in the @chromVars, @rtime and @intensity slots. To also support merging of data.frames with different sets of columns we use the MsCoreUtils::rbindFill function instead of a simple rbind (this function joins data frames using the union of all available columns, filling any missing columns with NA).
#' Method allowing to join (concatenate) backends
setMethod("backendMerge", "ChromBackendTest", function(object, ...) {
res <- object
object <- unname(c(list(object), list(...)))
res@rtime <- do.call(c, lapply(object, function(z) z@rtime))
res@intensity <- do.call(c, lapply(object, function(z) z@intensity))
res@chromVars <- do.call(MsCoreUtils::rbindFill,
lapply(object, function(z) z@chromVars))
validObject(res)
res
})
Testing the function by merging the example backend instance with itself.
a <- backendMerge(be, be[2], be)
a
## ChromBackendTest with 7 chromatograms
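The column-union behaviour used for merging the chromatogram variables can be illustrated with base R. The helper rbind_fill_sketch() below is a hypothetical, simplified stand-in written for this example; it is not the code of MsCoreUtils::rbindFill.

```r
## Hypothetical helper illustrating the idea (not the MsCoreUtils code):
## row-bind data frames using the union of their columns
rbind_fill_sketch <- function(...) {
    dfs <- list(...)
    cols <- unique(unlist(lapply(dfs, colnames)))
    dfs <- lapply(dfs, function(z) {
        ## Add columns missing in this data frame, filled with NA
        for (col in setdiff(cols, colnames(z)))
            z[[col]] <- rep(NA, nrow(z))
        z[, cols, drop = FALSE]
    })
    do.call(rbind, dfs)
}

## Two data frames with different sets of chromatogram variables
a <- data.frame(msLevel = c(1L, 1L), mz = c(112.2, 123.3))
b <- data.frame(msLevel = 2L, name = "B")
res <- rbind_fill_sketch(a, b)
res
```

The result has three rows and the columns msLevel, mz and name, with NA wherever one of the inputs did not provide a variable.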
As stated in the general description, ChromBackend implementations can also be purely read-only resources allowing to just access, but not to replace data. For these backends isReadOnly() should return TRUE. Data replacement methods listed in this section would not need to be implemented. Our example backend stores the full data in memory, within the object, and hence we can easily change and replace values.
Since we support replacing values we also implement the isReadOnly() method for our example implementation to return FALSE (instead of the default TRUE).
#' Default for backends:
isReadOnly(be)
## [1] TRUE
#' Implementation of isReadOnly for ChromBackendTest
setMethod("isReadOnly", "ChromBackendTest", function(object) FALSE)
isReadOnly(be)
## [1] FALSE
All data replacement functions are expected to return an instance of the same backend class that was used as input.
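The convention that a replacement method returns the modified object can be shown with a minimal, self-contained S4 example. The class ToyBackend and the generic toyVars<- are hypothetical names used only for this sketch; they are not part of the Chromatograms API.

```r
library(methods)

## Hypothetical toy class and generic, only to illustrate the convention
setClass("ToyBackend", slots = c(vars = "data.frame"))
setGeneric("toyVars<-", function(object, value) standardGeneric("toyVars<-"))

## The replacement method modifies a copy of the object and returns it, so
## that `toyVars(x) <- value` leaves `x` as an instance of the same class
setReplaceMethod("toyVars", "ToyBackend", function(object, value) {
    if (!is.data.frame(value))
        stop("'value' is expected to be a 'data.frame'")
    object@vars <- value
    validObject(object)
    object
})

x <- new("ToyBackend", vars = data.frame(msLevel = 1L))
toyVars(x) <- data.frame(msLevel = c(1L, 2L))
is(x, "ToyBackend")
```

Forgetting to return the object at the end of a replacement method is a common mistake; the variable on the left-hand side would then be overwritten with whatever the last expression evaluates to.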
chromData<-
The main replacement method is chromData<-
which
should allow to replace the content of a backend with new data. This
data is expected to be provided as a DataFrame
(similar to
the one returned by chromData()
). Also the method is
expected to replace the full data within the backend,
i.e., all chromatogram and peaks variables. While values can be
replaced, the number of chromatograms before and after a call to
chromData<-
has to be the same. For our example
implementation of chromData<-
we can re-use the
backendInitialize()
method defined before, with the
data
parameter.
#' Replacement method for the full chromatogram data
setReplaceMethod("chromData", "ChromBackendTest", function(object, value) {
if (!inherits(value, "DataFrame"))
stop("'value' is expected to be a 'DataFrame'")
if (length(object) && length(object) != nrow(value))
stop("'value' has to be a 'DataFrame' with ", length(object), " rows")
object <- backendInitialize(ChromBackendTest(), data = value)
object
})
To test this new method we extract the full chromatogram data from
our example data set, add an additional column (chromatogram variable)
and use chromData<-
to replace the data of the
backend.
#' Get the full data, add a new column and replace the data of the backend
d <- chromData(be)
d$new_col <- c("a", "b", "c")
chromData(be) <- d
Check that we have now also the new column available.
be$new_col
## [1] "a" "b" "c"
$<-
The $<- method should allow to replace values for an existing chromatogram variable or to add an additional variable to the backend. As with all replacement methods, the length of value has to match the number of chromatograms represented by the backend. For replacement of retention time or intensity values we also need to ensure that the data would be correct after the operation, i.e., that the number of retention time and intensity values per chromatogram are identical and that all retention time and intensity values are numeric. Finally, we use the validChromData() function to ensure that, after replacement, all core chromatogram variables have the correct data type.
#' Replace or add a single chromatogram variable.
setReplaceMethod("$", "ChromBackendTest", function(x, name, value) {
    ## Compare against the object passed to the method, not a global variable
    if (length(value) != length(x))
        stop("length of 'value' needs to match the number of chromatograms ",
             "in object.")
    if (name %in% c("rtime", "intensity")) {
        ## In case retention time or intensity values are provided as
        ## NumericList convert to a list.
        if (is(value, "NumericList"))
            value <- as.list(value)
        ## Ensure number of retention time and intensity values match
        if (!all(lengths(value) == lengths(x@intensity)))
            stop("Number of retention time values needs to match number of ",
                 "intensity values.")
        ## Ensure all values are numeric
        if (!all(vapply(value, is.numeric, logical(1))))
            stop("For replacement of retention time or intensity values, ",
                 "'value' is expected to be a list of numeric vectors.")
        if (name == "rtime")
            x@rtime <- value
        if (name == "intensity")
            x@intensity <- value
    } else
        x@chromVars[[name]] <- value
    ## Check that data types are correct after replacement
    validChromData(x@chromVars)
    x
})
We can thus replace an existing chromatogram variable, such as
msLevel
:
#' Values before replacement
be$msLevel
## [1] 1 1 1
#' Replace MS levels
be$msLevel <- c(3L, 2L, 1L)
#' Values after replacement
be$msLevel
## [1] 3 2 1
We can also add a new chromatogram variable:
#' Add a new chromatogram variable
be$name <- c("A", "B", "C")
be$name
## [1] "A" "B" "C"
Or also replace intensity values. Below we replace the intensity values by adding a value of +3 to each.
#' Replace intensity values
be$intensity <- be$intensity + 3
be$intensity
## NumericList of length 3
## [[1]] 126.3 156.6 2357.3 246.4
## [[2]] 103 83.1
## [[3]] 15.3 138.2 103
selectChromVariables()
The selectChromVariables() function should subset the content of a
backend to the selected chromatogram variables, which can be specified
with parameter chromVariables. As a result the input backend should be
returned, but reduced to the selected chromatogram variables. This
function thus adds a subset operation that reduces the data in a backend
by columns, dropping all chromatogram variables other than the ones
specified with the chromVariables parameter. In the implementation we
need to take special care of the variables "rtime" and "intensity". If
both are about to be removed we need to initialize the @rtime and
@intensity slots with empty lists matching the number of chromatograms
in our backend. If only "intensity" values are to be removed we replace
them with NA_real_, while removing only "rtime" is not supported (also
because retention time values of NA are not allowed).
#' Method to *subset* a backend by chromatogram variables (columns)
setMethod(
    "selectChromVariables", "ChromBackendTest",
    function(object, chromVariables = chromVariables(object)) {
        keep <- colnames(object@chromVars) %in% chromVariables
        object@chromVars <- object@chromVars[, keep, drop = FALSE]
        ## If neither "rtime" nor "intensity" is in chromVariables:
        ## initialize both slots with empty list elements.
        if (!any(c("rtime", "intensity") %in% chromVariables)) {
            object@rtime <- vector("list", length(object))
            object@intensity <- vector("list", length(object))
        } else {
            ## intensity not in chromVariables: replace values with NA
            if (!"intensity" %in% chromVariables)
                object@intensity <- lapply(object@intensity,
                                           function(z) rep(NA_real_, length(z)))
            ## removal of only rtime is not supported
            if (!"rtime" %in% chromVariables)
                stop("Exclusive removal of retention times is not supported. ",
                     "Retention times can only be removed if also intensity ",
                     "values are removed.")
        }
        validObject(object)
        object
    })
We can now restrict the data set to selected chromatogram variables only:
#' keep only dataStorage and msLevel
be_2 <- selectChromVariables(be, c("dataStorage", "msLevel"))
chromData(be_2)
## DataFrame with 3 rows and 16 columns
##   chromIndex collisionEnergy  dataOrigin dataStorage     intensity   msLevel
##    <integer>       <numeric> <character> <character> <NumericList> <integer>
## 1         NA              NA          NA    <memory>                       3
## 2         NA              NA          NA    <memory>                       2
## 3         NA              NA          NA    <memory>                       1
##          mz     mzMin     mzMax precursorMz precursorMzMin precursorMzMax
##   <numeric> <numeric> <numeric>   <numeric>      <numeric>      <numeric>
## 1        NA        NA        NA          NA             NA             NA
## 2        NA        NA        NA          NA             NA             NA
## 3        NA        NA        NA          NA             NA             NA
##   productMz productMzMin productMzMax         rtime
##   <numeric>    <numeric>    <numeric> <NumericList>
## 1        NA           NA           NA
## 2        NA           NA           NA
## 3        NA           NA           NA
Removing only the intensity values is possible; they are then replaced with NA:
#' Keep dataStorage, msLevel, mz and rtime
be_2 <- selectChromVariables(be, c("dataStorage", "msLevel", "mz", "rtime"))
chromData(be_2)
## DataFrame with 3 rows and 16 columns
##   chromIndex collisionEnergy  dataOrigin dataStorage     intensity   msLevel
##    <integer>       <numeric> <character> <character> <NumericList> <integer>
## 1         NA              NA          NA    <memory>  NA,NA,NA,...         3
## 2         NA              NA          NA    <memory>         NA,NA         2
## 3         NA              NA          NA    <memory>      NA,NA,NA         1
##          mz     mzMin     mzMax precursorMz precursorMzMin precursorMzMax
##   <numeric> <numeric> <numeric>   <numeric>      <numeric>      <numeric>
## 1     112.2        NA        NA          NA             NA             NA
## 2     123.3        NA        NA          NA             NA             NA
## 3     134.4        NA        NA          NA             NA             NA
##   productMz productMzMin productMzMax              rtime
##   <numeric>    <numeric>    <numeric>      <NumericList>
## 1        NA           NA           NA 12.4,12.8,13.2,...
## 2        NA           NA           NA          45.1,46.2
## 3        NA           NA           NA     64.4,64.8,65.2
All intensity values are thus NA. Removing only the retention time values (while keeping the intensity values) would throw an error.
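The three cases handled for "rtime" and "intensity" can also be tried without the backend class. The following is a minimal sketch in base R that mirrors the logic described above on plain lists; the helper select_peaks() and its arguments are hypothetical and not part of the Chromatograms package.

```r
## Hypothetical helper mirroring the rtime/intensity handling described above.
## `rtime` and `intensity` are plain lists with one numeric vector per
## chromatogram, `keep` contains the names of the variables to retain.
select_peaks <- function(rtime, intensity, keep) {
    if (!any(c("rtime", "intensity") %in% keep)) {
        ## Both removed: empty list elements, one per chromatogram
        n <- length(rtime)
        return(list(rtime = vector("list", n), intensity = vector("list", n)))
    }
    ## Removing only retention times is not supported
    if (!"rtime" %in% keep)
        stop("Exclusive removal of retention times is not supported.")
    ## Removing only intensities: keep the number of peaks, set values to NA
    if (!"intensity" %in% keep)
        intensity <- lapply(intensity, function(z) rep(NA_real_, length(z)))
    list(rtime = rtime, intensity = intensity)
}

rt <- list(c(12.4, 12.8), 45.1)
int <- list(c(123.3, 153.6), 100)

## Dropping only intensity
res <- select_peaks(rt, int, keep = c("msLevel", "rtime"))
res$intensity

## Dropping only rtime results in an error; we capture its message
err <- tryCatch(select_peaks(rt, int, keep = "intensity"),
                error = function(e) conditionMessage(e))
err
```

Keeping the number of peaks unchanged when intensities are dropped means that the retention times and intensities of each chromatogram still have matching lengths.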
peaksData<-
The peaksData<- method should allow replacing the full peaks data
(retention time and intensity value pairs) of all chromatograms in a
backend. As value, a list of arrays (e.g. two-column numeric matrices)
should be provided, with column names "rtime" and "intensity". Because
the full peaks data is provided at once, this method can (and should)
also support changing the number of peaks per chromatogram (which
replacement methods such as rtime<- or $<- do not allow). In our
implementation we need to ensure that a) the provided list is of length
equal to the number of chromatograms and b) each element is a numeric
matrix with "rtime" and "intensity" columns from which we can extract
the values.
#' replacement method for peaks data
setReplaceMethod("peaksData", "ChromBackendTest", function(object, value) {
    if (!(is.list(value) || inherits(value, "SimpleList")))
        stop("'value' has to be a list-like object")
    if (!length(value) == length(object))
        stop("The length of the provided list has to match the number of ",
             "chromatograms in 'object'")
    ## First loop to check also for validity of the matrices, i.e. each
    ## element has to be a `numeric` `matrix` with columns named "rtime"
    ## and "intensity"
    object@rtime <- lapply(value, function(z) {
        if (!is.matrix(z) || !is.numeric(z))
            stop("'value' is expected to be a 'list' of numeric matrices")
        if (!all(c("rtime", "intensity") %in% colnames(z)))
            stop("All matrices in 'value' need to have columns named ",
                 "\"rtime\" and \"intensity\"")
        z[, "rtime"]
    })
    object@intensity <- lapply(value, "[", , "intensity")
    validObject(object)
    object
})
With this method we can now replace the peaks data of a backend:
#' Create a list with peaks matrices; our backend has 3 chromatograms
#' thus our `list` has to be of length 3
tmp <- list(
cbind(rtime = c(12.3, 14.4, 15.4, 16.4),
intensity = c(200, 312, 354.1, 232)),
cbind(rtime = c(14.4),
intensity = c(13.4)),
cbind(rtime = c(223.2, 223.8, 234.1, 234.5, 234.9),
intensity = c(12.3, 45.3, 65.3, 51.1, 29.3))
)
#' Assign this peaks data to one of our test backends
peaksData(be_2) <- tmp
#' Evaluate that we properly added the peaks data
peaksData(be_2)
## [[1]]
## rtime intensity
## [1,] 12.3 200.0
## [2,] 14.4 312.0
## [3,] 15.4 354.1
## [4,] 16.4 232.0
##
## [[2]]
## rtime intensity
## rtime 14.4 13.4
##
## [[3]]
## rtime intensity
## [1,] 223.2 12.3
## [2,] 223.8 45.3
## [3,] 234.1 65.3
## [4,] 234.5 51.1
## [5,] 234.9 29.3
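The validation and splitting of the peak matrices can also be tried on plain matrices. The sketch below uses only base R; split_peaks() is a hypothetical helper written for illustration and not a function of the package.

```r
## Hypothetical helper: validate a list of peak matrices and split it into
## separate lists of retention times and intensities.
split_peaks <- function(value, n_chrom) {
    if (!is.list(value))
        stop("'value' has to be a list")
    if (length(value) != n_chrom)
        stop("length of 'value' has to match the number of chromatograms")
    ok <- vapply(value, function(z) {
        is.matrix(z) && is.numeric(z) &&
            all(c("rtime", "intensity") %in% colnames(z))
    }, logical(1))
    if (!all(ok))
        stop("all elements need to be numeric matrices with columns ",
             "\"rtime\" and \"intensity\"")
    ## unname() drops the name that a single-row matrix would carry over
    list(rtime = lapply(value, function(z) unname(z[, "rtime"])),
         intensity = lapply(value, function(z) unname(z[, "intensity"])))
}

pks <- list(
    cbind(rtime = c(12.3, 14.4), intensity = c(200, 312)),
    cbind(rtime = 14.4, intensity = 13.4)
)
res <- split_peaks(pks, n_chrom = 2L)
lengths(res$rtime)
```

Because each matrix is checked before any value is extracted, an invalid element leaves no partially updated result behind.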
Default implementations for the ChromBackend class are available for a
large number of methods. Thus, any backend extending this class will
automatically inherit these default implementations. Alternative,
class-specific versions can be developed, but are not required. The
default versions are defined in the R/ChromBackend.R file, and are also
listed in this section. If alternative versions are implemented it
should be ensured that the expected data type is always used for core
chromatogram variables. Use coreChromVariables() to list these mandatory
data types.
backendParallelFactor()
The backendParallelFactor() function allows a backend to suggest a
preferred way it could be split for parallel processing. The default
implementation returns factor() (i.e. a factor of length 0), hence not
suggesting any specific splitting setup.
#' Is there a specific way how the object could be best split for
#' parallel processing?
setMethod("backendParallelFactor", "ChromBackend", function(object, ...) {
factor()
})
## factor()
## Levels:
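To illustrate how calling code could make use of such a factor, the base R sketch below splits chromatogram indices by a factor and falls back to a single chunk if the factor has length 0. The helper chunk_indices() is hypothetical; how the Chromatograms package actually processes the factor is not shown in this vignette.

```r
## Hypothetical helper: define the chunks of chromatogram indices that
## could be processed in parallel.
chunk_indices <- function(n, f = factor()) {
    ## Factor of length 0: no preferred splitting, use one chunk
    if (!length(f))
        return(list(seq_len(n)))
    stopifnot(length(f) == n)
    unname(split(seq_len(n), f))
}

## No preference expressed by the backend
chunk_indices(3)

## A backend suggesting to split by (assumed) originating file
chunk_indices(3, factor(c("file_a", "file_a", "file_b")))
```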
chromVariables()
The chromVariables() function is expected to return the names of all
available chromatogram variables (which should include the core
chromatogram variables). The default implementation is:
#' get the available chromatogram variables.
setMethod("chromVariables", "ChromBackend", function(object) {
colnames(chromData(object))
})
The result from calling the default implementation on our test backend:
chromVariables(be)
## [1] "chromIndex" "collisionEnergy" "dataOrigin" "dataStorage"
## [5] "intensity" "msLevel" "mz" "mzMin"
## [9] "mzMax" "precursorMz" "precursorMzMin" "precursorMzMax"
## [13] "productMz" "productMzMin" "productMzMax" "rtime"
## [17] "new_col" "name"
chromIndex()
The chromIndex() function should return the value for the "chromIndex"
chromatogram variable. As a result, an integer of length equal to the
number of chromatograms in object needs to be returned. The default
implementation is:
#' get the values for the chromIndex chromatogram variable
setMethod("chromIndex", "ChromBackend",
function(object, columns = chromVariables(object)) {
chromData(object, columns = "chromIndex")[, 1L]
})
The result of calling this method on our test backend:
chromIndex(be)
## [1] NA NA NA
collisionEnergy()
The collisionEnergy() function should return the value for the
"collisionEnergy" chromatogram variable. As a result, a numeric of
length equal to the number of chromatograms has to be returned. The
default implementation is:
#' get the values for the collisionEnergy chromatogram variable
setMethod("collisionEnergy", "ChromBackend", function(object) {
chromData(object, columns = "collisionEnergy")[, 1L]
})
The result of calling this method on our test backend:
collisionEnergy(be)
## [1] NA NA NA
The default replacement method for the collisionEnergy chromatogram
variable is:
#' Default replacement method for collisionEnergy
setReplaceMethod(
"collisionEnergy", "ChromBackend", function(object, value) {
object$collisionEnergy <- value
object
})
This method thus makes use of the $<- replacement method we implemented
above. To test this function we replace the collision energy below.
#' Replace the collision energy
collisionEnergy(be) <- c(20, 30, 20)
collisionEnergy(be)
## [1] 20 30 20
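The delegation of a replacement method to $<- can be reproduced with a small, self-contained S4 example. The class ToyVars and the generics energy() and energy<- below are hypothetical stand-ins defined only for this sketch; they are not part of the Chromatograms package.

```r
library(methods)

## Toy class storing its variables in a data.frame
setClass("ToyVars", slots = c(vars = "data.frame"))

## `$` and `$<-` access and replace a single variable
setMethod("$", "ToyVars", function(x, name) x@vars[[name]])
setReplaceMethod("$", "ToyVars", function(x, name, value) {
    if (length(value) != nrow(x@vars))
        stop("length of 'value' needs to match the number of rows")
    x@vars[[name]] <- value
    x
})

## Accessor and replacement method that simply delegate to `$` and `$<-`
setGeneric("energy", function(object) standardGeneric("energy"))
setGeneric("energy<-", function(object, value) standardGeneric("energy<-"))
setMethod("energy", "ToyVars", function(object) object$energy)
setReplaceMethod("energy", "ToyVars", function(object, value) {
    object$energy <- value
    object
})

toy <- new("ToyVars", vars = data.frame(energy = c(NA_real_, NA_real_)))
energy(toy) <- c(20, 30)
energy(toy)
```

With this design all validation lives in the single $<- method, so a backend only needs to override that method to change the behavior of every replacement method built on it.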
dataOrigin(), dataOrigin<-
The dataOrigin() and dataOrigin<- methods return or set the value(s) for
the "dataOrigin" chromatogram variable. The values for this chromatogram
variable need to be of type character (with a length equal to the number
of chromatograms). The default implementation for dataOrigin() is:
#' Default implementation to access dataOrigin
setMethod("dataOrigin", "ChromBackend", function(object) {
chromData(object, columns = "dataOrigin")[, 1L]
})
Below we use this method to access the values of the dataOrigin
chromatogram variable.
#' Access the dataOrigin values
dataOrigin(be)
## [1] "<user provided>" "<user provided>" "<user provided>"
The default implementation for dataOrigin<- uses, like all defaults for
replacement methods, the $<- method:
#' Default implementation of the `dataOrigin<-` replacement method
setReplaceMethod("dataOrigin", "ChromBackend", function(object, value) {
object$dataOrigin <- value
object
})
For our backend we can change the values of the dataOrigin variable:
#' Replace the backend's dataOrigin values
dataOrigin(be) <- rep("from somewhere", 3)
dataOrigin(be)
## [1] "from somewhere" "from somewhere" "from somewhere"
dataStorage(), dataStorage<-
Similarly, the dataStorage() and dataStorage<- methods should allow
getting or setting the data storage chromatogram variable. Values of the
dataStorage chromatogram variable are expected to be of type character
and for each chromatogram in a backend one value needs to be defined
(which can not be NA_character_). The default implementation for
dataStorage() uses, like most access methods, the chromData() function:
#' Default implementation to access dataStorage
setMethod("dataStorage", "ChromBackend", function(object) {
chromData(object, columns = "dataStorage")[, 1L]
})
Below we use this method to access the values of the dataStorage
chromatogram variable.
#' Access the dataStorage values
dataStorage(be)
## [1] "<memory>" "<memory>" "<memory>"
Note that this variable is supposed to provide information on the
location where the data is stored and hence for some types of backends
it might not be possible or advisable to let the user change its values.
For such backends a dedicated dataStorage<- replacement method should be
implemented that throws an error if values would be replaced with
invalid ones. The default implementation for this method uses, like all
defaults for replacement methods, the $<- method:
#' Default implementation of the `dataStorage<-` replacement method
setReplaceMethod("dataStorage", "ChromBackend", function(object, value) {
object$dataStorage <- value
object
})
For our backend we can change the values of the dataStorage variable:
#' Replace the backend's datastorage values
dataStorage(be) <- c("here", "here", "here")
dataStorage(be)
## [1] "here" "here" "here"
intensity(), intensity<-
The intensity() and intensity<- methods allow extracting or setting the
intensity values of the individual chromatograms represented by the
backend. The default for the intensity() function, which is expected to
return a list of numeric values with the intensity values of each
chromatogram, also uses the chromData() method:
#' Default method to extract intensity values
setMethod("intensity", "ChromBackend", function(object) {
chromData(object, columns = "intensity")[, 1L]
})
Based on the way our example backend implementation stores the data,
accessing the intensity values in this way would not be very efficient.
It would be much faster to directly return the content of the @intensity
slot, converting that into the expected NumericList. Thus we implement
below a more efficient version of the method specifically for our
backend:
#' Alternative implementation for our backend
setMethod("intensity", "ChromBackendTest", function(object) {
IRanges::NumericList(object@intensity, compress = FALSE)
})
intensity(be)
## NumericList of length 3
## [[1]] 126.3 156.6 2357.3 246.4
## [[2]] 103 83.1
## [[3]] 15.3 138.2 103
The default replacement method for intensity values uses the $<- method:
#' Default implementation of the replacement method for intensity values
setReplaceMethod("intensity", "ChromBackend", function(object, value) {
object$intensity <- value
object
})
Also here we could implement an alternative version that directly
replaces the content of the @intensity slot. We implement such a
replacement method further below for the rtime<- method. Here we simply
use the default implementation to replace the intensity values with the
original values divided by 10.
#' Replace intensity values with the original values divided by 10
intensity(be) <- intensity(be) / 10
intensity(be)
## NumericList of length 3
## [[1]] 12.63 15.66 235.73 24.64
## [[2]] 10.3 8.31
## [[3]] 1.53 13.82 10.3
isEmpty()
The isEmpty() function is a simple helper to evaluate whether
chromatograms are empty, i.e. have no peaks (retention time and
intensity values). It should return a logical vector of length equal to
the number of chromatograms in the backend with TRUE if a chromatogram
is empty and FALSE otherwise. The default implementation uses the
lengths() method (defined further below) that returns for each
chromatogram the number of available data points (peaks).
#' Default implementation for `isEmpty()`
setMethod("isEmpty", "ChromBackend", function(x) {
lengths(x) == 0L
})
isEmpty(be)
## [1] FALSE FALSE FALSE
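All chromatograms of our test backend contain peaks, so the result above is FALSE throughout. As a small base R sketch of the same logic on made-up data (the values below are only illustrative), a chromatogram without any data points is reported as empty:

```r
## Intensities of three chromatograms; the second has no data points
intensities <- list(c(126.3, 156.6), numeric(), c(15.3, 138.2, 103))

## Number of peaks per chromatogram, as returned by `lengths()`
n_peaks <- lengths(intensities)
n_peaks

## A chromatogram is empty if it has zero peaks
n_peaks == 0L
```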
isReadOnly()
As discussed above, backends can also be read-only, hence only allowing
to access, but not to change any values (e.g. if the data is stored in a
database and the connection to this database does not support updating
or replacing data). In such cases, the default isReadOnly() method can
be used, which always returns TRUE:
#' Default implementation of `isReadOnly()`
setMethod("isReadOnly", "ChromBackend", function(object) {
TRUE
})
Backends that support changing data values should implement their own
version (like we did above) to return FALSE instead:
isReadOnly(be)
## [1] FALSE
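How a class-specific method takes precedence over an inherited default can be shown with a small, self-contained S4 sketch. The classes ToyBase, ToyDb and ToyWritable and the generic isReadOnlyToy() are hypothetical and only defined for this illustration.

```r
library(methods)

## Virtual base class with a default method returning TRUE
setClass("ToyBase", representation("VIRTUAL"))
setGeneric("isReadOnlyToy", function(object) standardGeneric("isReadOnlyToy"))
setMethod("isReadOnlyToy", "ToyBase", function(object) TRUE)

## A backend relying on the inherited default (read-only)
setClass("ToyDb", contains = "ToyBase", slots = c(con = "character"))

## A backend supporting changes overrides the default
setClass("ToyWritable", contains = "ToyBase", slots = c(values = "numeric"))
setMethod("isReadOnlyToy", "ToyWritable", function(object) FALSE)

isReadOnlyToy(new("ToyDb"))
isReadOnlyToy(new("ToyWritable"))
```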
length()
The length() method should return a single integer with the total number
of chromatograms available through the backend. The default
implementation for this function is:
#' Default implementation for `length()`
setMethod("length", "ChromBackend", function(x) {
nrow(chromData(x, columns = "dataStorage"))
})
length(be)
## [1] 3
lengths()
The lengths() function should return the number of data pairs (peaks,
i.e. retention time and intensity value pairs) per chromatogram. The
result should be an integer vector (of length equal to the number of
chromatograms in the backend) with these counts. The default
implementation uses the intensity() function.
#' Default implementation for `lengths()`
setMethod("lengths", "ChromBackend", function(x) {
lengths(intensity(x))
})
The number of peaks for our test backend:
lengths(be)
## [1] 4 2 3
msLevel(), msLevel<-
The msLevel() and msLevel<- methods should allow extracting and setting
the MS level for the individual chromatograms. MS levels are encoded as
integer, thus, msLevel() must return an integer vector of length equal
to the number of chromatograms of the backend and msLevel<- should
accept such a vector as input. The default implementations for both
methods are shown below.
#' Default methods to get or set MS levels
setMethod("msLevel", "ChromBackend", function(object) {
chromData(object, columns = "msLevel")[, 1L]
})
setReplaceMethod("msLevel", "ChromBackend", function(object, value) {
object$msLevel <- value
object
})
To test these, we replace the MS levels of our test data set below and extract the values again.
msLevel(be) <- c(1L, 2L, 4L)
msLevel(be)
## [1] 1 2 4
mz(), mz<-
The mz() and mz<- methods should allow extracting or setting the m/z
value for each chromatogram. The m/z value of a chromatogram is encoded
as numeric, thus, the methods are expected to return or accept a numeric
vector of length equal to the number of chromatograms. The default
implementations are shown below.
#' Default implementations to get or set m/z value(s)
setMethod("mz", "ChromBackend", function(object) {
chromData(object, columns = "mz")[, 1L]
})
setReplaceMethod("mz", "ChromBackend", function(object, value) {
object$mz <- value
object
})
Below we set and extract these target m/z values.
mz(be) <- c(314.3, 312.5, 542.1)
mz(be)
## [1] 314.3 312.5 542.1
mzMax(), mzMax<-
The mzMax() and mzMax<- methods should allow extracting or setting the
upper m/z boundary for each chromatogram. m/z values are encoded as
numeric, thus, the methods are expected to return or accept a numeric
vector of length equal to the number of chromatograms. The default
implementations are shown below.
#' Default implementations to get or set upper m/z limits
setMethod("mzMax", "ChromBackend", function(object) {
chromData(object, columns = "mzMax")[, 1L]
})
setReplaceMethod("mzMax", "ChromBackend", function(object, value) {
object$mzMax <- value
object
})
We test these functions by replacing the upper m/z boundary with new values.
mzMax(be) <- mz(be) + 0.01
mzMax(be)
## [1] 314.31 312.51 542.11
mzMin(), mzMin<-
The mzMin() and mzMin<- methods should allow extracting or setting the
lower m/z boundary for each chromatogram. m/z values are encoded as
numeric, thus, the methods are expected to return or accept a numeric
vector of length equal to the number of chromatograms. The default
implementations are shown below.
#' Default methods to get or set the lower m/z boundary
setMethod("mzMin", "ChromBackend", function(object) {
chromData(object, columns = "mzMin")[, 1L]
})
setReplaceMethod("mzMin", "ChromBackend", function(object, value) {
object$mzMin <- value
object
})
We test these functions by replacing the lower m/z boundary with new values.
mzMin(be) <- mz(be) - 0.01
mzMin(be)
## [1] 314.29 312.49 542.09
peaksVariables()
The peaksVariables() function is supposed to provide the names of the
available peaks variables. Backends must provide retention time and
intensity values, thus, the default implementation simply returns
c("rtime", "intensity"). If additional peaks variables are available,
these could also be listed by the peaksVariables() method.
#' Default implementation for peaksVariables()
setMethod(
"peaksVariables", "ChromBackend", function(object) {
c("rtime", "intensity")
})
peaksVariables(be)
## [1] "rtime" "intensity"
precursorMz(), precursorMz<-
The precursorMz() and precursorMz<- methods are expected to get or set
the values for the precursor m/z of each chromatogram (if available).
These are encoded as numeric (one value per chromatogram), and if a
value is not available NA_real_ should be returned. The default
implementations are:
#' Default implementations to get or set the precursorMz chrom variable
setMethod("precursorMz", "ChromBackend", function(object) {
chromData(object, columns = "precursorMz")[, 1L]
})
setReplaceMethod("precursorMz", "ChromBackend", function(object, value) {
object$precursorMz <- value
object
})
Below we set and get the precursorMz chromatogram variable for our
backend.
precursorMz(be) <- c(NA_real_, 123.3, 314.2)
precursorMz(be)
## [1] NA 123.3 314.2
precursorMzMax(), precursorMzMax<-
These methods are supposed to allow getting and setting the
precursorMzMax chromatogram variable. The default implementations are:
#' Default implementations for `precursorMzMax`
setMethod("precursorMzMax", "ChromBackend", function(object) {
chromData(object, columns = "precursorMzMax")[, 1L]
})
setReplaceMethod("precursorMzMax", "ChromBackend", function(object, value) {
object$precursorMzMax <- value
object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
precursorMzMax(be) <- precursorMz(be) + 0.1
precursorMzMax(be)
## [1] NA 123.4 314.3
precursorMzMin(), precursorMzMin<-
These methods are supposed to allow getting and setting the
precursorMzMin chromatogram variable. The default implementations are:
#' Default implementations for `precursorMzMin`
setMethod("precursorMzMin", "ChromBackend", function(object) {
chromData(object, columns = "precursorMzMin")[, 1L]
})
setReplaceMethod("precursorMzMin", "ChromBackend", function(object, value) {
object$precursorMzMin <- value
object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
precursorMzMin(be) <- precursorMz(be) - 0.1
precursorMzMin(be)
## [1] NA 123.2 314.1
productMz(), productMz<-
These methods are supposed to allow getting and setting the productMz
chromatogram variable. The default implementations are:
#' Default implementations for `productMz`
setMethod("productMz", "ChromBackend", function(object) {
chromData(object, columns = "productMz")[, 1L]
})
setReplaceMethod("productMz", "ChromBackend", function(object, value) {
object$productMz <- value
object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
productMz(be) <- c(123.2, NA_real_, NA_real_)
productMz(be)
## [1] 123.2 NA NA
productMzMax(), productMzMax<-
These methods are supposed to allow getting and setting the productMzMax
chromatogram variable. The default implementations are:
#' Default implementations for `productMzMax`
setMethod("productMzMax", "ChromBackend", function(object) {
chromData(object, columns = "productMzMax")[, 1L]
})
setReplaceMethod("productMzMax", "ChromBackend", function(object, value) {
object$productMzMax <- value
object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
productMzMax(be) <- productMz(be) + 0.02
productMzMax(be)
## [1] 123.22 NA NA
productMzMin(), productMzMin<-
These methods are supposed to allow getting and setting the productMzMin
chromatogram variable. The default implementations are:
#' Default implementations for `productMzMin`
setMethod("productMzMin", "ChromBackend", function(object) {
chromData(object, columns = "productMzMin")[, 1L]
})
setReplaceMethod("productMzMin", "ChromBackend", function(object, value) {
object$productMzMin <- value
object
})
Below we test these functions by setting and extracting the values for this chromatogram variable.
productMzMin(be) <- productMz(be) - 0.2
productMzMin(be)
## [1] 123.2 NA NA
rtime(), rtime<-
The rtime() and rtime<- methods allow getting and setting the retention
times of the individual chromatograms of the backend. Similar to the
methods for the intensity values described above, they should return or
accept a NumericList, each element being a numeric vector with the
retention time values of one chromatogram. The default implementations
of these methods are shown below.
#' Default methods for `rtime()` and `rtime<-`
setMethod("rtime", "ChromBackend", function(object) {
chromData(object, columns = "rtime")[, 1L]
})
setReplaceMethod("rtime", "ChromBackend", function(object, value) {
object$rtime <- value
object
})
Also these methods use the chromData() function to extract the retention
time values and the $<- method to replace them. Due to the way the data
is stored in our example backend implementation this is not the most
efficient way to get or set these values. Instead, we could implement
the rtime() function similar to intensity() above. For rtime<- we
implement below a version that takes a list or NumericList as input and
directly replaces the values of the @rtime slot. In this method we also
need to ensure that the provided data is in the correct format, that the
number of values per chromatogram matches the expected number and that
no missing values are provided (NA_real_ values are not supported for
retention times).
#' Implementation of `rtime<-` for our backend
setReplaceMethod("rtime", "ChromBackendTest", function(object, value) {
    ## Convert to a standard list
    if (inherits(value, "NumericList"))
        value <- as.list(value)
    ## Check that length is correct
    if (!length(value) == length(object))
        stop("Length of 'value' needs to match the number of ",
             "chromatograms in 'object'.")
    ## Check that lengths are correct
    if (!all(lengths(value) == lengths(object@intensity)))
        stop("The number of retention time values per chromatogram needs ",
             "to match the number of intensities for that chromatogram.")
    ## Check that all values are numeric and we don't have missing values
    not_ok <- vapply(value, function(z)
        anyNA(z) | !is.numeric(z), logical(1))
    if (any(not_ok))
        stop("'value' needs to be a list of numeric values without ",
             "missing values")
    object@rtime <- value
    object
})
Below we test this implementation by shifting all retention times of our example backend by 2 seconds.
rtime(be) <- rtime(be) + 2
rtime(be)
## NumericList of length 3
## [[1]] 14.4 14.8 15.2 16.6
## [[2]] 47.1 48.2
## [[3]] 66.4 66.8 67.2
split()
The split() method should split the backend into a list of backends
containing subsets of the original backend. The default implementation
uses the default implementation of split() from R and should work in
most cases. This function uses the [ method to subset/split the object.
#' Default method to split a backend
setMethod("split", "ChromBackend", function(x, f, drop = FALSE, ...) {
split.default(x, f, drop = drop, ...)
})
Below we test this by splitting the backend into two subsets.
## $`1`
## ChromBackendTest with 2 chromatograms
##
## $`2`
## ChromBackendTest with 1 chromatograms
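That split.default() only requires length() and [ methods can be verified with a minimal, self-contained S4 class. The class ToySplit below is hypothetical and defined only for this sketch; the grouping factor is likewise just an example.

```r
library(methods)

## Toy class representing three "chromatograms" by their ids
setClass("ToySplit", slots = c(id = "integer"))
setMethod("length", "ToySplit", function(x) length(x@id))
setMethod("[", "ToySplit", function(x, i, j, ..., drop = FALSE) {
    x@id <- x@id[i]
    x
})

toy <- new("ToySplit", id = 1:3)

## split.default() splits the indices by `f` and subsets `toy` with `[`
res <- split.default(toy, f = c(1, 1, 2))
names(res)
res[["1"]]@id
res[["2"]]@id
```

Each element of the result is again an instance of the class, which is why a backend with a working [ method usually does not need its own split() implementation.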
A set of filter methods is defined that allow subsetting the backend to a smaller set of chromatograms, i.e. these filter methods reduce the number of chromatograms of the backend. Defaults are available for all methods, but here too alternative versions might be implemented depending on the backend class.
filterDataOrigin()
The filterDataOrigin() method allows filtering/subsetting the backend,
keeping only chromatograms for which the dataOrigin chromatogram
variable matches (exactly) the value(s) provided with parameter
dataOrigin.
#' Default for `filterDataOrigin()`
setMethod("filterDataOrigin", "ChromBackend",
          function(object, dataOrigin = character(), ...) {
              if (length(dataOrigin)) {
                  object <- object[dataOrigin(object) %in% dataOrigin]
                  if (is.unsorted(dataOrigin))
                      object[order(match(dataOrigin(object), dataOrigin))]
                  else object
              } else object
          })
Like all filter functions, this function is expected to always return an instance of the backend class, even if no element matches the provided values:
filterDataOrigin(be, "disk")
## ChromBackendTest with 0 chromatograms
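The two steps of this default (subset with %in%, then reorder to follow the order of the requested values) can be traced on a plain character vector. The file names below are made up for this base R sketch.

```r
## dataOrigin of four chromatograms and the requested values
origin <- c("b.mzML", "a.mzML", "c.mzML", "a.mzML")
wanted <- c("c.mzML", "a.mzML")

## Step 1: indices of chromatograms with a matching data origin
keep <- which(origin %in% wanted)

## Step 2: if the requested values are not sorted, reorder the result
## so that it follows the order of `wanted`
if (is.unsorted(wanted))
    keep <- keep[order(match(origin[keep], wanted))]

keep
origin[keep]
```

In this example the chromatogram from "c.mzML" is returned first, followed by the two from "a.mzML" in their original relative order.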
filterDataStorage()
The filterDataStorage() method allows subsetting a backend, keeping only
chromatograms for which values of their dataStorage chromatogram
variable match the value(s) provided with parameter dataStorage. The
default implementation is shown below.
#' Default implementation for `filterDataStorage()`
setMethod("filterDataStorage", "ChromBackend",
          function(object, dataStorage = character()) {
              if (length(dataStorage)) {
                  object <- object[dataStorage(object) %in% dataStorage]
                  if (is.unsorted(dataStorage))
                      object[order(match(dataStorage(object), dataStorage))]
                  else object
              } else object
          })
filterMsLevel()
The filterMsLevel() method allows subsetting a backend to chromatograms
with their MS level matching the provided MS levels. A default
implementation is available for this method.
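As a minimal sketch of this filtering logic (an illustration in base R, not the package's own code), the chromatograms to keep can be identified with %in%. The variable names are only illustrative, and the last comment assumes that a backend be supports logical subsetting with [ as used by the other filter defaults.

```r
## MS level of each chromatogram and the MS levels to keep
ms_level <- c(1L, 2L, 4L)
wanted <- c(1L, 2L)

## TRUE for chromatograms with a matching MS level
keep <- ms_level %in% wanted
keep
which(keep)

## With a backend `be` this would correspond to something like
## be[msLevel(be) %in% wanted]
```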
filterMzRange()
The filterMzRange() method allows subsetting a backend to chromatograms
with their value of the mz chromatogram variable being within the
provided m/z value range. Parameter mz is expected to be a numeric of
length 2 defining the lower and upper boundary of the m/z range. A
default implementation is available for this method.
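The range check can be sketched in base R as follows. This is an illustration and not the package's code: treating both boundaries as inclusive and missing m/z values as non-matching are assumptions made for this example.

```r
## m/z value of each chromatogram and the requested m/z range
mz_vals <- c(314.3, 312.5, 542.1, NA)
mz_range <- c(300, 320)

## Assumption: boundaries are inclusive, NA values never match
keep <- !is.na(mz_vals) &
    mz_vals >= mz_range[1L] & mz_vals <= mz_range[2L]
keep
```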
filterMzValues()
The filterMzValues() method allows subsetting a backend to chromatograms
with their value of the mz chromatogram variable being equal to (one) of
the provided m/z values, given an acceptable difference defined by
parameters ppm and tolerance.
#' Default for `filterMzValues()`
setMethod("filterMzValues", "ChromBackend",
          function(object, mz = numeric(), ppm = 20, tolerance = 0, ...) {
              if (length(mz)) {
                  object[.values_match_mz(precursorMz(object), mz = mz,
                                          ppm = ppm, tolerance = tolerance)]
              } else object
          })
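The internal helper .values_match_mz() is not shown in this vignette. To illustrate how ppm and tolerance could define the acceptable difference, the base R sketch below uses a hypothetical function match_mz(); it assumes that the accepted absolute difference is tolerance plus the ppm of the target m/z, which may differ from the package's actual implementation.

```r
## Hypothetical helper: TRUE for each element of `x` that is within
## tolerance + ppm of at least one of the target `mz` values.
match_mz <- function(x, mz, ppm = 20, tolerance = 0) {
    vapply(x, function(z) {
        if (is.na(z))
            return(FALSE)
        ## Assumption: accepted difference = tolerance + ppm of target m/z
        any(abs(z - mz) <= tolerance + mz * ppm / 1e6)
    }, logical(1))
}

vals <- c(314.3, 312.5, 542.1)

## Default settings: only differences within 20 ppm are accepted
match_mz(vals, mz = c(312.503, 542.2))

## An absolute tolerance of 0.1 widens the accepted window
match_mz(vals, mz = c(312.503, 542.2), tolerance = 0.1)
```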
## R Under development (unstable) (2024-03-24 r86185)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Chromatograms_0.1.0 ProtGenerics_1.35.4 BiocStyle_2.31.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.8 compiler_4.4.0 BiocManager_1.30.22
## [4] cluster_2.1.6 jquerylib_0.1.4 systemfonts_1.0.6
## [7] IRanges_2.37.1 textshaping_0.3.7 yaml_2.3.8
## [10] fastmap_1.1.1 R6_2.5.1 knitr_1.45
## [13] BiocGenerics_0.49.1 htmlwidgets_1.6.4 MASS_7.3-60.2
## [16] bookdown_0.38 desc_1.4.3 bslib_0.6.2
## [19] rlang_1.1.3 cachem_1.0.8 xfun_0.43
## [22] fs_1.6.3 MsCoreUtils_1.15.5 sass_0.4.9
## [25] memoise_2.0.1 cli_3.6.2 pkgdown_2.0.7.9000
## [28] magrittr_2.0.3 digest_0.6.35 lifecycle_1.0.4
## [31] clue_0.3-65 S4Vectors_0.41.5 vctrs_0.6.5
## [34] evaluate_0.23 ragg_1.3.0 stats4_4.4.0
## [37] rmarkdown_2.26 purrr_1.0.2 tools_4.4.0
## [40] htmltools_0.5.8