Package: Chromatograms
Authors: Laurent Gatto [aut] (https://orcid.org/0000-0002-1520-2268), Johannes Rainer [aut, cre] (https://orcid.org/0000-0002-6977-7147), Philippine Louail [aut] (https://orcid.org/0009-0007-5429-6846)
Compiled: Wed Mar 27 08:08:52 2024

Introduction

Similar to the Spectra package, the Chromatograms package also separates the user-facing functionality to process and analyze chromatographic mass spectrometry (MS) data from the code for storage and representation of the data. The latter functionality is provided by implementations of the ChromBackend class, further on called backends. This vignette describes the ChromBackend class and illustrates, with a simple example, how a backend extending this class could be implemented.

Contributions to this vignette (content or correction of typos) or requests for additional details and information are highly welcome, ideally via pull requests or issues on the package’s GitHub repository.

What is a ChromBackend?

The purpose of a backend class extending the virtual ChromBackend class is to provide the chromatographic MS data to the Chromatograms object, which the user interacts with to analyze the data. The ChromBackend defines the API that new backends need to provide so that they can be used with Chromatograms. This API defines a set of methods to access the data. Default implementations exist for many functions, so a dedicated implementation for a new backend is only needed if necessary (e.g. if the data is stored in a way that makes a different access strategy more efficient). In addition, a core set of variables (data fields), the so-called core chromatogram variables, is defined to describe the chromatographic data. Each backend needs to provide these, but can also define additional data fields. Before implementing a new backend it is strongly recommended to carefully read the following Conventions and definitions section.

Conventions and definitions

General conventions for chromatographic MS data of a Chromatograms are:

  • One Chromatograms object is designed to contain data from multiple chromatograms (not data from a single chromatogram).
  • Retention time values within each chromatogram are expected to be sorted increasingly.
  • Missing values (NA) for retention time values are not supported.
  • Properties (data fields) of a chromatogram are called chromatogram variables. While backends can define their own properties, a minimum required set of chromatogram variables must be provided by each backend (even if their values are empty). These core chromatogram variables are listed (along with their expected data type) by the coreChromVariables() function.
  • dataStorage and dataOrigin are two special variables that define for each chromatogram where the data is (currently) stored and from where the data derived, respectively. Both are expected to be of type character. Missing values for dataStorage are not allowed.
  • ChromBackend implementations can also represent purely read-only data resources. In this case only data accessor methods need to be implemented, but not data replacement methods (i.e. <- methods that would allow adding or setting variables). Read-only backends should implement the isReadOnly() method, which should then return TRUE. Note that backends for purely read-only resources could also implement a caching mechanism to (temporarily) store changes to the data locally within the object (and hence in memory). See the information on MsBackendCached in the Spectra package for more details.
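The two conventions on retention times (no missing values, increasing order within each chromatogram) can be checked with a few lines of base R. The sketch below assumes the retention times are stored as a list with one numeric vector per chromatogram, as in the example backend developed later in this vignette; the helper checkRtimeConventions() is hypothetical and not part of the Chromatograms package.

```r
#' Hypothetical helper (not part of Chromatograms): check that the retention
#' times of each chromatogram contain no NA and are sorted increasingly.
checkRtimeConventions <- function(rtime) {
    ## Missing retention time values are not supported
    has_na <- vapply(rtime, anyNA, logical(1))
    if (any(has_na))
        return(paste0("missing retention times in chromatogram(s): ",
                      paste(which(has_na), collapse = ", ")))
    ## Retention times have to be sorted increasingly per chromatogram
    unsorted <- vapply(rtime, is.unsorted, logical(1))
    if (any(unsorted))
        return(paste0("unsorted retention times in chromatogram(s): ",
                      paste(which(unsorted), collapse = ", ")))
    TRUE
}

checkRtimeConventions(list(c(12.4, 12.8, 13.2), c(45.1, 46.2)))
## [1] TRUE
checkRtimeConventions(list(c(12.4, 12.8), c(46.2, 45.1)))
## [1] "unsorted retention times in chromatogram(s): 2"
```

Returning a character message instead of throwing an error mirrors the convention of S4 validity functions, so a check like this could be reused inside a backend's validation code.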

Notes on parallel and chunk-wise processing

For parallel processing, Chromatograms splits the backend based on a defined factor and processes each chunk in parallel (or serially if a SerialParam is used). The splitting factor can be defined for Chromatograms by setting the parameter processingChunkSize. Alternatively, the backend can suggest a factor that should or could be used for splitting and parallel processing through the backendParallelFactor() method. The default implementation of backendParallelFactor() returns an empty factor (factor()), hence not suggesting any preferred splitting.

Besides enabling parallel processing, this chunk-wise processing can also reduce the memory demand of operations for on-disk backends (i.e., backends that don’t keep all of the data in memory), because only the peaks data of the current chunk needs to be realized in memory.
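The idea of chunk-wise processing can be sketched in plain base R: a factor assigns consecutive chromatograms to chunks of at most a given size, a function is applied to each chunk separately, and the results are put back into the original order. This is only an illustration under simple assumptions (a list of intensity vectors, a chunk size of 2, and the maximum intensity as the per-chromatogram result); it does not use the Chromatograms or BiocParallel APIs, which handle this internally.

```r
#' Intensity values of four chromatograms (one list element each)
peakInts <- list(c(123.3, 153.6, 2354.3), c(100, 80.1),
                 c(12.3, 135.2), c(5, 7))

#' Splitting factor assigning consecutive chromatograms to chunks with at
#' most `chunkSize` elements each
chunkSize <- 2
f <- factor(ceiling(seq_along(peakInts) / chunkSize))

#' Process each chunk separately (serially here; each chunk could also be
#' processed in parallel), then restore the original order with unsplit()
res <- lapply(split(peakInts, f), function(chunk) {
    vapply(chunk, max, numeric(1))
})
unsplit(res, f)
## [1] 2354.3  100.0  135.2    7.0
```

For an on-disk backend, only the data of the chromatograms in the current chunk would need to be loaded inside the function applied to each chunk.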

API

The ChromBackend class defines core methods that have to be implemented by a backend as well as optional methods for which a default implementation is already available. These functions are described in the sections Required methods and Optional methods, respectively.

To create a new backend, a class extending the virtual ChromBackend needs to be implemented. In the example below we thus create a simple class with a data.frame for general properties (chromatogram variables) and two slots for the retention time and intensity values, representing the actual chromatographic MS data. We store these values as lists, each list element representing the values for one chromatogram, since the number of values (peaks) can differ between chromatograms. We also define a simple constructor function that returns an empty instance of our new class.

library(Chromatograms)

#' Definition of the backend class extending ChromBackend
setClass("ChromBackendTest",
         contains = "ChromBackend",
         slots = c(
             chromVars = "data.frame",
             rtime = "list",
             intensity = "list"
         ),
         prototype = prototype(
             chromVars = data.frame(),
             rtime = list(),
             intensity = list()
         ))

#' Simple constructor function
ChromBackendTest <- function() {
    new("ChromBackendTest")
}

The three slots @chromVars, @rtime and @intensity will be used to store our MS data: each row in @chromVars contains the data for one chromatogram, with the columns being the different chromatogram variables (i.e. additional properties of a chromatogram such as its m/z value or MS level), and each element in @rtime and @intensity is a numeric vector with the retention times and intensity values, thus representing the peaks data of the respective chromatogram. This is only one of many possible ways to represent chromatographic data.

Ideally, we should also add a basic validity function that ensures that the data is valid. The function below simply checks that the number of rows of the @chromVars slot matches the length of the @rtime and @intensity slots.

#' Basic validation function
setValidity("ChromBackendTest", function(object) {
    if (length(object@rtime) != length(object@intensity) ||
        length(object@rtime) != nrow(object@chromVars))
        ## return() accepts a single value, so the message is combined first
        return(paste0("length of 'rtime' and 'intensity' has to match the ",
                      "number of rows of 'chromVars'"))
    NULL
})
## Class "ChromBackendTest" [in ".GlobalEnv"]
## 
## Slots:
##                                                   
## Name:   chromVars      rtime  intensity    version
## Class: data.frame       list       list  character
## 
## Extends: "ChromBackend"
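To see such a validity function in action without the Chromatograms package, the sketch below defines a hypothetical standalone class ToyChromData with the same three slots (it does not extend ChromBackend) and registers an equivalent check. Creating an instance with inconsistent slots then fails, because new() runs the validity check whenever slot values are supplied.

```r
library(methods)

#' Hypothetical standalone class with the same slots as ChromBackendTest;
#' it does not extend ChromBackend, so only base R is needed.
setClass("ToyChromData",
         slots = c(chromVars = "data.frame", rtime = "list",
                   intensity = "list"))
setValidity("ToyChromData", function(object) {
    if (length(object@rtime) != length(object@intensity) ||
        length(object@rtime) != nrow(object@chromVars))
        return(paste0("length of 'rtime' and 'intensity' has to match the ",
                      "number of rows of 'chromVars'"))
    TRUE
})

#' Two rows in chromVars, but only one rtime and one intensity vector
res <- tryCatch(
    new("ToyChromData", chromVars = data.frame(mz = c(112.2, 123.3)),
        rtime = list(c(12.4, 12.8)), intensity = list(c(123.3, 153.6))),
    error = function(e) conditionMessage(e))
res
```

The captured error message contains the text returned by the validity function, prefixed by R with the name of the invalid class.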

We can now create an instance of our new class with the ChromBackendTest() function.

#' Create an empty instance of ChromBackendTest
be <- ChromBackendTest()
be
## An object of class "ChromBackendTest"
## Slot "chromVars":
## data frame with 0 columns and 0 rows
## 
## Slot "rtime":
## list()
## 
## Slot "intensity":
## list()
## 
## Slot "version":
## [1] "0.1"

A show() method provides a more convenient way to display general information about our object. Below we add an implementation of the show() method.

#' implementation of show for ChromBackendTest
setMethod("show", "ChromBackendTest", function(object) {
    cd <- object@chromVars
    cat(class(object), "with", nrow(cd), "chromatograms\n")
})
be
## ChromBackendTest with 0 chromatograms

Required methods

Methods listed in this section must be implemented for a new class extending ChromBackend. Ideally, methods should also be implemented in the order they are listed here. Also, it is strongly advised to write dedicated unit tests for each newly implemented method or function during development.

dataStorage()

The dataStorage chromatogram variable provides information on how or where the data is stored. The dataStorage() method should therefore return a character vector with length equal to the number of chromatograms represented by the object. The values for dataStorage can be any character value, except NA. For our example backend we define a dataStorage() method that simply returns the column "dataStorage" from the @chromVars slot (as a character vector).

#' dataStorage method to provide information *where* data is stored
setMethod("dataStorage", "ChromBackendTest", function(object) {
    as.character(object@chromVars$dataStorage)
})

Calling dataStorage() on our example backend will thus return an empty character vector (since the object created above does not contain any data).

dataStorage(be)
## character(0)

length()

length() is expected to return an integer of length 1 with the total number of chromatograms that are represented by the backend. For our example backend we simply return the number of rows of the data.frame stored in the @chromVars slot.

#' length to provide information on the number of chromatograms
setMethod("length", "ChromBackendTest", function(x) {
    nrow(x@chromVars)
})
length(be)
## [1] 0

backendInitialize()

The backendInitialize() method is expected to be called after creating an instance of the backend class and should prepare (initialize) the backend with data. This method can take any parameters needed by the backend to get loaded/initialized with data (e.g. file names from which to load the data, a database connection, or object(s) containing the data). It is also suggested to set the special chromatogram variables dataStorage and dataOrigin within backendInitialize().

Below we define a backendInitialize() method that takes as arguments a data.frame with chromatogram variables and two lists with the retention time and intensity values for each chromatogram.

#' backendInitialize method to fill the backend with data.
setMethod(
    "backendInitialize", "ChromBackendTest",
    function(object, chromVars, rtime, intensity) {
        if (!is.data.frame(chromVars))
            stop("'chromVars' needs to be a 'data.frame' with the general ",
                 "chromatogram variables")
        ## Defining dataStorage and dataOrigin, if not available
        if (is.null(chromVars$dataStorage))
            chromVars$dataStorage <- "<memory>"
        if (is.null(chromVars$dataOrigin))
            chromVars$dataOrigin <- "<user provided>"
        object@chromVars <- chromVars
        object@rtime <- rtime
        object@intensity <- intensity
        validObject(object)
        object
    })

In addition to adding the data to the object, the function also defines the dataStorage and dataOrigin chromatogram variables. The purpose of these two variables is to provide some information on where the data is currently stored (in memory, in our example) and where the data originates from.

We can now create an instance of our backend class and fill it with data. We first define our MS data and pass it to the backendInitialize() method.

#' A data.frame with chromatogram variables.
cvars <- data.frame(msLevel = c(1L, 1L, 1L),
                    mz = c(112.2, 123.3, 134.4))
#' retention time values for each chromatogram.
rts <- list(c(12.4, 12.8, 13.2, 14.6),
            c(45.1, 46.2),
            c(64.4, 64.8, 65.2))
#' intensity values for each chromatogram.
ints <- list(c(123.3, 153.6, 2354.3, 243.4),
             c(100, 80.1),
             c(12.3, 135.2, 100))

#' Create and initialize the backend
be <- backendInitialize(ChromBackendTest(),
                        chromVars = cvars, rtime = rts, intensity = ints)
be
## ChromBackendTest with 3 chromatograms

While this method works and is compliant with the ChromBackend API (because there is no requirement on the input parameters of the backendInitialize() method), it would be good practice for backends to support an additional parameter data that allows passing the complete MS data (including retention time and intensity values) to the function as a DataFrame. This simplifies the implementation of some replacement methods and also allows changing the backend of a Chromatograms object to our new backend with the setBackend() function. Also, it is highly recommended to check the validity of the input data within the initialize method. The advantage of performing these validity checks in backendInitialize() rather than with setValidity() is that potentially computationally expensive checks are performed only once, instead of each time values within the object are changed (e.g. by subsetting or similar), which would be the case with validation functionality registered with setValidity().

We thus re-implement the backendInitialize() method, supporting also the data parameter mentioned above, and add additional validity checks. These checks verify that only numeric values are provided with rtime and intensity and that the number of retention time and intensity values matches for each chromatogram. We also use the validChromData() function, which checks that the provided core chromatogram variables have the correct data type.

#' Reimplementation of backendInitialize with a `data` parameter and
#' additional input validation
setMethod(
    "backendInitialize", "ChromBackendTest",
    function(object, chromVars, rtime, intensity, data) {
        ## Extract relevant information from a parameter `data` if provided
        if (!missing(data)) {
            chromVars <- as.data.frame(
                data[, !colnames(data) %in% c("rtime", "intensity"),
                     drop = FALSE])
            if (any(colnames(data) == "rtime"))
                rtime <- data$rtime
            if (any(colnames(data) == "intensity"))
                intensity <- data$intensity
        }
        ## Check that provided variables have the correct data type
        validChromData(chromVars)
        n <- nrow(chromVars)
        ## Use empty numeric vectors if no peaks data was provided
        if (missing(rtime))
            rtime <- rep(list(numeric()), n)
        if (missing(intensity))
            intensity <- rep(list(numeric()), n)
        ## If rtime or intensity is of type NumericList convert to list
        if (inherits(rtime, "NumericList"))
            rtime <- as.list(rtime)
        if (inherits(intensity, "NumericList"))
            intensity <- as.list(intensity)
        ## Validate rtime and intensity
        if (length(rtime) != length(intensity) || length(rtime) != n)
            stop("lengths of 'rtime' and 'intensity' need to match the ",
                 "number of chromatograms (i.e., nrow of 'chromVars')")
        if (any(lengths(rtime) != lengths(intensity)))
            stop("the number of data values in 'rtime' and 'intensity' has ",
                 "to match")
        if (!all(vapply(rtime, is.numeric, logical(1))))
            stop("'rtime' has to be a list of numeric values")
        if (!all(vapply(intensity, is.numeric, logical(1))))
            stop("'intensity' has to be a list of numeric values")
        ## Setting dataStorage and dataOrigin
        chromVars$dataStorage <- rep("<memory>", n)
        if (is.null(chromVars$dataOrigin))
            chromVars$dataOrigin <- rep("<user provided>", n)
        ## Fill object with data
        object@chromVars <- as.data.frame(chromVars)
        object@rtime <- rtime
        object@intensity <- intensity
        validObject(object)
        object
    })

This extended backendInitialize() implementation now also ensures data validity and integrity. Below we use this function again to create our backend instance.

#' Create and initialize the backend
be <- backendInitialize(ChromBackendTest(),
                        chromVars = cvars, rtime = rts,
                        intensity = ints)
be
## ChromBackendTest with 3 chromatograms

The backendInitialize() method that we implemented for our backend class expects the user to provide the full MS data. Alternatively, it would also be possible to implement a method that takes data file names as input, from which the function then imports the data. The purpose of the backendInitialize() method is to initialize and prepare the data in a way that it can be accessed by a Chromatograms object. Whether the data is actually loaded into memory or simply referenced and loaded upon request does not matter, as long as the backend is able to provide the data through its accessor methods when requested by the Chromatograms object.

chromVariables()

The chromVariables() method should return a character vector with the names of all available chromatogram variables of the backend. While a backend class should support defining and providing its own variables, each ChromBackend class must also provide the core chromatogram variables (in the correct data type). These can be listed by the coreChromVariables() function:

#' List core chromatogram variables along with data types.
coreChromVariables()
##      chromIndex collisionEnergy      dataOrigin     dataStorage       intensity 
##       "integer"       "numeric"     "character"     "character"   "NumericList" 
##         msLevel              mz           mzMin           mzMax     precursorMz 
##       "integer"       "numeric"       "numeric"       "numeric"       "numeric" 
##  precursorMzMin  precursorMzMax       productMz    productMzMin    productMzMax 
##       "numeric"       "numeric"       "numeric"       "numeric"       "numeric" 
##           rtime 
##   "NumericList"

A typical chromVariables() method for a ChromBackend class will thus be implemented similarly to the one for our ChromBackendTest backend: it returns the union of the core chromatogram variables and the names of all chromatogram variables available within the backend object.

#' Accessor for available chromatogram variables
setMethod("chromVariables", "ChromBackendTest", function(object) {
    union(names(coreChromVariables()), colnames(object@chromVars))
})
chromVariables(be)
##  [1] "chromIndex"      "collisionEnergy" "dataOrigin"      "dataStorage"    
##  [5] "intensity"       "msLevel"         "mz"              "mzMin"          
##  [9] "mzMax"           "precursorMz"     "precursorMzMin"  "precursorMzMax" 
## [13] "productMz"       "productMzMin"    "productMzMax"    "rtime"

chromData()

The chromData() method should return the full chromatogram data within a backend as a DataFrame object (defined in the S4Vectors package). A parameter columns should allow defining the names of the variables that should be returned. Each row in this data frame should represent one chromatogram, each column a chromatogram variable. Columns "rtime" and "intensity" (if requested) each have to contain a NumericList with the retention time and intensity values of the chromatograms. The DataFrame must provide values (even if they are NA) for all requested chromatogram variables of the backend (including the core chromatogram variables). The fillCoreChromVariables() function from the Chromatograms package allows completing (filling) a provided data.frame with possibly missing core chromatogram variables (columns):

#' Get the data.frame with the available chrom variables
be@chromVars
##   msLevel    mz dataStorage      dataOrigin
## 1       1 112.2    <memory> <user provided>
## 2       1 123.3    <memory> <user provided>
## 3       1 134.4    <memory> <user provided>
#' Complete this data.frame with missing core variables
fillCoreChromVariables(be@chromVars)
##   msLevel    mz dataStorage      dataOrigin chromIndex collisionEnergy
## 1       1 112.2    <memory> <user provided>         NA              NA
## 2       1 123.3    <memory> <user provided>         NA              NA
## 3       1 134.4    <memory> <user provided>         NA              NA
##   dataOrigin dataStorage msLevel mz mzMin mzMax precursorMz precursorMzMin
## 1       <NA>        <NA>      NA NA    NA    NA          NA             NA
## 2       <NA>        <NA>      NA NA    NA    NA          NA             NA
## 3       <NA>        <NA>      NA NA    NA    NA          NA             NA
##   precursorMzMax productMz productMzMin productMzMax
## 1             NA        NA           NA           NA
## 2             NA        NA           NA           NA
## 3             NA        NA           NA           NA

We can thus use this function to add possibly missing core chromatogram variables in the chromData() implementation for our backend:

#' function to extract the full chrom data; we would need to import the
#' `DataFrame()` function from the S4Vectors package and the `NumericList`
#' from the IRanges package.
setMethod(
    "chromData", "ChromBackendTest",
    function(object, columns = chromVariables(object)) {
        if (!all(columns %in% chromVariables(object)))
            stop("Some of the requested variables are not available")
        res <- S4Vectors::DataFrame(object@chromVars)
        ## Add rtime and intensity values to the result; would need to
        ## import the `NumericList()` function from the IRanges package
        res$rtime <- IRanges::NumericList(object@rtime, compress = FALSE)
        res$intensity <- IRanges::NumericList(
                                      object@intensity, compress = FALSE)
        ## Fill with eventually missing core variables
        res <- fillCoreChromVariables(res)
        res[, columns, drop = FALSE]
})

We can now use chromData() to either extract the full chromatogram data from the backend, or only the data for selected variables.

#' Extract the full data
chromData(be)
## DataFrame with 3 rows and 16 columns
##   chromIndex collisionEnergy      dataOrigin dataStorage
##    <integer>       <numeric>     <character> <character>
## 1         NA              NA <user provided>    <memory>
## 2         NA              NA <user provided>    <memory>
## 3         NA              NA <user provided>    <memory>
##                  intensity   msLevel        mz     mzMin     mzMax precursorMz
##              <NumericList> <integer> <numeric> <numeric> <numeric>   <numeric>
## 1  123.3, 153.6,2354.3,...         1     112.2        NA        NA          NA
## 2              100.0, 80.1         1     123.3        NA        NA          NA
## 3         12.3,135.2,100.0         1     134.4        NA        NA          NA
##   precursorMzMin precursorMzMax productMz productMzMin productMzMax
##        <numeric>      <numeric> <numeric>    <numeric>    <numeric>
## 1             NA             NA        NA           NA           NA
## 2             NA             NA        NA           NA           NA
## 3             NA             NA        NA           NA           NA
##                rtime
##        <NumericList>
## 1 12.4,12.8,13.2,...
## 2          45.1,46.2
## 3     64.4,64.8,65.2
#' Selected variables
chromData(be, c("rtime", "mz", "msLevel"))
## DataFrame with 3 rows and 3 columns
##                rtime        mz   msLevel
##        <NumericList> <numeric> <integer>
## 1 12.4,12.8,13.2,...     112.2         1
## 2          45.1,46.2     123.3         1
## 3     64.4,64.8,65.2     134.4         1
#' Only missing core chromatogram variables
chromData(be, c("collisionEnergy", "mzMin"))
## DataFrame with 3 rows and 2 columns
##   collisionEnergy     mzMin
##         <numeric> <numeric>
## 1              NA        NA
## 2              NA        NA
## 3              NA        NA

peaksData()

The peaksData() method extracts the chromatographic data (peaks), i.e., the chromatograms’ retention time and intensity values. This data is returned as a list of arrays, with one array per chromatogram, columns being the peaks variables (retention time and intensity values) and rows the individual data pairs. Each backend must provide retention times and intensity values with this method, but additional peaks variables (columns) are also supported.

Below we implement the peaksData() method for our backend. Due to the way we stored the retention time and intensity values within our object, we need to loop over the respective lists (in @rtime and @intensity) and combine the values of each chromatogram into an array (matrix). Since our backend does not support any additional peaks variables, we only allow columns to be c("rtime", "intensity"), and only in that specific order.

#' method to extract the full chromatographic data as list of arrays
setMethod(
    "peaksData", "ChromBackendTest",
    function(object, columns = c("rtime", "intensity")) {
        ## Only exactly c("rtime", "intensity") (in this order) is supported
        if (!identical(columns, c("rtime", "intensity")))
            stop("'columns' supports only \"rtime\" and \"intensity\"")
        mapply(rtime = object@rtime, intensity = object@intensity,
               FUN = cbind, SIMPLIFY = FALSE, USE.NAMES = FALSE)
    })

And with this method we can now extract the peaks data from our backend.

#' Extract the *peaks* data (i.e. intensity and retention times)
peaksData(be)
## [[1]]
##      rtime intensity
## [1,]  12.4     123.3
## [2,]  12.8     153.6
## [3,]  13.2    2354.3
## [4,]  14.6     243.4
## 
## [[2]]
##      rtime intensity
## [1,]  45.1     100.0
## [2,]  46.2      80.1
## 
## [[3]]
##      rtime intensity
## [1,]  64.4      12.3
## [2,]  64.8     135.2
## [3,]  65.2     100.0

Since the peaksData() method is the main function used by a Chromatograms object to retrieve data from the backend (and further process the values), this method should be implemented efficiently. Due to the way we store the data within our example backend, we need to loop over the @rtime and @intensity slots. A different implementation that stores the peaks data already as a list of arrays would be more efficient for this operation (but possibly slower for some other operations, such as extracting peaks variables separately with the rtime() or intensity() functions).
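The trade-off between the two storage layouts can be sketched in base R. If the peaks data is kept as a list of two-column matrices, a peaksData()-like access simply returns the stored list, while extracting a single peaks variable now requires the loop. The functions peaksDataSketch() and rtimeSketch() are hypothetical illustrations, not methods of the Chromatograms package.

```r
#' Peaks data stored directly as a list of matrices (one per chromatogram)
pks <- list(
    cbind(rtime = c(12.4, 12.8, 13.2), intensity = c(123.3, 153.6, 2354.3)),
    cbind(rtime = c(45.1, 46.2), intensity = c(100, 80.1)))

#' Hypothetical accessors for this storage layout: the full peaks data is
#' returned as is, whereas a single variable has to be extracted per matrix.
peaksDataSketch <- function(pks) pks
rtimeSketch <- function(pks) lapply(pks, function(z) z[, "rtime"])

rtimeSketch(pks)
## [[1]]
## [1] 12.4 12.8 13.2
## 
## [[2]]
## [1] 45.1 46.2
```

Which layout is preferable depends on which accessors a backend expects to be called most often.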

[

The [ method allows subsetting ChromBackend objects. This operation is expected to reduce a ChromBackend object to the selected chromatograms without changing values for the subset chromatograms. The method should support subsetting by indices or logical vectors and should also support duplicating elements (i.e., when duplicated indices are used) as well as subsetting in arbitrary order. An error should be thrown if indices are out of bounds, but the method should also support returning an empty backend with [integer()]. The MsCoreUtils::i2index() function can be used to check and convert the provided parameter i (defining the subset) to an integer vector.

Below we implement a possible [ method for our test backend class. We ignore the parameter j from the definition of the [ generic, since we treat our data as one-dimensional (with each chromatogram being one element).

#' Main subset method.
setMethod("[", "ChromBackendTest", function(x, i, j, ..., drop = FALSE) {
    i <- MsCoreUtils::i2index(i, length = length(x))
    ## drop = FALSE keeps a data.frame even if it has a single column
    x@chromVars <- x@chromVars[i, , drop = FALSE]
    x@rtime <- x@rtime[i]
    x@intensity <- x@intensity[i]
    x
})
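The checks a subset method performs on i can be illustrated with a simplified, hypothetical helper that converts logical or numeric indices into an integer vector and rejects out-of-bounds values. It only sketches the behavior described above and is not the implementation of MsCoreUtils::i2index(), which should be preferred in real backends.

```r
#' Hypothetical, simplified index normalization (not MsCoreUtils::i2index)
toIndexSketch <- function(i, n) {
    if (is.logical(i)) {
        if (length(i) != n)
            stop("logical 'i' has to have the same length as the object")
        return(which(i))
    }
    if (is.numeric(i)) {
        i <- as.integer(i)
        if (anyNA(i) || any(i < 1L | i > n))
            stop("index out of bounds")
        return(i)
    }
    stop("'i' has to be a logical or numeric vector")
}

#' Subsetting the data.frame and the list in parallel, with duplicated
#' indices; drop = FALSE keeps a data.frame even with a single column
cv <- data.frame(mz = c(112.2, 123.3, 134.4))
rt <- list(c(12.4, 12.8), c(45.1, 46.2), c(64.4, 64.8))
idx <- toIndexSketch(c(2, 2, 3), n = nrow(cv))
cv[idx, , drop = FALSE]
rt[idx]
#' An empty index yields an empty subset
toIndexSketch(integer(), n = 3)
## integer(0)
```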

We can now subset our backend to the last two chromatograms.

a <- be[2:3]
chromData(a)
## DataFrame with 2 rows and 16 columns
##   chromIndex collisionEnergy      dataOrigin dataStorage         intensity
##    <integer>       <numeric>     <character> <character>     <NumericList>
## 1         NA              NA <user provided>    <memory>       100.0, 80.1
## 2         NA              NA <user provided>    <memory>  12.3,135.2,100.0
##     msLevel        mz     mzMin     mzMax precursorMz precursorMzMin
##   <integer> <numeric> <numeric> <numeric>   <numeric>      <numeric>
## 1         1     123.3        NA        NA          NA             NA
## 2         1     134.4        NA        NA          NA             NA
##   precursorMzMax productMz productMzMin productMzMax          rtime
##        <numeric> <numeric>    <numeric>    <numeric>  <NumericList>
## 1             NA        NA           NA           NA      45.1,46.2
## 2             NA        NA           NA           NA 64.4,64.8,65.2

Or extract the second chromatogram multiple times.

a <- be[c(2, 2, 2)]
chromData(a)
## DataFrame with 3 rows and 16 columns
##     chromIndex collisionEnergy      dataOrigin dataStorage     intensity
##      <integer>       <numeric>     <character> <character> <NumericList>
## 2           NA              NA <user provided>    <memory>   100.0, 80.1
## 2.1         NA              NA <user provided>    <memory>   100.0, 80.1
## 2.2         NA              NA <user provided>    <memory>   100.0, 80.1
##       msLevel        mz     mzMin     mzMax precursorMz precursorMzMin
##     <integer> <numeric> <numeric> <numeric>   <numeric>      <numeric>
## 2           1     123.3        NA        NA          NA             NA
## 2.1         1     123.3        NA        NA          NA             NA
## 2.2         1     123.3        NA        NA          NA             NA
##     precursorMzMax productMz productMzMin productMzMax         rtime
##          <numeric> <numeric>    <numeric>    <numeric> <NumericList>
## 2               NA        NA           NA           NA     45.1,46.2
## 2.1             NA        NA           NA           NA     45.1,46.2
## 2.2             NA        NA           NA           NA     45.1,46.2

$

The $ method is expected to extract a single chromatogram variable from a backend. The parameter name specifies the chromatogram variable to return. Each ChromBackend must support extracting the core chromatogram variables with this method (even if no data is available for a variable). Our example implementation below makes use of the chromData() method, but more efficient implementations are possible as well (that do not require creating a DataFrame with the full data and then subsetting it to an individual column). Also, the $ method should check whether the requested chromatogram variable is available and throw an error otherwise (in our implementation, this check is performed by chromData()).

#' Access a single chromatogram variable
setMethod("$", "ChromBackendTest", function(x, name) {
    chromData(x, columns = name)[, 1L]
})

With this we can now extract the MS levels

be$msLevel
## [1] 1 1 1

or a core chromatogram variable without values in our example backend.

be$precursorMz
## [1] NA NA NA

or also the intensity values

be$intensity
## NumericList of length 3
## [[1]] 123.3 153.6 2354.3 243.4
## [[2]] 100 80.1
## [[3]] 12.3 135.2 100
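The $ implementation above goes through chromData(). A more direct approach can be sketched in base R, working on the three storage components of our example (a data.frame and two lists): it returns a column of the data.frame if present, the peaks lists for "rtime" and "intensity", and NA values of the expected type for core variables without data. The function getChromVarSketch() and the small type vector exampleCoreTypes (a subset of what coreChromVariables() lists) are assumptions for illustration; a real method would return NumericList objects for the peaks variables and use coreChromVariables() directly.

```r
#' Hypothetical subset of core chromatogram variables and their data types
exampleCoreTypes <- c(msLevel = "integer", mz = "numeric",
                      precursorMz = "numeric", dataOrigin = "character")

#' Hypothetical direct accessor for a single chromatogram variable
getChromVarSketch <- function(chromVars, rtime, intensity, name,
                              coreTypes = exampleCoreTypes) {
    if (name == "rtime") return(rtime)
    if (name == "intensity") return(intensity)
    if (name %in% colnames(chromVars)) return(chromVars[[name]])
    if (name %in% names(coreTypes)) {
        ## Core variable without data: NA values of the expected type
        res <- vector(coreTypes[[name]], nrow(chromVars))
        res[] <- NA
        return(res)
    }
    stop("chromatogram variable \"", name, "\" is not available")
}

cv <- data.frame(msLevel = c(1L, 1L), mz = c(112.2, 123.3))
rt <- list(c(12.4, 12.8), c(45.1, 46.2))
it <- list(c(123.3, 153.6), c(100, 80.1))
getChromVarSketch(cv, rt, it, "msLevel")
## [1] 1 1
getChromVarSketch(cv, rt, it, "precursorMz")
## [1] NA NA
```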

backendMerge()

The backendMerge() method merges (combines) ChromBackend objects (of the same type!) into a single instance. For our test backend we thus need to combine the values in the @chromVars, @rtime and @intensity slots. To also support merging data.frames with different sets of columns, we use the MsCoreUtils::rbindFill() function instead of a simple rbind() (this function joins data frames taking the union of all available columns, filling missing columns with NA).

#' Method allowing to join (concatenate) backends
setMethod("backendMerge", "ChromBackendTest", function(object, ...) {
    res <- object
    object <- unname(c(list(object), list(...)))
    res@rtime <- do.call(c, lapply(object, function(z) z@rtime))
    res@intensity <- do.call(c, lapply(object, function(z) z@intensity))
    res@chromVars <- do.call(MsCoreUtils::rbindFill,
                             lapply(object, function(z) z@chromVars))
    validObject(res)
    res
})

We test the function by merging the example backend instance with a subset of itself and with itself again.

a <- backendMerge(be, be[2], be)
a
## ChromBackendTest with 7 chromatograms
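The row binding with filling that backendMerge() relies on can be illustrated with a small base R sketch. rbindFillSketch() is a hypothetical helper written for this illustration, not the MsCoreUtils::rbindFill() implementation; it fills missing columns with NA values of the type the column has in the other data.frames, so column types are preserved.

```r
#' Hypothetical helper: rbind data.frames with differing columns, adding
#' missing columns as NA values typed like the column elsewhere.
rbindFillSketch <- function(...) {
    dfs <- list(...)
    cols <- unique(unlist(lapply(dfs, colnames)))
    ## Zero-length template of each column, taken from the first
    ## data.frame containing it
    templates <- lapply(cols, function(cn) {
        for (d in dfs) if (cn %in% colnames(d)) return(d[[cn]][0])
    })
    names(templates) <- cols
    dfs <- lapply(dfs, function(d) {
        for (cn in setdiff(cols, colnames(d)))
            d[[cn]] <- templates[[cn]][rep(NA_integer_, nrow(d))]
        d[, cols, drop = FALSE]
    })
    do.call(rbind, dfs)
}

df1 <- data.frame(msLevel = c(1L, 1L), mz = c(112.2, 123.3))
df2 <- data.frame(mz = 134.4, new_col = "c")
rbindFillSketch(df1, df2)
##   msLevel    mz new_col
## 1       1 112.2    <NA>
## 2       1 123.3    <NA>
## 3      NA 134.4       c
```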

Data replacement methods

As stated in the general description, ChromBackend implementations can also be purely read-only resources that only allow accessing, but not replacing, data. For these backends isReadOnly() should return TRUE, and the data replacement methods listed in this section do not need to be implemented. Our example backend stores the full data in memory, within the object, and hence we can easily change and replace values.

Since we support replacing values we also implement the isReadOnly() method for our example implementation to return FALSE (instead of the default TRUE).

#' Default for backends:
isReadOnly(be)
## [1] TRUE
#' Implementation of isReadOnly for ChromBackendTest
setMethod("isReadOnly", "ChromBackendTest", function(object) FALSE)
isReadOnly(be)
## [1] FALSE

All data replacement functions are expected to return an instance of the same backend class that was used as input.

chromData<-

The main replacement method is chromData<-, which should replace the content of a backend with new data. This data is expected to be provided as a DataFrame (similar to the one returned by chromData()). The method is also expected to replace the full data within the backend, i.e. all chromatogram and peaks variables. While values can be replaced, the number of chromatograms has to be the same before and after a call to chromData<-. For our example implementation of chromData<- we can re-use the backendInitialize() method defined before, passing the new data with its data parameter.

#' Replacement method for the full chromatogram data
setReplaceMethod("chromData", "ChromBackendTest", function(object, value) {
    if (!inherits(value, "DataFrame"))
        stop("'value' is expected to be a 'DataFrame'")
    if (length(object) && length(object) != nrow(value))
        stop("'value' has to be a 'DataFrame' with ", length(object), " rows")
    object <- backendInitialize(ChromBackendTest(), data = value)
    object
})

To test this new method we extract the full chromatogram data from our example data set, add an additional column (chromatogram variable) and use chromData<- to replace the data of the backend.

d <- chromData(be)
d$new_col <- c("a", "b", "c")

chromData(be) <- d

Check that we have now also the new column available.

be$new_col
## [1] "a" "b" "c"

$<-

The $<- method should allow replacing the values of an existing chromatogram variable or adding a new variable to the backend. As with all replacement methods, the length of value has to match the number of chromatograms represented by the backend. When replacing retention time or intensity values, we also need to ensure that the data is still valid after the operation, i.e. that the number of retention time and intensity values per chromatogram is identical and that all values are numeric. Finally, we use the validChromData() function to ensure that, after replacement, all core chromatogram variables have the correct data type.

#' Replace or add a single chromatogram variable.
setReplaceMethod("$", "ChromBackendTest", function(x, name, value) {
    if (length(value) != length(x))
        stop("length of 'value' needs to match the number of chromatograms ",
             "in object.")
    if (name %in% c("rtime", "intensity")) {
        ## In case retention time or intensity values are provided as
        ## NumericList convert to a list.
        if (is(value, "NumericList"))
            value <- as.list(value)
        ## Ensure the number of new values per chromatogram matches the
        ## number of values of the other peaks variable
        other <- if (name == "rtime") x@intensity else x@rtime
        if (!all(lengths(value) == lengths(other)))
            stop("Number of retention time values needs to match number of ",
                 "intensity values.")
        ## Ensure all values are numeric
        if (!all(vapply(value, is.numeric, logical(1))))
            stop("For replacement of retention time or intensity values, ",
                 "'value' is expected to be a list of numeric vectors.")
        if (name == "rtime")
            x@rtime <- value
        if (name == "intensity")
            x@intensity <- value
    } else
        x@chromVars[[name]] <- value
    ## Check that data types are correct after replacement
    validChromData(x@chromVars)
    x
})

We can thus replace an existing chromatogram variable, such as msLevel:

#' Values before replacement
be$msLevel
## [1] 1 1 1
#' Replace MS levels
be$msLevel <- c(3L, 2L, 1L)

#' Values after replacement
be$msLevel
## [1] 3 2 1

We can also add a new chromatogram variable:

#' Add a new chromatogram variable
be$name <- c("A", "B", "C")
be$name
## [1] "A" "B" "C"

We can also replace intensity values. Below we add 3 to each intensity value.

#' Replace intensity values
be$intensity <- be$intensity + 3
be$intensity
## NumericList of length 3
## [[1]] 126.3 156.6 2357.3 246.4
## [[2]] 103 83.1
## [[3]] 15.3 138.2 103

selectChromVariables()

The selectChromVariables() function should subset the content of a backend to the chromatogram variables specified with the chromVariables parameter. The input backend should be returned, reduced to the selected chromatogram variables. This function thus performs a subset operation by columns, dropping all chromatogram variables other than the ones specified with chromVariables. The implementation needs special care for the variables "rtime" and "intensity": if both are removed, the @rtime and @intensity slots are initialized with lists of empty elements, one per chromatogram in the backend. If only "intensity" is removed, its values are replaced with NA_real_, while removing only "rtime" is not supported (retention time values of NA are not allowed).

#' Method to *subset* a backend by chromatogram variables (columns)
setMethod(
    "selectChromVariables", "ChromBackendTest",
    function(object, chromVariables = chromVariables(object)) {
        keep <- colnames(object@chromVars) %in% chromVariables
        object@chromVars <- object@chromVars[, keep, drop = FALSE]
        ## If neither "rtime" nor "intensity" is in chromVariables: initialize
        ## with empty list elements.
        if (!any(c("rtime", "intensity") %in% chromVariables)) {
            object@rtime <- vector("list", length(object))
            object@intensity <- vector("list", length(object))
        } else {
            ## intensity not in chromVariables: replace intensity values with NA
            if (!"intensity" %in% chromVariables)
                object@intensity <- lapply(object@intensity,
                                           function(z) rep(NA_real_, length(z)))
            ## removal of only rtime is not supported
            if (!"rtime" %in% chromVariables)
                stop("Exclusive removal of retention times is not supported. ",
                     "Retention times can only be removed if also intensity ",
                     "values are removed.")
        }
        validObject(object)
        object
    })
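In the implementation above, drop = FALSE matters when only a single column is kept. Without it, subsetting a data frame by columns collapses the result to a plain vector. Below is a small base R illustration that uses a data.frame as a stand-in for the @chromVars slot (the values are made up):

```r
## A data.frame standing in for the @chromVars slot
chrom_vars <- data.frame(msLevel = c(3L, 2L, 1L),
                         dataStorage = rep("<memory>", 3),
                         mz = c(112.2, 123.3, 134.4))
keep <- colnames(chrom_vars) %in% "msLevel"

with_drop <- chrom_vars[, keep]                   # collapses to an integer vector
without_drop <- chrom_vars[, keep, drop = FALSE]  # stays a one-column data.frame
```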

We can now restrict the data set to selected chromatogram variables only:

#' keep only dataStorage and msLevel
be_2 <- selectChromVariables(be, c("dataStorage", "msLevel"))
chromData(be_2)
## DataFrame with 3 rows and 16 columns
##   chromIndex collisionEnergy  dataOrigin dataStorage     intensity   msLevel
##    <integer>       <numeric> <character> <character> <NumericList> <integer>
## 1         NA              NA          NA    <memory>                       3
## 2         NA              NA          NA    <memory>                       2
## 3         NA              NA          NA    <memory>                       1
##          mz     mzMin     mzMax precursorMz precursorMzMin precursorMzMax
##   <numeric> <numeric> <numeric>   <numeric>      <numeric>      <numeric>
## 1        NA        NA        NA          NA             NA             NA
## 2        NA        NA        NA          NA             NA             NA
## 3        NA        NA        NA          NA             NA             NA
##   productMz productMzMin productMzMax         rtime
##   <numeric>    <numeric>    <numeric> <NumericList>
## 1        NA           NA           NA              
## 2        NA           NA           NA              
## 3        NA           NA           NA

Removing only the intensity values, while keeping the retention times, is possible:

#' Keep dataStorage, msLevel, mz and rtime
be_2 <- selectChromVariables(be, c("dataStorage", "msLevel", "mz", "rtime"))
chromData(be_2)
## DataFrame with 3 rows and 16 columns
##   chromIndex collisionEnergy  dataOrigin dataStorage     intensity   msLevel
##    <integer>       <numeric> <character> <character> <NumericList> <integer>
## 1         NA              NA          NA    <memory>  NA,NA,NA,...         3
## 2         NA              NA          NA    <memory>         NA,NA         2
## 3         NA              NA          NA    <memory>      NA,NA,NA         1
##          mz     mzMin     mzMax precursorMz precursorMzMin precursorMzMax
##   <numeric> <numeric> <numeric>   <numeric>      <numeric>      <numeric>
## 1     112.2        NA        NA          NA             NA             NA
## 2     123.3        NA        NA          NA             NA             NA
## 3     134.4        NA        NA          NA             NA             NA
##   productMz productMzMin productMzMax              rtime
##   <numeric>    <numeric>    <numeric>      <NumericList>
## 1        NA           NA           NA 12.4,12.8,13.2,...
## 2        NA           NA           NA          45.1,46.2
## 3        NA           NA           NA     64.4,64.8,65.2

All intensity values are thus replaced with NA. Removing only the retention times (while keeping the intensity values) would, in contrast, throw an error.

peaksData<-

The peaksData<- method should allow replacing the full peaks data (retention time and intensity value pairs) of all chromatograms in a backend. The value should be a list of arrays (e.g. two-column numeric matrices) with column names "rtime" and "intensity". Because the full peaks data is provided at once, this method can (and should) also support changing the number of peaks per chromatogram, which methods such as rtime<- or $<- do not allow. Our implementation needs to ensure that a) the provided list has a length equal to the number of chromatograms and b) each element is a numeric matrix with columns "rtime" and "intensity" from which the values can be extracted.
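At its core, a peaksData<- implementation for a backend like ours converts between a list of two-column matrices and separate per-chromatogram retention time and intensity vectors. The base R sketch below shows this round trip. It assumes the slot layout of our example backend, and the variable names are illustrative only.

```r
## Peaks data for two chromatograms, stored as separate lists (as in the
## @rtime and @intensity slots of the example backend)
rtime_list <- list(c(12.4, 12.8, 13.2), c(45.1, 46.2))
intensity_list <- list(c(123.3, 153.6, 2354.3), c(100, 80.1))

## separate lists -> list of two-column matrices (the peaksData() layout)
peaks <- mapply(function(rt, int) cbind(rtime = rt, intensity = int),
                rtime_list, intensity_list, SIMPLIFY = FALSE)

## list of matrices -> separate lists (what `peaksData<-` has to extract);
## unname() drops any names the extracted columns may carry
rtime_back <- lapply(peaks, function(z) unname(z[, "rtime"]))
intensity_back <- lapply(peaks, function(z) unname(z[, "intensity"]))
```

Because each chromatogram is an independent matrix, the number of rows (peaks) can differ between elements. This is what allows peaksData<- to change the number of peaks per chromatogram.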

#' replacement method for peaks data
setReplaceMethod("peaksData", "ChromBackendTest", function(object, value) {
    if (!(is.list(value) || inherits(value, "SimpleList")))
        stop("'value' has to be a list-like object")
    if (!length(value) == length(object))
        stop("The length of the provided list has to match the number of ",
             "chromatograms in 'object'")
    ## First loop to check also for validity of the matrices, i.e. each element
    ## has to be a `numeric` `matrix` with columns named "rtime" and "intensity"
    object@rtime <- lapply(value, function(z) {
        if (!is.matrix(z) || !is.numeric(z))
            stop("'value' is expected to be a 'list' of numeric matrices")
        if (!all(c("rtime", "intensity") %in% colnames(z)))
            stop("All matrices in 'value' need to have columns named ",
                 "\"rtime\" and \"intensity\"")
        z[, "rtime"]
    })
    object@intensity <- lapply(value, "[", , "intensity")
    validObject(object)
    object
})

With this method we can now replace the peaks data of a backend:

#' Create a list with peaks matrices; our backend has 3 chromatograms
#' thus our `list` has to be of length 3
tmp <- list(
    cbind(rtime = c(12.3, 14.4, 15.4, 16.4),
          intensity = c(200, 312, 354.1, 232)),
    cbind(rtime = c(14.4),
          intensity = c(13.4)),
    cbind(rtime = c(223.2, 223.8, 234.1, 234.5, 234.9),
          intensity = c(12.3, 45.3, 65.3, 51.1, 29.3))
)
#' Assign this peaks data to one of our test backends
peaksData(be_2) <- tmp

#' Evaluate that we properly added the peaks data
peaksData(be_2)
## [[1]]
##      rtime intensity
## [1,]  12.3     200.0
## [2,]  14.4     312.0
## [3,]  15.4     354.1
## [4,]  16.4     232.0
## 
## [[2]]
##       rtime intensity
## rtime  14.4      13.4
## 
## [[3]]
##      rtime intensity
## [1,] 223.2      12.3
## [2,] 223.8      45.3
## [3,] 234.1      65.3
## [4,] 234.5      51.1
## [5,] 234.9      29.3

Methods with available default implementations

Default implementations for a large number of methods are available for the ChromBackend class, and any backend extending this class automatically inherits them. Alternative, class-specific versions can be implemented but are not required. The default versions are defined in the R/ChromBackend.R file and are also listed in this section. If alternative versions are implemented, they must ensure that the expected data type is always used for core chromatogram variables. Use coreChromVariables() to list these mandatory data types.

backendParallelFactor()

The backendParallelFactor() function allows a backend to suggest a preferred way to split it for parallel processing. The default implementation returns factor() (i.e. a factor of length 0) and hence does not suggest any specific splitting.
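As an illustration of a non-default suggestion, a backend that reads its data from several files might propose one chunk per file. The base R sketch below assumes hypothetical dataStorage values and shows how such a factor could be built and used to group chromatogram indices. It does not reproduce the code of any existing backend.

```r
## The default suggestion: a factor of length 0, i.e. no preferred splitting
default_f <- factor()

## Hypothetical dataStorage values of a file-based backend
data_storage <- c("a.mzML", "b.mzML", "a.mzML", "c.mzML")
## One level per file, in order of first appearance
f <- factor(data_storage, levels = unique(data_storage))
## Chromatogram indices grouped per suggested chunk
chunks <- split(seq_along(data_storage), f)
```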

#' Is there a specific way how the object could be best split for
#' parallel processing?
setMethod("backendParallelFactor", "ChromBackend", function(object, ...) {
    factor()
})
## factor()
## Levels:

chromVariables()

The chromVariables() function is expected to return the names of all available chromatogram variables (which should include the core chromatogram variables). The default implementation is:

#' get the available chromatogram variables.
setMethod("chromVariables", "ChromBackend", function(object) {
    colnames(chromData(object))
})

The result from calling the default implementation on our test backend:

##  [1] "chromIndex"      "collisionEnergy" "dataOrigin"      "dataStorage"    
##  [5] "intensity"       "msLevel"         "mz"              "mzMin"          
##  [9] "mzMax"           "precursorMz"     "precursorMzMin"  "precursorMzMax" 
## [13] "productMz"       "productMzMin"    "productMzMax"    "rtime"          
## [17] "new_col"         "name"

chromIndex()

The chromIndex() function should return the value for the "chromIndex" chromatogram variable. As a result, an integer of length equal to the number of chromatograms in object needs to be returned. The default implementation is:

#' get the values for the chromIndex chromatogram variable
setMethod("chromIndex", "ChromBackend", function(object) {
    chromData(object, columns = "chromIndex")[, 1L]
})

The result of calling this method on our test backend:

## [1] NA NA NA

collisionEnergy()

The collisionEnergy() function should return the value for the "collisionEnergy" chromatogram variable. As a result, a numeric of length equal to the number of chromatograms has to be returned. The default implementation is:

#' get the values for the collisionEnergy chromatogram variable
setMethod("collisionEnergy", "ChromBackend", function(object) {
    chromData(object, columns = "collisionEnergy")[, 1L]
})

The result of calling this method on our test backend:

## [1] NA NA NA

The default replacement method for the collisionEnergy chromatogram variable is:

#' Default replacement method for collisionEnergy
setReplaceMethod(
    "collisionEnergy", "ChromBackend", function(object, value) {
        object$collisionEnergy <- value
        object
    })

This method thus makes use of the $<- replacement method we implemented above. To test this function we replace the collision energy below.

#' Replace the collision energy
collisionEnergy(be) <- c(20, 30, 20)
collisionEnergy(be)
## [1] 20 30 20

dataOrigin(), dataOrigin<-

The dataOrigin() and dataOrigin<- methods get or set the value(s) of the "dataOrigin" chromatogram variable. The values of this variable need to be of type character, with one value per chromatogram. The default implementation for dataOrigin() is:

#' Default implementation to access dataOrigin
setMethod("dataOrigin", "ChromBackend", function(object) {
    chromData(object, columns = "dataOrigin")[, 1L]
})

Below we use this method to access the values of the dataOrigin chromatogram variable.

#' Access the dataOrigin values
dataOrigin(be)
## [1] "<user provided>" "<user provided>" "<user provided>"

The default implementation for dataOrigin<- uses, like all defaults for replacement methods, the $<- method:

#' Default implementation of the `dataOrigin<-` replacement method
setReplaceMethod("dataOrigin", "ChromBackend", function(object, value) {
    object$dataOrigin <- value
    object
})

For our backend we can change the values of the dataOrigin variable:

#' Replace the backend's dataOrigin values
dataOrigin(be) <- rep("from somewhere", 3)
dataOrigin(be)
## [1] "from somewhere" "from somewhere" "from somewhere"

dataStorage(), dataStorage<-

Similarly, the dataStorage() and dataStorage<- methods should allow getting or setting the dataStorage chromatogram variable. Its values are expected to be of type character, and each chromatogram in a backend needs one value, which cannot be NA_character_. The default implementation for dataStorage() uses, like most access methods, the chromData() function:

#' Default implementation to access dataStorage
setMethod("dataStorage", "ChromBackend", function(object) {
    chromData(object, columns = "dataStorage")[, 1L]
})

Below we use this method to access the values of the dataStorage chromatogram variable.

#' Access the dataStorage values
dataStorage(be)
## [1] "<memory>" "<memory>" "<memory>"

Note that this variable is supposed to provide information on the location where the data is stored. For some types of backends it might therefore not be possible or advisable to let the user change its values. Such backends should implement a dedicated dataStorage<- replacement method that throws an error if the new values would be invalid. The default implementation of this method uses, like all defaults for replacement methods, the $<- method:
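The following is a minimal sketch of the kind of check such a dedicated dataStorage<- method could run before it accepts new values. The helper check_data_storage() is hypothetical and not part of Chromatograms. It enforces only the general rules stated above (character, one value per chromatogram, no missing values), and a real backend would add its own backend-specific rules.

```r
## Hypothetical validity check for new dataStorage values
check_data_storage <- function(value, n) {
    if (!is.character(value) || length(value) != n)
        stop("'value' has to be a character vector of length ", n)
    if (anyNA(value))
        stop("'dataStorage' values must not be NA")
    invisible(TRUE)
}

## Valid replacement values pass silently
check_data_storage(c("here", "here", "here"), n = 3L)
## A missing value is rejected; capture the error message
res <- tryCatch(check_data_storage(c("here", NA, "here"), n = 3L),
                error = function(e) conditionMessage(e))
```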

#' Default implementation of the `dataStorage<-` replacement method
setReplaceMethod("dataStorage", "ChromBackend", function(object, value) {
    object$dataStorage <- value
    object
})

For our backend we can change the values of the dataStorage variable:

#' Replace the backend's datastorage values
dataStorage(be) <- c("here", "here", "here")
dataStorage(be)
## [1] "here" "here" "here"

intensity(), intensity<-

The intensity() and intensity<- methods allow to extract or set the intensity values of the individual chromatograms represented by the backend. The default for the intensity() function, which is expected to return a list of numeric values with the intensity values of each chromatogram, uses also the chromData() method:

#' Default method to extract intensity values
setMethod("intensity", "ChromBackend", function(object) {
    chromData(object, columns = "intensity")[, 1L]
})

Based on the way our example backend implementation stores the data, accessing the intensity values in this way would not be very efficient. It would be much faster to directly return the content of the @intensity slot, converting that into the expected NumericList. Thus we implement below a more efficient version of the method specifically for our backend:

#' Alternative implementation for our backend
setMethod("intensity", "ChromBackendTest", function(object) {
    IRanges::NumericList(object@intensity, compress = FALSE)
})
intensity(be)
## NumericList of length 3
## [[1]] 126.3 156.6 2357.3 246.4
## [[2]] 103 83.1
## [[3]] 15.3 138.2 103

The default replacement method for intensity values uses the $<- method:

#' Default implementation of the replacement method for intensity values
setReplaceMethod("intensity", "ChromBackend", function(object, value) {
    object$intensity <- value
    object
})

Here, too, we could implement an alternative version that directly replaces the content of the @intensity slot. We implement such a replacement method for rtime<- further below. Here we simply use the default implementation to replace the intensity values with the original values divided by 10.

#' Replace intensity values
intensity(be) <- intensity(be) / 10
intensity(be)
## NumericList of length 3
## [[1]] 12.63 15.66 235.73 24.64
## [[2]] 10.3 8.31
## [[3]] 1.53 13.82 10.3

isEmpty()

The isEmpty() function is a simple helper to evaluate whether chromatograms are empty, i.e. have no peaks (retention time and intensity values). It should return a logical vector of length equal to the number of chromatograms in the backend, with TRUE if a chromatogram is empty and FALSE otherwise. The default implementation uses the lengths() method (described further below), which returns the number of available data points (peaks) for each chromatogram.
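The default isEmpty() logic amounts to comparing each chromatogram's peak count with zero. The sketch below applies that comparison to a plain list of made-up retention time vectors instead of to a backend:

```r
## Retention time values of three chromatograms; the second has no peaks
rt <- list(c(12.4, 12.8), numeric(), c(64.4, 64.8, 65.2))
n_peaks <- lengths(rt)      # number of peaks per chromatogram: 2 0 3
is_empty <- n_peaks == 0L   # the comparison the default isEmpty() performs
```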

#' Default implementation for `isEmpty()`
setMethod("isEmpty", "ChromBackend", function(x) {
    lengths(x) == 0L
})
isEmpty(be)
## [1] FALSE FALSE FALSE

isReadOnly()

As discussed above, backends can also be read-only, i.e. they only allow accessing values, not changing them (e.g. if the data is stored in a database and the connection to it does not support updating or replacing data). In such cases, the default isReadOnly() method, which always returns TRUE, can be used:

#' Default implementation of `isReadOnly()`
setMethod("isReadOnly", "ChromBackend", function(object) {
    TRUE
})

Backends that support changing data values should implement their own version (like we did above) to return FALSE instead:

## [1] FALSE

length()

The length() method should return a single integer with the total number of chromatograms available through the backend. The default implementation for this function is:

#' Default implementation for `length()`
setMethod("length", "ChromBackend", function(x) {
    nrow(chromData(x, columns = "dataStorage"))
})
length(be)
## [1] 3

lengths()

The lengths() function should return the number of peaks (pairs of retention time and intensity values) per chromatogram as an integer vector, one count per chromatogram in the backend. The default implementation uses the intensity() function.

#' Default implementation for `lengths()`
setMethod("lengths", "ChromBackend", function(x) {
    lengths(intensity(x))
})

The number of peaks for our test backend:

## [1] 4 2 3

msLevel(), msLevel<-

The msLevel() and msLevel<- methods should allow extracting and setting the MS level for the individual chromatograms. MS levels are encoded as integer, thus, msLevel() must return an integer vector of length equal to the number of chromatograms of the backend and msLevel<- should take/accept such a vector as input. The default implementations for both methods are shown below.
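Because MS levels must be integer, note that numeric literals in R are double unless they carry the L suffix. This is why the examples in this vignette write 1L, 2L and so on. A quick base R check:

```r
lv_double <- c(1, 2, 4)       # numeric (double) literals
lv_integer <- c(1L, 2L, 4L)   # integer literals, as msLevel expects
is.integer(lv_double)         # FALSE
is.integer(lv_integer)        # TRUE
## as.integer() converts whole-number doubles without loss
identical(as.integer(lv_double), lv_integer)  # TRUE
```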

#' Default methods to get or set MS levels
setMethod("msLevel", "ChromBackend", function(object) {
    chromData(object, columns = "msLevel")[, 1L]
})
setReplaceMethod("msLevel", "ChromBackend", function(object, value) {
    object$msLevel <- value
    object
})

To test these, we replace the MS levels of our test data set below and extract the values again.

msLevel(be) <- c(1L, 2L, 4L)
msLevel(be)
## [1] 1 2 4

mz(), mz<-

The mz() and mz<- methods should allow to extract or set the m/z value for each chromatogram. The m/z value of a chromatogram is encoded as numeric, thus, the methods are expected to return or accept a numeric vector of length equal to the number of chromatograms. The default implementations are shown below.

#' Default implementations to get or set m/z value(s)
setMethod("mz", "ChromBackend", function(object) {
    chromData(object, columns = "mz")[, 1L]
})
setReplaceMethod("mz", "ChromBackend", function(object, value) {
    object$mz <- value
    object
})

Below we set and extract these target m/z values.

mz(be) <- c(314.3, 312.5, 542.1)
mz(be)
## [1] 314.3 312.5 542.1

mzMax(), mzMax<-

The mzMax() and mzMax<- methods should allow to extract or set the upper m/z boundary for each chromatogram. m/z values are encoded as numeric, thus, the methods are expected to return or accept a numeric vector of length equal to the number of chromatograms. The default implementations are shown below.

#' Default implementations to get or set upper m/z limits
setMethod("mzMax", "ChromBackend", function(object) {
    chromData(object, columns = "mzMax")[, 1L]
})
setReplaceMethod("mzMax", "ChromBackend", function(object, value) {
    object$mzMax <- value
    object
})

Testing these functions by replacing the upper m/z boundary with new values.

mzMax(be) <- mz(be) + 0.01
mzMax(be)
## [1] 314.31 312.51 542.11

mzMin(), mzMin<-

The mzMin() and mzMin<- methods should allow to extract or set the lower m/z boundary for each chromatogram. m/z values are encoded as numeric, thus, the methods are expected to return or accept a numeric vector of length equal to the number of chromatograms. The default implementations are shown below.

#' Default methods to get or set the lower m/z boundary
setMethod("mzMin", "ChromBackend", function(object) {
    chromData(object, columns = "mzMin")[, 1L]
})
setReplaceMethod("mzMin", "ChromBackend", function(object, value) {
    object$mzMin <- value
    object
})

Testing these functions by replacing the lower m/z boundary with new values.

mzMin(be) <- mz(be) - 0.01
mzMin(be)
## [1] 314.29 312.49 542.09

peaksVariables()

The peaksVariables() function is supposed to provide the names of the available peaks variables. Backends must provide retention time and intensity values, thus the default implementation simply returns c("rtime", "intensity"). If additional peaks variables are available, they could also be listed by the peaksVariables() method.

#' Default implementation for peaksVariables()
setMethod(
    "peaksVariables", "ChromBackend", function(object) {
        c("rtime", "intensity")
    })
## [1] "rtime"     "intensity"

precursorMz(), precursorMz<-

The precursorMz() and precursorMz<- methods are expected to get or set the precursor m/z value of each chromatogram (if available). These are encoded as numeric (one value per chromatogram); if a value is not available, NA_real_ should be returned. The default implementations are:

#' Default implementations to get or set the precursorMz chrom variable
setMethod("precursorMz", "ChromBackend", function(object) {
    chromData(object, columns = "precursorMz")[, 1L]
})
setReplaceMethod("precursorMz", "ChromBackend", function(object, value) {
    object$precursorMz <- value
    object
})

Below we set and get the precursorMz chromatogram variable for our backend.

precursorMz(be) <- c(NA_real_, 123.3, 314.2)
precursorMz(be)
## [1]    NA 123.3 314.2

precursorMzMax(), precursorMzMax<-

These methods should allow getting and setting the precursorMzMax chromatogram variable. The default implementations are:

#' Default implementations for `precursorMzMax`
setMethod("precursorMzMax", "ChromBackend", function(object) {
    chromData(object, columns = "precursorMzMax")[, 1L]
})
setReplaceMethod("precursorMzMax", "ChromBackend", function(object, value) {
    object$precursorMzMax <- value
    object
})

Below we test these functions by setting and extracting the values for this chromatogram variable.

## [1]    NA 123.4 314.3

precursorMzMin(), precursorMzMin<-

These methods should allow getting and setting the precursorMzMin chromatogram variable. The default implementations are:

#' Default implementations for `precursorMzMin`
setMethod("precursorMzMin", "ChromBackend", function(object) {
    chromData(object, columns = "precursorMzMin")[, 1L]
})
setReplaceMethod("precursorMzMin", "ChromBackend", function(object, value) {
    object$precursorMzMin <- value
    object
})

Below we test these functions by setting and extracting the values for this chromatogram variable.

## [1]    NA 123.2 314.1

productMz(), productMz<-

These methods should allow getting and setting the productMz chromatogram variable. The default implementations are:

#' Default implementations for `productMz`
setMethod("productMz", "ChromBackend", function(object) {
    chromData(object, columns = "productMz")[, 1L]
})
setReplaceMethod("productMz", "ChromBackend", function(object, value) {
    object$productMz <- value
    object
})

Below we test these functions by setting and extracting the values for this chromatogram variable.

productMz(be) <- c(123.2, NA_real_, NA_real_)
productMz(be)
## [1] 123.2    NA    NA

productMzMax(), productMzMax<-

These methods should allow getting and setting the productMzMax chromatogram variable. The default implementations are:

#' Default implementations for `productMzMax`
setMethod("productMzMax", "ChromBackend", function(object) {
    chromData(object, columns = "productMzMax")[, 1L]
})
setReplaceMethod("productMzMax", "ChromBackend", function(object, value) {
    object$productMzMax <- value
    object
})

Below we test these functions by setting and extracting the values for this chromatogram variable.

productMzMax(be) <- productMz(be) + 0.02
productMzMax(be)
## [1] 123.22     NA     NA

productMzMin(), productMzMin<-

These methods should allow getting and setting the productMzMin chromatogram variable. The default implementations are:

#' Default implementations for `productMzMin`
setMethod("productMzMin", "ChromBackend", function(object) {
    chromData(object, columns = "productMzMin")[, 1L]
})
setReplaceMethod("productMzMin", "ChromBackend", function(object, value) {
    object$productMzMin <- value
    object
})

Below we test these functions by setting and extracting the values for this chromatogram variable.

productMzMin(be) <- productMz(be) - 0.2
productMzMin(be)
## [1] 123.2    NA    NA

rtime(), rtime<-

The rtime() and rtime<- methods allow to get and set the retention times of the individual chromatograms of the backend. Similar to the method for the intensity values described above they should return or accept a NumericList, each element being a numeric vector with the retention time values of one chromatogram. The default implementations of these methods are shown below.

#' Default methods for `rtime()` and `rtime<-`
setMethod("rtime", "ChromBackend", function(object) {
    chromData(object, columns = "rtime")[, 1L]
})
setReplaceMethod("rtime", "ChromBackend", function(object, value) {
    object$rtime <- value
    object
})

These methods also use the chromData() function to extract retention time values and the $<- method to replace them. Due to the way the data is stored in our example backend, this is not the most efficient way to get or set these values. Instead, we could implement the rtime() function similarly to intensity() above. For rtime<- we implement below a version that takes a list or NumericList as input and directly replaces the values of the @rtime slot. This method also needs to ensure that the provided data is in the correct format, that the number of values per chromatogram matches the number of intensity values and that no missing values are provided (NA_real_ values are not supported for retention times).

#' Implementation of `rtime<-` for our backend
setReplaceMethod("rtime", "ChromBackendTest", function(object, value) {
    ## Convert to a standard list
    if (inherits(value, "NumericList"))
        value <- as.list(value)
    ## Check that length is correct
    if (!length(value) == length(object))
        stop("Length of 'value' needs to match the number of ",
             "chromatograms in 'object'.")
    ## Check that lengths are correct
    if (!all(lengths(value) == lengths(object@intensity)))
        stop("The number of retention time values per chromatogram needs to ",
             "match the number of intensities for that chromatogram.")
    ## Check that all values are numeric and we don't have missing values
    not_ok <- vapply(value, function(z)
        !is.numeric(z) || anyNA(z), logical(1))
    if (any(not_ok))
        stop("'value' needs to be a list of numeric values without ",
             "missing values")
    object@rtime <- value
    object
})

Below we test this implementation by replacing the retention times of our example backend with values shifted by 2 seconds.

rtime(be) <- rtime(be) + 2
rtime(be)
## NumericList of length 3
## [[1]] 14.4 14.8 15.2 16.6
## [[2]] 47.1 48.2
## [[3]] 66.4 66.8 67.2

split()

The split() method should split the backend into a list of backends, each containing a subset of the original backend. The default implementation calls R's split.default(), which should work in most cases; it relies on the backend's [ method to subset the object.

#' Default method to split a backend
setMethod("split", "ChromBackend", function(x, f, drop = FALSE, ...) {
    split.default(x, f, drop = drop, ...)
})

Below we test this by splitting the backend into two subsets: the first containing chromatograms 1 and 3, the second chromatogram 2.

split(be, f = c(1, 2, 1))
## $`1`
## ChromBackendTest with 2 chromatograms
## 
## $`2`
## ChromBackendTest with 1 chromatograms
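For objects with a class attribute, such as backends, R's split.default() essentially groups the element indices by f and then subsets the object with [ once per group. The sketch below reproduces these two steps on a plain list standing in for a backend; it is an illustration of the idea, not the code path actually executed for a ChromBackend.

```r
## Stand-in for a backend: a plain list with one element per chromatogram
chroms <- list(c(12.4, 12.8), c(45.1, 46.2, 47.3), c(64.4))
f <- c(1, 2, 1)

## Step 1: group the chromatogram indices by the levels of `f`
idx <- split(seq_along(chroms), f)
## Step 2: subset the object once per group with `[`
parts <- lapply(idx, function(i) chroms[i])
```

Because only [ is used, a backend that implements [ correctly gets a working split() for free.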

Filter methods with default implementations

A set of filter methods is defined to subset the backend to a smaller set of chromatograms, i.e., these methods reduce the number of chromatograms in the backend. Default implementations are available for all of them but, as for the methods above, a backend can provide alternative versions if the way it stores the data allows a more efficient approach.

filterDataOrigin()

The filterDataOrigin() method subsets the backend, keeping only chromatograms whose dataOrigin chromatogram variable matches (exactly) one of the value(s) provided with parameter dataOrigin. If these values are not sorted, the returned chromatograms are additionally ordered to follow them.

#' Default for `filterDataOrigin()`
setMethod("filterDataOrigin", "ChromBackend",
          function(object, dataOrigin = character(), ...) {
              if (length(dataOrigin)) {
                  object <- object[dataOrigin(object) %in% dataOrigin]
                  if (is.unsorted(dataOrigin))
                      object[order(match(dataOrigin(object), dataOrigin))]
                  else object
              } else object
          })
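The reordering step of this default is easy to miss: the result follows the order of the requested values only if these are not sorted; otherwise the original order of the chromatograms is kept. The hypothetical helper filter_index() below (not part of the package) reproduces this index logic on a plain character vector of made-up data origins.

```r
## Hypothetical helper reproducing the index logic of the default
## filterDataOrigin() on a plain character vector
filter_index <- function(values, keep) {
    ## No values provided: keep everything
    if (!length(keep))
        return(seq_along(values))
    idx <- which(values %in% keep)
    ## Reorder only if the requested values are not sorted
    if (is.unsorted(keep))
        idx <- idx[order(match(values[idx], keep))]
    idx
}

origins <- c("a.mzML", "b.mzML", "a.mzML", "c.mzML")
filter_index(origins, c("a.mzML", "c.mzML"))  # returns c(1L, 3L, 4L)
filter_index(origins, c("c.mzML", "a.mzML"))  # returns c(4L, 1L, 3L)
```

Since order() leaves ties in their original order, chromatograms sharing the same data origin keep their relative order after reordering.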

Like all filter methods, it is expected to always return an instance of the backend class, even if no chromatogram matches the provided values:

filterDataOrigin(be, "disk")
## ChromBackendTest with 0 chromatograms
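One simple way to meet this requirement is to implement filters exclusively through the backend's [ method, so that an index without any match yields an empty object of the same class instead of NULL or an empty list. The sketch below assumes a hypothetical, minimal S4 class ToyBackend with only a dataOrigin slot and a hypothetical filter function toy_filter_origin(); neither is a real backend or part of Chromatograms.

```r
library(methods)

## Hypothetical minimal S4 "backend" used only for this illustration
setClass("ToyBackend", slots = c(dataOrigin = "character"))
setMethod("length", "ToyBackend", function(x) length(x@dataOrigin))
setMethod("[", "ToyBackend", function(x, i, j, ..., drop = FALSE) {
    x@dataOrigin <- x@dataOrigin[i]
    x
})

## Filter by subsetting with a logical index: there is no early return of
## NULL, so the class is preserved even if nothing matches
toy_filter_origin <- function(object, dataOrigin = character()) {
    if (length(dataOrigin))
        object <- object[object@dataOrigin %in% dataOrigin]
    object
}

toy <- new("ToyBackend", dataOrigin = c("a.mzML", "b.mzML"))
res <- toy_filter_origin(toy, "disk")
```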

filterDataStorage()

The filterDataStorage() method subsets a backend, keeping only chromatograms whose dataStorage chromatogram variable matches one of the value(s) provided with parameter dataStorage. If these values are not sorted, the returned chromatograms are additionally ordered to follow them. The default implementation is shown below.

#' Default implementation for `filterDataStorage()`
setMethod("filterDataStorage", "ChromBackend",
          function(object, dataStorage = character()) {
              if (length(dataStorage)) {
                  object <- object[dataStorage(object) %in% dataStorage]
                  if (is.unsorted(dataStorage))
                      object[order(match(dataStorage(object), dataStorage))]
                  else object
              } else object
          })

filterMsLevel()

The filterMsLevel() method subsets a backend to chromatograms whose MS level matches one of the provided MS levels. The default implementation is shown below.

#' The default implementation for `filterMsLevel()`
setMethod("filterMsLevel", "ChromBackend",
          function(object, msLevel = integer()) {
              if (length(msLevel)) {
                  object[msLevel(object) %in% msLevel]
              } else object
          })
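Using %in% rather than == in this default matters for chromatograms with a missing MS level: %in% returns FALSE for them, whereas == would return NA and thus put an NA into the subsetting index. The base R example below illustrates this on a made-up vector of MS levels. Note that explicitly requesting NA as MS level would keep such chromatograms.

```r
## Made-up MS levels of four chromatograms, one of them missing
ms_levels <- c(1L, 2L, NA, 1L)

## %in% never returns NA: the chromatogram with missing MS level is dropped
keep <- ms_levels %in% 1L
which(keep)

## == propagates the missing value into the index
eq <- ms_levels == 1L

## Requesting NA explicitly keeps chromatograms with missing MS level
keep_na <- ms_levels %in% c(1L, NA)
```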

filterMzRange()

The filterMzRange() method subsets a backend to chromatograms whose mz chromatogram variable is within the provided m/z range. Parameter mz is expected to be a numeric of length 2 defining the lower and upper boundary of the m/z range. The default implementation is shown below:

#' The default implementation for `filterMzRange()`
setMethod("filterMzRange", "ChromBackend", function(object, mz = numeric(),
                                                    ...) {
    if (length(mz)) {
        mz <- range(mz)
        keep <- which(between(mz(object), mz))
        object[keep]
    } else object
})
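The between() function used above is assumed here to be MsCoreUtils::between(), which tests whether values lie within a closed range. The base R sketch below uses a hypothetical stand-in, between_sketch(), to show that range() makes the order of the two boundaries irrelevant and that which() silently drops chromatograms with a missing m/z value. The m/z values are made up.

```r
## Hypothetical stand-in for between(): TRUE if x lies within the closed
## interval defined by rng, NA for missing x
between_sketch <- function(x, rng) x >= rng[1L] & x <= rng[2L]

mz_values <- c(100.1, 200.2, NA, 150.0)
## Boundaries can be given in any order
rng <- range(c(160, 120))
## which() ignores NA, so the chromatogram without m/z is not kept
keep <- which(between_sketch(mz_values, rng))
```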

filterMzValues()

The filterMzValues() method subsets a backend to chromatograms whose mz chromatogram variable matches one of the provided m/z values, given an acceptable difference defined by parameters ppm and tolerance.

#' Default for `filterMzValues()`
setMethod("filterMzValues", "ChromBackend",
          function(object, mz = numeric(), ppm = 20, tolerance = 0, ...) {
              if (length(mz)) {
                  object[.values_match_mz(mz(object), mz = mz,
                                          ppm = ppm, tolerance = tolerance)]
              } else object
          })
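The matching itself is done by the internal helper .values_match_mz(), whose code is not shown here. To give a rough idea of the criterion, the hypothetical match_mz_sketch() below accepts a chromatogram if its m/z differs from any requested value by at most tolerance plus ppm parts-per-million of that requested value. Computing the ppm relative to the requested m/z, and using a simple loop rather than a search on sorted values, are assumptions of this sketch and not necessarily what the package does.

```r
## Hypothetical, simplified stand-in for .values_match_mz(): return the
## indices of `x` within `tolerance + ppm` of any value in `mz`
match_mz_sketch <- function(x, mz, ppm = 20, tolerance = 0) {
    ## Maximal accepted difference for each requested m/z
    ## (ppm computed relative to the requested value: an assumption)
    allowed <- tolerance + abs(mz) * ppm / 1e6
    hit <- vapply(x, function(v) {
        ## Chromatograms without m/z never match
        if (is.na(v)) return(FALSE)
        any(abs(v - mz) <= allowed)
    }, logical(1))
    which(hit)
}

chrom_mz <- c(100.001, 200.5, NA, 300)
match_mz_sketch(chrom_mz, mz = c(100, 300.01))                    # returns 1L
match_mz_sketch(chrom_mz, mz = c(100, 300.01), tolerance = 0.01)  # returns c(1L, 4L)
```

With the default ppm = 20, the accepted difference around m/z 300.01 is only about 0.006, so m/z 300 matches only once an absolute tolerance is added.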

Session information

## R Under development (unstable) (2024-03-24 r86185)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] Chromatograms_0.1.0 ProtGenerics_1.35.4 BiocStyle_2.31.0   
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_1.8.8      compiler_4.4.0      BiocManager_1.30.22
##  [4] cluster_2.1.6       jquerylib_0.1.4     systemfonts_1.0.6  
##  [7] IRanges_2.37.1      textshaping_0.3.7   yaml_2.3.8         
## [10] fastmap_1.1.1       R6_2.5.1            knitr_1.45         
## [13] BiocGenerics_0.49.1 htmlwidgets_1.6.4   MASS_7.3-60.2      
## [16] bookdown_0.38       desc_1.4.3          bslib_0.6.2        
## [19] rlang_1.1.3         cachem_1.0.8        xfun_0.43          
## [22] fs_1.6.3            MsCoreUtils_1.15.5  sass_0.4.9         
## [25] memoise_2.0.1       cli_3.6.2           pkgdown_2.0.7.9000 
## [28] magrittr_2.0.3      digest_0.6.35       lifecycle_1.0.4    
## [31] clue_0.3-65         S4Vectors_0.41.5    vctrs_0.6.5        
## [34] evaluate_0.23       ragg_1.3.0          stats4_4.4.0       
## [37] rmarkdown_2.26      purrr_1.0.2         tools_4.4.0        
## [40] htmltools_0.5.8

References