Parallel and chunk-wise processing of Spectra
Source: R/Spectra-functions.R, R/Spectra.R
processingChunkSize.Rd
Many operations on Spectra objects, specifically those working with the actual MS data (peaks data), support chunk-wise processing, in which the Spectra is split into smaller parts (chunks) that are processed iteratively. This enables parallel processing of the data (by data chunk) and also reduces the memory demand, since only the MS data of the currently processed subset is loaded into memory and processed. Chunk-wise processing is disabled by default. It can be enabled by using the processingChunkSize() function to set the processing chunk size of a Spectra to a value smaller than the length of the Spectra object. Setting processingChunkSize(sps) <- 1000 will cause any data manipulation operation on sps, such as filterIntensity() or bin(), to be performed, possibly in parallel, on sets of 1000 spectra in each iteration.
Such chunk-wise processing is specifically useful for Spectra objects using an on-disk backend or for very large experiments. For small data sets or Spectra using an in-memory backend, direct processing might, however, be more efficient. Setting the chunk size to Inf disables chunk-wise processing.
For some backends a certain type of splitting and chunk-wise processing might be preferable. The MsBackendMzR backend, for example, needs to load the MS data from the original (mzML) files, hence chunk-wise processing on a per-file basis would be ideal. The backendParallelFactor() function for MsBackend allows backends to suggest a preferred splitting of the data by returning a factor defining the respective data chunks. The MsBackendMzR, for example, returns a factor based on the dataStorage spectra variable. A factor of length 0 is returned if no particular splitting is preferred. The suggested chunk definition is used if no finite processingChunkSize() is defined. Setting a finite processingChunkSize overrides the splitting suggested by backendParallelFactor().
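A per-file factor of this kind can be sketched in base R. The data_storage values and file names below are hypothetical, and building the factor with factor(..., levels = unique(...)) is an assumption about how such a per-file splitting could be defined, not necessarily the MsBackendMzR source code.

```r
# Hypothetical dataStorage values: the (mzML) file each spectrum was read from.
data_storage <- c("a.mzML", "a.mzML", "b.mzML", "b.mzML", "b.mzML", "c.mzML")

# One chunk per file, keeping the files in the order they first appear.
file_factor <- factor(data_storage, levels = unique(data_storage))

# Indices of the spectra in each chunk: each file only needs to be accessed
# while its own chunk is processed.
chunks <- split(seq_along(data_storage), file_factor)
lengths(chunks)
```

Here lengths(chunks) reports 2, 3 and 1 spectra for the three files.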
See the Large-scale data handling and processing with Spectra vignette for more information and examples.
Functions to configure parallel or chunk-wise processing:
- processingChunkSize(): gets or sets the size of the chunks for parallel or chunk-wise processing of a Spectra in general. With a value of Inf (the default), no chunk-wise processing is performed.
- processingChunkFactor(): returns a factor defining the chunks into which a Spectra will be split for chunk-wise (parallel) processing. A factor of length 0 indicates that no chunk-wise processing will be performed.
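The precedence between a finite chunk size and the splitting suggested by the backend can be summarised with a small base R function. chunk_factor() is hypothetical and only mirrors the behaviour described above; it is not the implementation of processingChunkFactor().

```r
# Hypothetical helper (not the Spectra source code): derive the chunk factor
# for `n` spectra from a chunk size and a backend-suggested factor.
chunk_factor <- function(n, chunk_size = Inf, backend_factor = factor()) {
    if (is.finite(chunk_size) && chunk_size < n) {
        # A finite chunk size smaller than n overrides the backend suggestion:
        # consecutive blocks of `chunk_size` spectra.
        as.factor(ceiling(seq_len(n) / chunk_size))
    } else {
        # Otherwise fall back to the backend suggestion; a length-0 factor
        # means that no chunk-wise processing is performed.
        backend_factor
    }
}

per_file <- factor(c("a", "a", "b", "b", "b"))
chunk_factor(5, chunk_size = 2, backend_factor = per_file)    # chunks 1, 1, 2, 2, 3
chunk_factor(5, chunk_size = Inf, backend_factor = per_file)  # the per-file factor
length(chunk_factor(5))                                        # no chunking
```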
Usage
processingChunkSize(x)
processingChunkSize(x) <- value
processingChunkFactor(x)
# S4 method for class 'Spectra'
backendBpparam(object, BPPARAM = bpparam())
Arguments
- x: Spectra object.
- value: integer(1) defining the chunk size.
- object: Spectra object.
- BPPARAM: Parallel setup configuration. See BiocParallel::bpparam() for more information.
Value
processingChunkSize() returns the currently defined processing chunk size (or Inf if it is not defined). processingChunkFactor() returns a factor defining the chunks into which x will be split for (parallel) chunk-wise processing, or a factor of length 0 if no splitting is defined.
Note
Some backends might not support parallel processing at all. For these, the backendBpparam() function will always return a SerialParam(), regardless of how parallel processing was configured.