Parallel and chunk-wise processing of Chromatograms

Many operations on Chromatograms objects, specifically those working with the actual peaks data (see peaksData), support chunk-wise processing, in which the Chromatograms is split into smaller parts (chunks) that are processed iteratively. This enables parallel processing of the data (by chunk) and also reduces memory demand, since only the peaks data of the currently processed subset is loaded into memory. Chunk-wise processing is disabled by default; it can be enabled by setting the processing chunk size of a Chromatograms with the processingChunkSize() function to a value smaller than the length of the Chromatograms object. For example, setting processingChunkSize(chr) <- 1000 will cause any data manipulation operation on chr, such as filterPeaksData(), to be performed, possibly in parallel, on sets of 1000 chromatograms in each iteration.
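The idea of splitting by chunk size can be sketched in base R. This is only an illustration of the general technique, not the package's implementation: the object length, the list standing in for the peaks data, and the per-chromatogram operation are all made up for the example.

```r
# Illustrative only: base R, not the Chromatograms implementation.
n_chrom <- 2500L     # hypothetical number of chromatograms
chunk_size <- 1000L  # as in processingChunkSize(chr) <- 1000

# Assign each element to a chunk: 1..1000 -> "1", 1001..2000 -> "2", ...
chunk_factor <- factor(ceiling(seq_len(n_chrom) / chunk_size))
table(chunk_factor)

# Stand-in for the peaks data: one numeric intensity vector per chromatogram.
set.seed(1)
peaks <- replicate(n_chrom, runif(5), simplify = FALSE)

# Process one chunk at a time, so only that chunk needs to be held in memory.
# lapply() could be swapped for a parallel apply function such as
# BiocParallel::bplapply().
chunk_max <- lapply(split(peaks, chunk_factor), function(chunk) {
    vapply(chunk, max, numeric(1))
})

# Recombine the per-chunk results in the original order.
result <- unsplit(chunk_max, chunk_factor)
```

With 2500 elements and a chunk size of 1000, this yields three chunks, the last one holding the remaining 500 elements.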

Such chunk-wise processing is particularly useful for Chromatograms objects using an on-disk backend or for very large experiments. For small data sets or Chromatograms using an in-memory backend, direct processing might however be more efficient. Setting the chunk size to Inf disables chunk-wise processing.

For some backends, a certain type of splitting and chunk-wise processing might be preferable. The ChromBackendMzR backend, for example, needs to load the MS data from the original (mzML) files, hence chunk-wise processing on a per-file basis would be ideal. The backendParallelFactor() function for ChromBackend allows backends to suggest a preferred splitting of the data by returning a factor defining the respective data chunks. The ChromBackendMzR, for example, returns a factor based on the dataStorage chromatograms variable. A factor of length 0 is returned if no particular splitting is preferred. The suggested chunk definition is used only if no finite processingChunkSize() is defined; a finite processingChunkSize() overrides backendParallelFactor().
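The precedence described above can be sketched in base R. The helper chunk_factor_sketch() and the file names are hypothetical and only mirror the rule as stated in the text; the package's own handling, in particular of edge cases such as a finite chunk size that is not smaller than the object length, may differ.

```r
# Hypothetical helper, not part of the Chromatograms API: a finite chunk
# size takes precedence, otherwise the backend's suggestion is used.
chunk_factor_sketch <- function(n, chunk_size = Inf,
                                backend_factor = factor()) {
    if (is.finite(chunk_size)) {
        factor(ceiling(seq_len(n) / chunk_size))
    } else {
        backend_factor
    }
}

# Made-up dataStorage values for five chromatograms from two files.
data_storage <- rep(c("a.mzML", "b.mzML"), times = c(3, 2))
backend_suggestion <- factor(data_storage, levels = unique(data_storage))

# No finite chunk size: the per-file suggestion is used.
per_file <- chunk_factor_sketch(5, backend_factor = backend_suggestion)

# Finite chunk size: overrides the backend suggestion.
by_size <- chunk_factor_sketch(5, chunk_size = 2,
                               backend_factor = backend_suggestion)

# Neither defined: factor of length 0, i.e. no chunk-wise processing.
no_split <- chunk_factor_sketch(5)
```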

Functions to configure parallel or chunk-wise processing:

processingChunkSize(): gets or sets the size of the chunks for parallel or, more generally, chunk-wise processing of a Chromatograms. With a value of Inf (the default), no chunk-wise processing is performed.
processingChunkFactor(): returns a factor defining the chunks into which a Chromatograms will be split for chunk-wise (parallel) processing. A factor of length 0 indicates that no chunk-wise processing will be performed.

# S4 method for class 'Chromatograms'
processingChunkSize(object, ...)

# S4 method for class 'Chromatograms'
processingChunkSize(object) <- value

# S4 method for class 'Chromatograms'
processingChunkFactor(object, chunkSize = processingChunkSize(object), ...)

Arguments

object: A Chromatograms object.
...: Additional arguments passed to the methods.
value: integer(1) defining the chunk size.
chunkSize: integer(1) for processingChunkFactor defining the chunk size. The default is the value stored in the Chromatograms object's processingChunkSize slot.

Value

processingChunkSize() returns the currently defined processing chunk size (or Inf if it is not defined). processingChunkFactor() returns a factor defining the chunks into which object will be split for (parallel) chunk-wise processing or a factor of length 0 if no splitting is defined.

Note

This documentation is mostly a placeholder and will be updated when the chunk-wise implementation is finalized.

Some backends might not support parallel processing at all. For these, the backendBpparam() function will always return a SerialParam(), independently of how parallel processing was defined.

Author

Johannes Rainer, Philippine Louail