Package: MsIO
Authors: Johannes Rainer [aut, cre] (https://orcid.org/0000-0002-6977-7147), Philippine
Louail [aut] (https://orcid.org/0009-0007-5429-6846), Laurent Gatto
[ctb] (https://orcid.org/0000-0002-1520-2268)
Compiled: Thu Sep 5 12:47:18 2024
Introduction
Data objects in R can be serialized to disk in R’s Rds
format using the base R save()
function and re-imported
using the load()
function. This R-specific binary data
format can however not be used or read by other programming languages
preventing thus the exchange of R data objects between software or
programming languages. The MsIO package provides functionality
to export and import mass spectrometry data objects in various storage
formats aiming to facilitate data exchange between software. This
includes, among other formats, also storage of data objects using
Bioconductor’s alabaster.base
package.
For export or import of MS data objects, the
saveMsObject()
and readMsObject()
functions
can be used. For saveMsObject()
, the first parameter is the
MS data object that should be stored, for readMsObject()
it
defines type of MS object that should be restored (returned). The second
parameter param
defines and configures the storage format
of the MS data. The currently supported formats and the respective
parameter objects are:
-
PlainTextParam
: storage of data in plain text file format. -
AlabasterParam
: storage of MS data using Bioconductor’s alabaster.base framework based files in HDF5 and JSON format.
These storage formats are described in more details in the following sections.
An example use of these functions and parameters:
saveMsObject(x, param = PlainTextParam(storage_path))
to
store an MS data object assigned to a variable x
to a
directory storage_path
using the plain text file format. To
restore the data (assuming x
was an instance of a
MsExperiment
class):
readMsObject(MsExperiment(), param = PlainTextParam(storage_path))
.
Installation
The package can be installed with the BiocManager package.
To install BiocManager use
install.packages("BiocManager")
and, after that,
BiocManager::install("RforMassSpectrometry/MsIO")
to
install this package.
For import or export of MS data objects installation of additional Bioconductor packages might be needed:
-
Spectra
(with
BiocManager::install("Spectra")
) for import or export ofSpectra
orMsBackendMzR
objects. -
MsExperiment
(with
BiocManager::install("MsExperiment")
) for import or export ofMsExperiment
objects. -
xcms (with
BiocManager::install("xcms")
) for import or export ofXcmsExperiment
objects (result objects of xcms-based preprocessing).
Plain text file format
Storage of MS data objects in plain text format aims to support an easy exchange of data, and in particular analysis results, with external software, such as MS-DIAL or mzmine3. In most cases, the data is stored as tabulator delimited text files simplifying the use of the data and results across multiple programming languages, or their import into spreadsheet applications. MS data objects stored in plain text format can also be fully re-imported into R providing thus an alternative, and more flexible, object serialization approach than the R internal Rds/RData format.
Below we create a MS data object (MsExperiment
)
representing the data from two raw MS data files and assign sample
annotation information to these data files.
library(MsIO)
library(MsExperiment)
fls <- dir(system.file("TripleTOF-SWATH", package = "msdata"),
full.names = TRUE)
mse <- readMsExperiment(
fls,
sampleData = data.frame(name = c("Pestmix1 DDA", "Pestmix SWATH"),
mode = c("DDA", "SWATH")))
mse
## Object of class MsExperiment
## Spectra: MS1 (5626) MS2 (10975)
## Experiment data: 2 sample(s)
## Sample data links:
## - spectra: 2 sample(s) to 16601 element(s).
We can export this data object to plain text files using
MsIO’s saveMsObject()
function in combination with
the PlainTextParam
parameter object. The path to the
directory to which the data should be stored can be defined with the
path
parameter of PlainTextParam
. With the
call below we store the MS data object to a temporary directory.
d <- file.path(tempdir(), "ms_experiment_export")
saveMsObject(mse, PlainTextParam(path = d))
The data was exported to a set of text files that we list below:
dir(d)
## [1] "ms_backend_data.txt"
## [2] "ms_experiment_link_mcols.txt"
## [3] "ms_experiment_sample_data_links_spectra.txt"
## [4] "ms_experiment_sample_data.txt"
## [5] "spectra_processing_queue.json"
## [6] "spectra_slots.txt"
Each text file contains information about one particular
slot of the MS data object. See the
?PlainTextParam
help for a description of the files and
their respective formats. We can restore the MS data object again using
the readMsObject()
function, specifying the type of object
we want to restore (and which was stored to the respective directory)
with the first parameter of the function and the data storage format
with the second. In our example we use MsExperiment()
as
first parameter and PlainTextParam
as second. The MS data
of our MsExperiment
data object was represented by a
Spectra
object, thus, to import the data we need in
addition to load the Spectra
package.
library(Spectra)
mse_in <- readMsObject(MsExperiment(), PlainTextParam(d))
mse_in
## Object of class MsExperiment
## Spectra: MS1 (5626) MS2 (10975)
## Experiment data: 2 sample(s)
## Sample data links:
## - spectra: 2 sample(s) to 16601 element(s).
Note that at present MsIO does not support
storage of the full MS data (i.e. the individual mass peaks’
m/z and intensity values) to plain text file. MsIO
supports storage of on-disk data objects/representations (such
as the MsBackendMzR
object) to plain text formats. The
Spectra
object that is used to represent the MS data of our
example MsExperiment
object uses a
MsBackendMzR
backend and thus we were able to export and
import its data. Due to its on-disk data mode, this type of backend
retrieves the MS data on-the-fly from the original data files and hence
we only need to store the MS metadata and the location of the original
data files. Thus, also with the restored MS data object we have full
access to the MS data:
## NumericList of length 6
## [[1]] 0.0307632219046354 0.163443520665169 ... 0.507792055606842
## [[2]] 0.124385602772236 0.306980639696121 ... 0.752154946327209
## [[3]] 0.140656530857086 0.194816112518311 ... 0.455461025238037
## [[4]] 0.0389336571097374 0.357547700405121 ... 0.478326231241226
## [[5]] 0.124386593699455 0.054143700748682 ... 0.251276850700378
## [[6]] 0.0940475389361382 0.247442871332169 ... 0.10762557387352
However, ff the location of the original MS data files was changed
(e.g. if the files or the stored object was moved to a different
location or file system), the new location of these files would be
needed to be specified with parameter spectraPath
(e.g. readMsObject(MsExperiment(), PlainTextParam(d), spectraPath = <path to new location>)
).
Generally, saveMsData()
stores the MS data objects in a
modular way, i.e. the content of each component or slot is exported to
its own data file. The storage directory of our example
MsExperiment
contains thus multiple data files:
dir(d)
## [1] "ms_backend_data.txt"
## [2] "ms_experiment_link_mcols.txt"
## [3] "ms_experiment_sample_data_links_spectra.txt"
## [4] "ms_experiment_sample_data.txt"
## [5] "spectra_processing_queue.json"
## [6] "spectra_slots.txt"
This modularity allows also to load only parts of the original data.
We can for example also load only the Spectra
object
representing the MS experiment’s MS data.
s <- readMsObject(Spectra(), PlainTextParam(d))
s
## MSn data (Spectra) with 16601 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 0.231 1
## 2 1 0.351 2
## 3 1 0.471 3
## 4 1 0.591 4
## 5 1 0.711 5
## ... ... ... ...
## 16597 2 899.527 8995
## 16598 2 899.624 8996
## 16599 2 899.721 8997
## 16600 2 899.818 8998
## 16601 2 899.915 8999
## ... 27 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
## PestMix1_SWATH.mzML
Or even only the MsBackendMzR
that is used by the
Spectra
object to represent the MS data.
be <- readMsObject(MsBackendMzR(), PlainTextParam(d))
be
## MsBackendMzR with 16601 spectra
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 0.231 1
## 2 1 0.351 2
## 3 1 0.471 3
## 4 1 0.591 4
## 5 1 0.711 5
## ... ... ... ...
## 16597 2 899.527 8995
## 16598 2 899.624 8996
## 16599 2 899.721 8997
## 16600 2 899.818 8998
## 16601 2 899.915 8999
## ... 27 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
## PestMix1_SWATH.mzML
alabaster-based formats
The alabaster
framework and related Bioconductor package alabaster.base
implements methods to save a variety of R/Bioconductor objects to
on-disk representations based on standard file formats like HDF5 and
JSON. This ensures that Bioconductor objects can be easily read from
other languages like Python and Javascript. With
AlabasterParam
, MsIO supports export of MS data
objects into these storage formats. Below we export our example
MsExperiment
to a storage directory using the alabaster
format.
d <- file.path(tempdir(), "ms_experiment_export_alabaster")
saveMsObject(mse, AlabasterParam(path = d))
The contents of the storage folder is listed below:
dir(d, recursive = TRUE)
## [1] "experiment_files/OBJECT"
## [2] "experiment_files/x/list_contents.json.gz"
## [3] "experiment_files/x/OBJECT"
## [4] "metadata/list_contents.json.gz"
## [5] "metadata/OBJECT"
## [6] "OBJECT"
## [7] "other_data/list_contents.json.gz"
## [8] "other_data/OBJECT"
## [9] "sample_data_links_mcols/basic_columns.h5"
## [10] "sample_data_links_mcols/OBJECT"
## [11] "sample_data_links/list_contents.json.gz"
## [12] "sample_data_links/OBJECT"
## [13] "sample_data_links/other_contents/0/array.h5"
## [14] "sample_data_links/other_contents/0/OBJECT"
## [15] "sample_data/basic_columns.h5"
## [16] "sample_data/OBJECT"
## [17] "spectra/backend/OBJECT"
## [18] "spectra/backend/peaks_variables/contents.h5"
## [19] "spectra/backend/peaks_variables/OBJECT"
## [20] "spectra/backend/spectra_data/basic_columns.h5"
## [21] "spectra/backend/spectra_data/OBJECT"
## [22] "spectra/metadata/list_contents.json.gz"
## [23] "spectra/metadata/OBJECT"
## [24] "spectra/OBJECT"
## [25] "spectra/processing_chunk_size/contents.h5"
## [26] "spectra/processing_chunk_size/OBJECT"
## [27] "spectra/processing_queue_variables/contents.h5"
## [28] "spectra/processing_queue_variables/OBJECT"
## [29] "spectra/processing/contents.h5"
## [30] "spectra/processing/OBJECT"
## [31] "spectra/spectra_processing_queue.json"
In contrast to the plain text format described in the previous section, that stores all data files into a single directory, the alabaster export is structured hierarchically into sub-folders by the MS data object’s slots/components.
To restore the object we use the readMsObject()
function
with an AlabasterParam
parameter objects to define the used
data storage format.
mse_in <- readMsObject(MsExperiment(), AlabasterParam(d))
mse_in
## Object of class MsExperiment
## Spectra: MS1 (5626) MS2 (10975)
## Experiment data: 2 sample(s)
## Sample data links:
## - spectra: 2 sample(s) to 16601 element(s).
Also for this format, we can load parts of the data separately. We
can load the MS data as a Spectra
object from the
respective subfolder of the data storage directory:
s <- readMsObject(Spectra(), AlabasterParam(file.path(d, "spectra")))
s
## MSn data (Spectra) with 16601 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 0.231 1
## 2 1 0.351 2
## 3 1 0.471 3
## 4 1 0.591 4
## 5 1 0.711 5
## ... ... ... ...
## 16597 2 899.527 8995
## 16598 2 899.624 8996
## 16599 2 899.721 8997
## 16600 2 899.818 8998
## 16601 2 899.915 8999
## ... 33 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
## PestMix1_SWATH.mzML
The import/export functionality is completely compatible with
Bioconductor’s alabaster framework and hence allows also to read the
whole, or parts of the data directly using alabaster’s
readObject()
method. The full MsExperiment
is
restored importing the full directory (i.e. providing the path to the
directory containing the full export with the function’s
path
parameter).
mse_in <- readObject(path = d)
mse_in
## Object of class MsExperiment
## Spectra: MS1 (5626) MS2 (10975)
## Experiment data: 2 sample(s)
## Sample data links:
## - spectra: 2 sample(s) to 16601 element(s).
Alternatively, by providing a path to one of the MS object’s
components, it is possible to read only specific parts of the data.
Below we read the sample annotation information as a
DataFrame
from the sample_data subfolder:
readObject(path = file.path(d, "sample_data"))
## DataFrame with 2 rows and 3 columns
## name mode spectraOrigin
## <character> <character> <character>
## 1 Pestmix1 D... DDA /__w/_temp...
## 2 Pestmix SW... SWATH /__w/_temp...
Session information
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Spectra_1.15.7 BiocParallel_1.39.0 S4Vectors_0.43.2
## [4] BiocGenerics_0.51.1 MsExperiment_1.7.0 ProtGenerics_1.37.1
## [7] MsIO_0.0.4 BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4
## [3] fastmap_1.2.0 lazyeval_0.2.2
## [5] digest_0.6.37 lifecycle_1.0.4
## [7] cluster_2.1.6 alabaster.matrix_1.5.5
## [9] alabaster.base_1.5.6 magrittr_2.0.3
## [11] compiler_4.4.1 rlang_1.1.4
## [13] sass_0.4.9 tools_4.4.1
## [15] igraph_2.0.3 utf8_1.2.4
## [17] yaml_2.3.10 knitr_1.48
## [19] S4Arrays_1.5.7 htmlwidgets_1.6.4
## [21] DelayedArray_0.31.11 plyr_1.8.9
## [23] abind_1.4-5 HDF5Array_1.33.6
## [25] purrr_1.0.2 desc_1.4.3
## [27] grid_4.4.1 fansi_1.0.6
## [29] Rhdf5lib_1.27.0 MASS_7.3-61
## [31] MultiAssayExperiment_1.31.5 SummarizedExperiment_1.35.1
## [33] cli_3.6.3 mzR_2.39.0
## [35] rmarkdown_2.28 crayon_1.5.3
## [37] ragg_1.3.2 generics_0.1.3
## [39] httr_1.4.7 reshape2_1.4.4
## [41] ncdf4_1.23 DBI_1.2.3
## [43] cachem_1.1.0 rhdf5_2.49.0
## [45] stringr_1.5.1 zlibbioc_1.51.1
## [47] parallel_4.4.1 AnnotationFilter_1.29.0
## [49] BiocManager_1.30.25 XVector_0.45.0
## [51] alabaster.schemas_1.5.0 matrixStats_1.4.0
## [53] vctrs_0.6.5 Matrix_1.7-0
## [55] jsonlite_1.8.8 bookdown_0.40
## [57] IRanges_2.39.2 clue_0.3-65
## [59] systemfonts_1.1.0 jquerylib_0.1.4
## [61] tidyr_1.3.1 glue_1.7.0
## [63] pkgdown_2.1.0.9000 codetools_0.2-20
## [65] QFeatures_1.15.2 stringi_1.8.4
## [67] GenomeInfoDb_1.41.1 GenomicRanges_1.57.1
## [69] UCSC.utils_1.1.0 tibble_3.2.1
## [71] pillar_1.9.0 htmltools_0.5.8.1
## [73] rhdf5filters_1.17.0 GenomeInfoDbData_1.2.12
## [75] R6_2.5.1 textshaping_0.4.0
## [77] evaluate_0.24.0 Biobase_2.65.1
## [79] lattice_0.22-6 bslib_0.8.0
## [81] MetaboCoreUtils_1.13.0 Rcpp_1.0.13
## [83] SparseArray_1.5.31 xfun_0.47
## [85] MsCoreUtils_1.17.1 fs_1.6.4
## [87] MatrixGenerics_1.17.0 pkgconfig_2.0.3