This document describes the general structure of the package and provides helpful references to code and files for contributors. Preferably read the full document.
General info
What is this package good for?
The Spectra package (and the Spectra class) provides a powerful infrastructure for mass spectrometry (MS) data in R (see the SpectraTutorials for more information, in particular the Spectra-backends vignette for a description of the data structure). Powerful MS data algorithms are also available in Python, e.g. provided by the matchms library.
Why re-implement what’s already available?
This package translates an R Spectra object into the matchms Python Spectrum data structure and allows you to call functions of the matchms package and translate the results back into R data objects.
General package structure
Where to find what?
- The R folder contains all R source files.
  - R/conversion.R contains functions to convert between R and Python data structures (e.g. between Spectra::Spectra and matchms.Spectrum). The conversion of the Python result into an R data type is handled by R’s reticulate package, which can convert all basic data types between R and Python.
  - R/compareSpectriPy.R contains the mass spectral similarity calculation functions. The core function is the internal .compare_spectra_python() function, which manages the Anaconda environment, translates the data to Python data structures and calls the Python command using py_run_string(). The Python command itself is generated by the python_command() method called on the parameter object (e.g. CosineGreedyParam). To use a new similarity calculation function or a new Python functionality/algorithm, ideally implement a new param object with a python_command() method that returns the Python command specific to the new algorithm or functionality.
- The tests folder contains all unit tests: a general testthat.R file that configures and sets up the tests and, within the testthat folder, a unit test file for each R source file (named test_ .R).
- The vignettes folder contains a quarto document that explains the use of the SpectriPy package using examples. This is a good starting point to explore the package and its functionality.
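The param-object pattern described for R/compareSpectriPy.R can be sketched as follows. In the package this is an R method; the sketch mirrors the idea in Python purely for illustration. The class ExampleSimilarityParam, its fields and the variable names py_x and py_y are hypothetical and not taken from the package. The generated command refers to the matchms calculate_scores function and CosineGreedy class, but the sketch only builds the command string and does not execute it, so matchms is not needed to run it.

```python
from dataclasses import dataclass


@dataclass
class ExampleSimilarityParam:
    """Hypothetical parameter object; all names here are illustrative."""

    tolerance: float = 0.1
    mz_power: float = 0.0
    intensity_power: float = 1.0

    def python_command(self, references: str = "py_x", queries: str = "py_y") -> str:
        """Return the Python source that would be run for this algorithm."""
        return "\n".join([
            "from matchms import calculate_scores",
            "from matchms.similarity import CosineGreedy",
            f"sim = CosineGreedy(tolerance={self.tolerance}, "
            f"mz_power={self.mz_power}, intensity_power={self.intensity_power})",
            f"res = calculate_scores({references}, {queries}, sim)",
        ])


# Build (but do not execute) the command for one parameter setting.
command = ExampleSimilarityParam(tolerance=0.05).python_command()
print(command)
```

Under this design, supporting another algorithm would mean adding another param object whose python_command() returns a different command string, while the code that runs the command stays unchanged.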
Python setup and configuration
Where are Python libraries defined?
SpectriPy uses the R reticulate package for conversion between (basic) R and Python data types.
The reticulate r_to_py() and py_to_r() functions are used to convert basic data types from R to Python and from Python to R. To use these functions, a Python environment with the matchms library must be available.
Test data
What data could be used in tests?
The package contains two test data files. The “test” and “spectra2” example data were created manually by defining m/z and intensity values of MS peaks. Data files can be added (e.g. in MGF format) if needed and put into the inst/extdata folder.
Alternatively, example files in mzML format are available in Bioconductor’s msdata package.
To test the package and newly created functionality, add the respective unit tests to the tests/testthat folder and evaluate them, e.g. by running
rcmdcheck::rcmdcheck(args = "--no-manual")
in an R session started within the package folder.
Potential contributions and extensions
What could be implemented?
See the open issues. Some major topics are:
Integrate other Python libraries? This is more of a discussion; see issue #24.
Integrate functionality for spectra processing, downstream analysis (e.g. cleaning), … See also issue #20.
Ability to translate additional data structures. See also issue #18.
Define a use case analysis (or ideally several): show how data can be analyzed with the SpectriPy package using a “quarto” document that directly combines R and Python code. See also issue #21.
Contributing
How to contribute?
Ideally, fork the GitHub repository, implement your extensions and open a pull request against the main branch.
Follow the coding style guidelines and adhere to the code of conduct.