The compareSpectriPy() function calculates spectral similarity scores using the calculate_scores module of the Python matchms.similarity package.
Selection and configuration of the algorithm can be performed with one of the parameter objects:
- CosineGreedyParam: calculate the cosine similarity score between spectra. The score is calculated by finding the best possible matches between peaks of two spectra. Two peaks are considered a potential match if their m/z values lie within the given tolerance. The underlying peak assignment problem is solved in a greedy way. This can be notably faster, but occasionally deviates slightly from a fully correct solution (as obtained with the CosineHungarianParam algorithm). In practice this will rarely affect similarity scores notably, in particular for smaller tolerances. The algorithm can be configured with parameters tolerance, mzPower and intensityPower (see parameter description for more details).
- CosineHungarianParam: calculate the cosine similarity score as with CosineGreedyParam, but using the Hungarian algorithm to find the best matching peaks between the compared spectra. The algorithm can be configured with parameters tolerance, mzPower and intensityPower (see parameter description for more details).
- ModifiedCosineParam: the modified cosine score aims at quantifying the similarity between two mass spectra. The score is calculated by finding the best possible matches between peaks of two spectra. Two peaks are considered a potential match if their m/z values lie within the given tolerance, or if their m/z values lie within the tolerance once a mass shift is applied. The mass shift is simply the difference in precursor m/z between the two spectra.
- NeutralLossesCosineParam: the neutral losses cosine score aims at quantifying the similarity between two mass spectra. The score is calculated by finding the best possible matches between peaks of two spectra. Two peaks are considered a potential match if their m/z values lie within the given tolerance once a mass shift is applied. The mass shift is the difference in precursor m/z between the two spectra.
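To make the greedy peak assignment described above more concrete, the following is a minimal base R sketch of a greedy cosine score. It is an illustration only and not the matchms code that compareSpectriPy() calls: the function name greedy_cosine() and its argument names are hypothetical, and the weighting of peaks as mz^mzPower * intensity^intensityPower is an assumption based on the parameter descriptions. The input values are the two caffeine spectra from the Examples section.

```r
## Hypothetical base R sketch of a greedy cosine score. This is NOT the
## matchms implementation used by compareSpectriPy().
greedy_cosine <- function(mz1, int1, mz2, int2, tolerance = 0.1,
                          mzPower = 0, intensityPower = 1) {
    ## Peak weights entering the cosine function (assumed form).
    w1 <- mz1^mzPower * int1^intensityPower
    w2 <- mz2^mzPower * int2^intensityPower
    ## All candidate peak pairs with an m/z difference <= tolerance.
    pairs <- which(abs(outer(mz1, mz2, "-")) <= tolerance, arr.ind = TRUE)
    prods <- w1[pairs[, 1]] * w2[pairs[, 2]]
    ## Greedy assignment: walk through the candidate pairs in order of
    ## decreasing product and use each peak at most once.
    used1 <- logical(length(mz1))
    used2 <- logical(length(mz2))
    total <- 0
    for (k in order(prods, decreasing = TRUE)) {
        i <- pairs[k, 1]
        j <- pairs[k, 2]
        if (!used1[i] && !used2[j]) {
            total <- total + prods[k]
            used1[i] <- TRUE
            used2[j] <- TRUE
        }
    }
    ## Normalize by the norms of the two weighted spectra.
    total / (sqrt(sum(w1^2)) * sqrt(sum(w2^2)))
}

## The two caffeine spectra from the Examples section.
mz_a <- c(135.0432, 138.0632, 163.0375, 195.0880)
int_a <- c(340.0, 416, 2580, 412)
mz_b <- c(110.0710, 138.0655, 138.1057, 138.1742, 195.0864)
int_b <- c(388.0, 3270, 85, 54, 10111)

score_ab <- greedy_cosine(mz_a, int_a, mz_b, int_b)
score_ab
```

With the default settings this sketch should give a value close to the 0.1948181 that the Examples section reports for these two spectra. A Hungarian-style solver would replace only the loop with an optimal assignment; the candidate pairs and the normalization would stay the same.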
Usage
CosineGreedyParam(tolerance = 0.1, mzPower = 0, intensityPower = 1)
CosineHungarianParam(tolerance = 0.1, mzPower = 0, intensityPower = 1)
ModifiedCosineParam(tolerance = 0.1, mzPower = 0, intensityPower = 1)
NeutralLossesCosineParam(
  tolerance = 0.1,
  mzPower = 0,
  intensityPower = 1,
  ignorePeaksAbovePrecursor = TRUE
)
# S4 method for class 'Spectra,Spectra,CosineGreedyParam'
compareSpectriPy(x, y, param, ...)
# S4 method for class 'Spectra,missing,CosineGreedyParam'
compareSpectriPy(x, y, param, ...)
Arguments
- tolerance
  numeric(1): tolerated differences in peaks' m/z. Peaks with m/z differences <= tolerance are considered matching.
- mzPower
  numeric(1): the power to raise m/z to in the cosine function. The default is 0, in which case the peak intensity products will not depend on the m/z values.
- intensityPower
  numeric(1): the power to raise intensity to in the cosine function. The default is 1.
- ignorePeaksAbovePrecursor
  For NeutralLossesCosineParam(): logical(1): if TRUE (the default), peaks with m/z values larger than the precursor m/z are ignored.
- x
  A Spectra() object.
- y
  A Spectra() object to compare against. If missing, spectra similarities are calculated between all spectra in x.
- param
  one of the parameter classes listed above (such as CosineGreedyParam) defining the similarity scoring function in Python and its parameters.
- ...
  ignored.
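The effect of mzPower and intensityPower can be sketched in a few lines of base R. The helper peak_weights() below is hypothetical, and the form mz^mzPower * intensity^intensityPower is an assumption derived from the parameter descriptions above, not a call into matchms. It shows how each peak's contribution to the intensity products changes with the two powers.

```r
## Hypothetical helper: weight of each peak in the cosine function,
## assuming the form mz^mzPower * intensity^intensityPower.
peak_weights <- function(mz, intensity, mzPower = 0, intensityPower = 1) {
    mz^mzPower * intensity^intensityPower
}

mz <- c(135.0432, 138.0632, 163.0375, 195.0880)
intensity <- c(340.0, 416, 2580, 412)

## Defaults: weights equal the intensities, independent of m/z.
w_default <- peak_weights(mz, intensity)
## Square-root scaling dampens the influence of very intense peaks.
w_sqrt <- peak_weights(mz, intensity, intensityPower = 0.5)
## mzPower = 1 gives more weight to peaks with larger m/z.
w_mz <- peak_weights(mz, intensity, mzPower = 1)

rbind(w_default, w_sqrt, w_mz)
```

Under this assumption, a call such as CosineGreedyParam(intensityPower = 0.5) would correspond to the second row of weights.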
Value
compareSpectriPy() returns a numeric matrix with the scores, with the number of rows equal to length(x) and the number of columns equal to length(y).
See also
compareSpectra() in the Spectra package for pure R implementations of spectra similarity calculations.
Examples
library(Spectra)
#> Loading required package: S4Vectors
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#>
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:stats’:
#>
#> IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#>
#> Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#> as.data.frame, basename, cbind, colnames, dirname, do.call,
#> duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#> lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#> pmin.int, rank, rbind, rownames, sapply, saveRDS, setdiff, table,
#> tapply, union, unique, unsplit, which.max, which.min
#>
#> Attaching package: ‘S4Vectors’
#> The following object is masked from ‘package:utils’:
#>
#> findMatches
#> The following objects are masked from ‘package:base’:
#>
#> I, expand.grid, unname
#> Loading required package: BiocParallel
## Create some example Spectra.
DF <- DataFrame(
    msLevel = c(2L, 2L, 2L),
    name = c("Caffeine", "Caffeine", "1-Methylhistidine"),
    precursorMz = c(195.0877, 195.0877, 170.0924)
)
DF$intensity <- list(
    c(340.0, 416, 2580, 412),
    c(388.0, 3270, 85, 54, 10111),
    c(3.407, 47.494, 3.094, 100.0, 13.240))
DF$mz <- list(
    c(135.0432, 138.0632, 163.0375, 195.0880),
    c(110.0710, 138.0655, 138.1057, 138.1742, 195.0864),
    c(109.2, 124.2, 124.5, 170.16, 170.52))
sps <- Spectra(DF)
## Calculate pairwise similarity between all spectra within sps with
## matchms' CosineGreedy algorithm
## Note: the first compareSpectriPy() call will take longer because the
## Python environment needs to be set up.
res <- compareSpectriPy(sps, param = CosineGreedyParam())
res
#> [,1] [,2] [,3]
#> [1,] 1.0000000 0.1948181 0
#> [2,] 0.1948181 1.0000000 0
#> [3,] 0.0000000 0.0000000 1
## Next we calculate similarities for all spectra against the first one
res <- compareSpectriPy(sps, sps[1], param = CosineGreedyParam())
## Calculate pairwise similarity of all spectra in sps with matchms'
## ModifiedCosine algorithm
res <- compareSpectriPy(sps, param = ModifiedCosineParam())
res
#> [,1] [,2] [,3]
#> [1,] 1.0000000 0.1948181 0.1384183
#> [2,] 0.1948181 1.0000000 0.8520549
#> [3,] 0.1384183 0.8520549 1.0000000
## Note that the ModifiedCosine method requires the precursor m/z to be
## known for all input spectra. Thus, it is advisable to remove spectra
## without precursor m/z before using this algorithm.
sps <- sps[!is.na(precursorMz(sps))]
compareSpectriPy(sps, param = ModifiedCosineParam())
#> [,1] [,2] [,3]
#> [1,] 1.0000000 0.1948181 0.1384183
#> [2,] 0.1948181 1.0000000 0.8520549
#> [3,] 0.1384183 0.8520549 1.0000000
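
The nonzero ModifiedCosine scores against 1-Methylhistidine in the examples above come from the mass shift. The base R sketch below is an illustration under stated assumptions and not the matchms implementation: it takes the second and third example spectra, applies the precursor m/z difference as the shift, and lists the candidate peak pairs. The variable names are hypothetical.

```r
## Illustrative base R sketch; compareSpectriPy() itself delegates the
## scoring to matchms in Python.
mz2 <- c(110.0710, 138.0655, 138.1057, 138.1742, 195.0864)
int2 <- c(388.0, 3270, 85, 54, 10111)
mz3 <- c(109.2, 124.2, 124.5, 170.16, 170.52)
int3 <- c(3.407, 47.494, 3.094, 100.0, 13.240)
tolerance <- 0.1

## Mass shift: difference in precursor m/z between the two spectra.
shift <- 195.0877 - 170.0924

## Candidate peak pairs without and with the mass shift applied.
direct <- abs(outer(mz2, mz3, "-")) <= tolerance
shifted <- abs(outer(mz2, mz3 + shift, "-")) <= tolerance
pairs <- which(direct | shifted, arr.ind = TRUE)
pairs

## Only one candidate pair exists here, so no assignment step is needed:
## the score is its intensity product divided by the spectra norms.
score <- sum(int2[pairs[, 1]] * int3[pairs[, 2]]) /
    (sqrt(sum(int2^2)) * sqrt(sum(int3^2)))
score
```

No peaks match without the shift, which is consistent with the CosineGreedy score of 0 for this pair. After shifting, the most intense peaks of both spectra pair up, and the resulting value should be close to the 0.8520549 reported above.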