Introduction
Note: this vignette is pre-computed. See the session information at the end for the packages used and the date the vignette was rendered. Reproducing this analysis requires Sirius 6.3 to be installed and running.
This vignette demonstrates a basic workflow for importing MS data from a Spectra object into Sirius. It then runs Sirius’s main tools: formula identification, structure database search, compound class prediction, spectral library matching and de novo structure prediction. Finally, it retrieves the results.
This is a foundational example and does not cover all the possible
parameters for each Sirius tool. For detailed parameter information,
consult the run() function documentation. More information
can be found in the Sirius
documentation online.
IMPORTANT: This is a work in progress. Feedback is highly valued, especially regarding enhancements or additions that could simplify your workflow. Your input as a user is essential.
Preparing the Spectra object
Below we load the example mass spectrometry (MS) data, provided by
the MsDataHub package, as a Spectra object:
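library(Spectra)
library(RuSirius)
dda_file <- MsDataHub::PestMix1_DDA.mzML()
sp <- Spectra(dda_file)
## Keep the data in memory and drop spectra without peaks
sp <- setBackend(sp, MsBackendMemory())
sp <- filterEmptySpectra(sp)
Before the Spectra data can be imported into Sirius, they need some
preprocessing: when spectra from several MS levels are present, they must
be grouped so that each MS1 spectrum is linked to its MS2 spectra.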
We use the fragmentGroupIndex() function to assign an
index to each spectrum. An MS1 spectrum and the MS2 spectra acquired from
it share the same index. See ?fragmentGroupIndex for
details on how these spectrum groups are defined.
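As a rough mental model, the sketch below assigns each MS2 spectrum to the most recent MS1 spectrum in acquisition order, so an MS1 spectrum and the MS2 spectra that follow it end up with the same index. The function group_by_preceding_ms1() is hypothetical and deliberately simplified; it is not the fragmentGroupIndex() implementation, whose actual rules (see ?fragmentGroupIndex) may differ, for example when several files are combined.

```r
## Simplified, hypothetical grouping rule (not the fragmentGroupIndex() code):
## every MS1 spectrum opens a new group and the MS2 spectra that follow it,
## in acquisition order, join that group.
group_by_preceding_ms1 <- function(ms_level) {
  idx <- cumsum(ms_level == 1L)
  ## MS2 spectra recorded before any MS1 spectrum have no parent: use NA
  idx[idx == 0L] <- NA_integer_
  idx
}

## Toy sequence of MS levels: MS1, MS2, MS2, MS1, MS1, MS2
group_by_preceding_ms1(c(1L, 2L, 2L, 1L, 1L, 2L))
## [1] 1 1 1 2 3 3
```

With real data the input would be msLevel(sp). In the toy sequence, group 2 contains an MS1 spectrum without any MS2, which corresponds to an MS1-only feature.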
sp |>
msLevel() |>
table()
#>
#> 1 2
#> 4627 2756
idxs <- fragmentGroupIndex(sp)
sp$Msn_idx <- idxs
Open Sirius and set up the project
The Sirius application is initialized via the API and requires only a
project ID. If the project exists, it is opened; otherwise, a new
project is created. The srs object acts as the connection
to Sirius and holds the project details. Once you have finished, shut down
the connection properly with shutdown(srs).
The srs object is required for every task that communicates
with the application. You can learn more
about this object class by running ?Sirius in the console.
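Since shutdown(srs) should be called even if an analysis step fails, one option is to tie the shutdown to function exit with base R's on.exit(). The helper with_connection() below is hypothetical and not part of RuSirius; it takes the open and close functions as arguments, so the pattern can be tried here with mock functions instead of a running Sirius.

```r
## Hypothetical helper (not part of RuSirius): open a connection, run `work`
## on it and always call `close_fun`, even if `work` throws an error.
with_connection <- function(open_fun, close_fun, work) {
  conn <- open_fun()
  on.exit(close_fun(conn), add = TRUE)
  work(conn)
}

## Mock open/close functions, so no running Sirius is needed
events <- character()
res <- with_connection(
  open_fun  = function() "conn",
  close_fun = function(conn) events <<- c(events, paste("closed", conn)),
  work      = function(conn) paste("used", conn)
)
res
## [1] "used conn"
```

With a running Sirius, a call could look like with_connection(function() Sirius(projectId = "test_spectra"), shutdown, featuresInfo). Because the connection is closed when the helper returns, `work` should return plain results rather than the srs object itself.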
If the path parameter is not specified, Sirius saves the project in the
sirius_projects folder of your user directory by default. Note that this
folder is not created automatically. To save the project somewhere else,
set the path parameter; below we use the current working directory.
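Because the project folder is not created for you, you may want to create it before connecting. The small helper ensure_dir() below is our own and uses only base R. The exact location Sirius treats as your user directory is an assumption here: on Windows, R's ~ usually expands to the Documents folder rather than the user home, so check the path before relying on it.

```r
## Create `path` (including missing parent folders) if it does not exist yet,
## then return its absolute, normalized form.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  normalizePath(path)
}

## Possible default location (assumption: check that "~" is the folder you expect)
## ensure_dir(file.path(path.expand("~"), "sirius_projects"))
```

Calling ensure_dir() on an existing folder is harmless; it only returns the normalized path, which can then be passed as the path argument of Sirius().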
srs <- Sirius(projectId = "test_spectra", path = getwd(), port = 9999)
#> Found SIRIUS in PATH! Using this information to start the application.
#> SIRIUS was started without specifying --port (-p), trying to find the sirius.port file.
You could import the entire Spectra object, but for demonstration
purposes we use a small subset.
Here, we import two MS1 spectra together with their associated MS2 spectra, and one MS1 spectrum on its own. It is also possible to import only MS2 spectra.
When importing, the ms_column_name parameter defines
which column contains the index that groups the spectra. Each such group
is considered one feature in Sirius terminology.
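To make the grouping concrete, the sketch below shows how the values of a grouping column map to features, and how flags like hasMs1 and hasMsMs in the featuresInfo() output could be derived from the MS levels in each group. The msLevel values in the toy table are invented for illustration (they are not the real contents of groups 421, 707 and 895), and this is not how import() is implemented.

```r
## Toy stand-in for the spectra data: one row per spectrum (invented values)
toy <- data.frame(Msn_idx = c(421, 421, 707, 707, 707, 895),
                  msLevel = c(1L, 2L, 1L, 2L, 2L, 1L))

## Each distinct Msn_idx value becomes one feature
groups <- split(toy$msLevel, toy$Msn_idx)
feats <- data.frame(
  externalFeatureId = names(groups),
  hasMs1  = vapply(groups, function(l) any(l == 1L), logical(1)),
  hasMsMs = vapply(groups, function(l) any(l >= 2L), logical(1)),
  row.names = NULL
)
feats
```

The group containing only an MS1 spectrum yields a feature without MS/MS data, as for feature 895 below.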
sp_subset <- sp[sp$Msn_idx %in% c(421, 707, 895)]
srs <- import(sirius = srs,
spectra = sp_subset,
ms_column_name = "Msn_idx",
deleteExistingFeatures = TRUE)
## See information about the features
featuresInfo(srs)
#> alignedFeatureId
#> [1,] "819207974345125059"
#> [2,] "819207974382873796"
#> [3,] "819207974403845317"
#> compoundId
#> [1,] "819207974319959232"
#> [2,] "819207974319959233"
#> [3,] "819207974319959234"
#> externalFeatureId
#> [1,] "421"
#> [2,] "707"
#> [3,] "895"
#> ionMass charge
#> [1,] 217.1185 1
#> [2,] 445.1181 1
#> [3,] 0 1
#> detectedAdducts hasMs1
#> [1,] list,1 TRUE
#> [2,] list,1 TRUE
#> [3,] list,1 TRUE
#> hasMsMs computing
#> [1,] TRUE FALSE
#> [2,] TRUE FALSE
#> [3,] FALSE FALSE
Submit a job to Sirius: structure database search
Once data is imported, annotation and prediction can begin. The
run() function accepts parameters for each Sirius tool,
such as formula identification, structure database search, and compound
class prediction.
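The massAccuracyMS2ppm = 10 setting in the call below is a relative tolerance for MS/MS peaks. To get a feel for its size in Da, the standard ppm conversion (delta m/z = m/z x ppm / 1e6) is applied here to the ionMass values reported by featuresInfo() above; fragment peaks generally lie below the precursor mass, so their windows are narrower still. The helper ppm_to_da() is our own, and Sirius may combine the ppm value with an absolute lower bound for small masses (see ?formulaIdParam).

```r
## Convert a relative mass tolerance in ppm to an absolute window in Da:
## delta_mz = mz * ppm / 1e6
ppm_to_da <- function(mz, ppm) {
  mz * ppm / 1e6
}

## ionMass values of the two features with MS/MS data, at 10 ppm
ppm_to_da(c(217.1185, 445.1181), ppm = 10)
## roughly 0.00217 and 0.00445 Da
```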
## Start computation
run(srs,
fallbackAdducts = c("[M + H]+", "[M + Na]+"),
formulaIdParams = formulaIdParam(numberOfCandidates = 10,
instrument = "QTOF",
numberOfCandidatesPerIonization = 3,
massAccuracyMS2ppm = 10,
filterByIsotopePattern = FALSE,
isotopeMs2Settings = c("SCORE"),
performDeNovoBelowMz = 600,
minPeaksToInjectSpecLibMatch = 3),
predictParams = predictParam(),
structureDbSearchParams = structureDbSearchParam(
structureSearchDbs = c("BIO")
),
recompute = TRUE,
wait = TRUE
)
#> [1] "1"
## Retrieve the feature information after the computation
info <- featuresInfo(srs)
Retrieve Results
To get a summary of all results, including the top formulas, structures, and compound class predictions, use the following:
summarytb <- summary(sirius = srs, result.type = "structure")
This summary table offers a quick overview for checking whether the predictions meet expectations. However, we recommend not relying solely on it for in-depth analysis; use the more detailed result-retrieval functions instead (see ?results).
Key columns include the confidence scores, which help assess how reliable each annotation is.
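De novo structure prediction
De novo structure prediction with MSNovelist is submitted as a separate job on the same project. With recompute = FALSE, results already computed by the previous job are kept rather than recomputed.
## Run de novo structure prediction (MSNovelist) on the imported features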
run(srs,
msNovelistParams = deNovoStructureParam(numberOfCandidateToPredict = 5),
recompute = FALSE,
wait = TRUE
)
#> [1] "2"
summaryDeNovo <- summary(srs, result.type = "deNovo")
Interestingly, for the first feature the results remain consistent with the structure database search, while for the second feature, which originally had a lower confidence, the predictions now differ.
For a visual exploration of the results, you can open the Sirius GUI before shutting down the connection:
# openGUI(srs)
# closeGUI(srs)
shutdown(srs)
#> Sirius was shut down successfully
To retrieve the other result types, see the ?results documentation or the package's other vignette.
Importing MS2-only or MSn-only data
In some workflows, only MS2 (or MS2 and MS3) spectra are available, for example when working with spectral libraries, MGF files, or data acquired without recording MS1 scans.
The SIRIUS API fully supports importing features without MS1 data.
When no MS1 spectra are present and no ms_column_name is
provided, import() automatically groups MSn spectra by
acquisition order: within each file (dataOrigin), a new
group starts whenever a new MS2 precursorMz is encountered,
and any subsequent higher-level scans (MS3+) are assigned to the same
group as their preceding MS2. With
deleteExistingFeatures = TRUE, any previously imported features
(spectra) and their results are removed from the project before the
import.
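The acquisition-order rule described above can be sketched in base R as follows. The function group_msn_by_order() is hypothetical and simplified: it reads "a new MS2 precursorMz" as "a precursor m/z different from that of the previous MS2 in the same file", compares m/z values exactly and assumes every MS2 spectrum has a precursor m/z, whereas import() may use a different definition or a tolerance. MS1 spectra, and MS3+ spectra without a preceding MS2, get NA.

```r
## Hypothetical, simplified sketch of the acquisition-order rule (not import()).
## Inputs are per-spectrum vectors in acquisition order, e.g. dataOrigin(sp),
## msLevel(sp) and precursorMz(sp).
group_msn_by_order <- function(data_origin, ms_level, precursor_mz) {
  grp <- rep(NA_integer_, length(ms_level))
  current <- 0L          # last group number handed out
  prev_origin <- NULL    # file of the previous spectrum
  prev_prec <- NULL      # precursor m/z of the last MS2 in this file
  in_group <- FALSE      # has an MS2 been seen in this file yet?
  for (i in seq_along(ms_level)) {
    if (!identical(data_origin[i], prev_origin)) {
      ## New file: the next MS2 always opens a new group
      prev_origin <- data_origin[i]
      prev_prec <- NULL
      in_group <- FALSE
    }
    if (ms_level[i] == 2L) {
      ## Exact comparison for simplicity; real data may need a tolerance
      if (is.null(prev_prec) || precursor_mz[i] != prev_prec) {
        current <- current + 1L
        prev_prec <- precursor_mz[i]
      }
      grp[i] <- current
      in_group <- TRUE
    } else if (ms_level[i] > 2L && in_group) {
      ## MS3+ spectra join the group of the preceding MS2
      grp[i] <- current
    }
    ## MS1 spectra (and MS3+ without a preceding MS2) stay NA
  }
  grp
}

group_msn_by_order(
  data_origin  = c("a", "a", "a", "a", "b", "b"),
  ms_level     = c(2L, 3L, 2L, 2L, 2L, 3L),
  precursor_mz = c(100.1, 55.0, 100.1, 150.2, 150.2, 80.0)
)
## [1] 1 1 1 2 3 3
```

Note that the fifth spectrum shares its precursor m/z with the fourth but comes from another file, so it still opens a new group.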
## Example: importing MS2-only spectra
## Create sp_ms2, a Spectra object containing only MS2 spectra
## (no MS1 data).
sp_ms2 <- filterMsLevel(sp, msLevel = 2L)
sp_ms2 <- sp_ms2[1:10] # Just an example subset of MS2 spectra
## The connection was shut down above, so reconnect first
srs <- Sirius(projectId = "test_spectra", path = getwd(), port = 9999)
## No need for ms_column_name: the function auto-groups by acquisition order.
srs <- import(sirius = srs,
spectra = sp_ms2,
deleteExistingFeatures = TRUE)
featuresInfo(srs)
shutdown(srs)
If your MSn spectra already have a grouping column (e.g., from a
feature detection tool), you can still pass it via
ms_column_name as usual.
Session information
The R code was run on:
date()
#> [1] "Tue Mar 10 15:00:44 2026"
Information on the R session:
sessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.utf8
#> [2] LC_CTYPE=English_United Kingdom.utf8
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United Kingdom.utf8
#>
#> time zone: Europe/Paris
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats
#> [3] graphics grDevices
#> [5] utils datasets
#> [7] methods base
#>
#> other attached packages:
#> [1] MsDataHub_1.10.0
#> [2] dplyr_1.2.0
#> [3] RuSirius_0.2.5
#> [4] jsonlite_2.0.0
#> [5] MetaboAnnotation_1.14.0
#> [6] RSirius_6.3.3
#> [7] xcms_4.8.0
#> [8] MsExperiment_1.12.0
#> [9] ProtGenerics_1.42.0
#> [10] Spectra_1.20.1
#> [11] BiocParallel_1.44.0
#> [12] S4Vectors_0.48.0
#> [13] BiocGenerics_0.56.0
#> [14] generics_0.1.4
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3
#> [2] MultiAssayExperiment_1.36.1
#> [3] magrittr_2.0.4
#> [4] farver_2.1.2
#> [5] MALDIquant_1.22.3
#> [6] fs_1.6.6
#> [7] vctrs_0.7.1
#> [8] memoise_2.0.1
#> [9] RCurl_1.98-1.17
#> [10] base64enc_0.1-6
#> [11] htmltools_0.5.9
#> [12] S4Arrays_1.10.1
#> [13] BiocBaseUtils_1.12.0
#> [14] progress_1.2.3
#> [15] curl_7.0.0
#> [16] AnnotationHub_4.0.0
#> [17] SparseArray_1.10.8
#> [18] mzID_1.48.0
#> [19] htmlwidgets_1.6.4
#> [20] plyr_1.8.9
#> [21] httr2_1.2.2
#> [22] impute_1.84.0
#> [23] cachem_1.1.0
#> [24] igraph_2.2.2
#> [25] lifecycle_1.0.5
#> [26] iterators_1.0.14
#> [27] pkgconfig_2.0.3
#> [28] Matrix_1.7-4
#> [29] R6_2.6.1
#> [30] fastmap_1.2.0
#> [31] MatrixGenerics_1.22.0
#> [32] clue_0.3-67
#> [33] digest_0.6.39
#> [34] pcaMethods_2.2.0
#> [35] rsvg_2.7.0
#> [36] ps_1.9.1
#> [37] AnnotationDbi_1.72.0
#> [38] ExperimentHub_3.0.0
#> [39] GenomicRanges_1.62.1
#> [40] RSQLite_2.4.6
#> [41] filelock_1.0.3
#> [42] httr_1.4.8
#> [43] abind_1.4-8
#> [44] compiler_4.5.2
#> [45] withr_3.0.2
#> [46] bit64_4.6.0-1
#> [47] doParallel_1.0.17
#> [48] S7_0.2.1
#> [49] DBI_1.2.3
#> [50] MASS_7.3-65
#> [51] ChemmineR_3.62.0
#> [52] rappdirs_0.3.4
#> [53] DelayedArray_0.36.0
#> [54] rjson_0.2.23
#> [55] mzR_2.44.0
#> [56] tools_4.5.2
#> [57] PSMatch_1.14.0
#> [58] otel_0.2.0
#> [59] CompoundDb_1.14.2
#> [60] glue_1.8.0
#> [61] QFeatures_1.20.0
#> [62] grid_4.5.2
#> [63] cluster_2.1.8.1
#> [64] reshape2_1.4.5
#> [65] snow_0.4-4
#> [66] gtable_0.3.6
#> [67] preprocessCore_1.72.0
#> [68] tidyr_1.3.2
#> [69] data.table_1.18.2.1
#> [70] hms_1.1.4
#> [71] MetaboCoreUtils_1.19.2
#> [72] xml2_1.5.2
#> [73] XVector_0.50.0
#> [74] BiocVersion_3.22.0
#> [75] foreach_1.5.2
#> [76] pillar_1.11.1
#> [77] stringr_1.6.0
#> [78] limma_3.66.0
#> [79] BiocFileCache_3.0.0
#> [80] lattice_0.22-7
#> [81] bit_4.6.0
#> [82] tidyselect_1.2.1
#> [83] Biostrings_2.78.0
#> [84] knitr_1.51
#> [85] gridExtra_2.3
#> [86] IRanges_2.44.0
#> [87] Seqinfo_1.0.0
#> [88] SummarizedExperiment_1.40.0
#> [89] xfun_0.56
#> [90] Biobase_2.70.0
#> [91] statmod_1.5.1
#> [92] MSnbase_2.36.0
#> [93] matrixStats_1.5.0
#> [94] DT_0.34.0
#> [95] stringi_1.8.7
#> [96] yaml_2.3.12
#> [97] lazyeval_0.2.2
#> [98] evaluate_1.0.5
#> [99] codetools_0.2-20
#> [100] MsCoreUtils_1.22.1
#> [101] tibble_3.3.1
#> [102] BiocManager_1.30.27
#> [103] cli_3.6.5
#> [104] affyio_1.80.0
#> [105] processx_3.8.6
#> [106] Rcpp_1.1.1
#> [107] MassSpecWavelet_1.76.0
#> [108] dbplyr_2.5.2
#> [109] png_0.1-8
#> [110] XML_3.99-0.22
#> [111] parallel_4.5.2
#> [112] ggplot2_4.0.2
#> [113] blob_1.3.0
#> [114] prettyunits_1.2.0
#> [115] AnnotationFilter_1.34.0
#> [116] bitops_1.0-9
#> [117] MsFeatures_1.18.0
#> [118] scales_1.4.0
#> [119] affy_1.88.0
#> [120] ncdf4_1.24
#> [121] purrr_1.2.1
#> [122] crayon_1.5.3
#> [123] rlang_1.1.7
#> [124] KEGGREST_1.50.0
#> [125] vsn_3.78.1