Introduction
Note: this vignette is pre-computed. See the session info for information on packages used and the date the vignette was rendered. The vignette requires a running Sirius instance. To reproduce this analysis, you will need Sirius 6.3 installed and running.
Sirius can search against custom databases in addition to the built-in databases (BIO, PubChem, etc.). This is useful when you have:
- A list of suspect compounds specific to your study
- A custom spectral library (e.g., from MassBank)
- Target compounds you want to prioritize in the search
This vignette demonstrates how to create and use custom databases, and shows the impact on structure identification results.
Managing Databases
Database Information
# Get details about a specific database
infoDb(srs, databaseId = "BIO")
#> Error in `infoDb()`:
#> ! The connection to the Sirius instance is not valid.
Creating a Custom Database
Custom databases can be created from files containing compound information. Supported formats include .tsv, .csv, or .mgf files with structure information.
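Before creating a database, it can help to confirm that each input file exists and has one of the extensions listed above. The following is a minimal base R sketch; `check_db_files()` is a hypothetical helper written for this example and is not part of RuSirius. It also treats an empty path as missing, which is what `system.file()` returns when a file is not found in the installed package.

```r
# Hypothetical helper (not part of RuSirius): report whether each candidate
# database file exists and has a supported extension.
check_db_files <- function(files, allowed = c("tsv", "csv", "mgf")) {
    ext <- tolower(tools::file_ext(files))
    data.frame(
        file = files,
        exists = nzchar(files) & file.exists(files),
        supported = ext %in% allowed,
        stringsAsFactors = FALSE
    )
}

# Demo: one temporary file that exists, one path that does not exist,
# and one empty path (as returned by system.file() for a missing file).
ok_file <- tempfile(fileext = ".tsv")
writeLines("name\tsmiles", ok_file)
missing_file <- file.path(tempdir(), "missing_library.mgf")

status <- check_db_files(c(ok_file, missing_file, ""))
print(status[, c("exists", "supported")])
```

Only files that pass both checks are worth passing on to `createDb()`.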
From a Compound List (TSV/CSV)
The file should contain columns for compound name, SMILES (or InChI), and optionally the molecular formula.
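As a rough illustration of such a file, the sketch below writes a three-compound suspect list to a tab-separated file using base R. The column names (`name`, `smiles`, `formula`) and their order are assumptions made for this example; check the Sirius documentation for the exact layout your Sirius version expects. The `createDb()` call is shown as a comment because it needs a running Sirius connection, and the database name `suspects_custom` is made up.

```r
# Illustrative suspect list; column names and order are assumptions.
suspects <- data.frame(
    name = c("Caffeine", "Atrazine", "Paracetamol"),
    smiles = c("CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
               "CCNC1=NC(=NC(=N1)Cl)NC(C)C",
               "CC(=O)NC1=CC=C(C=C1)O"),
    formula = c("C8H10N4O2", "C8H14ClN5", "C8H9NO2"),
    stringsAsFactors = FALSE
)

# Write a tab-separated file without quotes or row names.
tsv_file <- file.path(tempdir(), "suspect_list.tsv")
write.table(suspects, tsv_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm that it round-trips unchanged.
check <- read.delim(tsv_file, stringsAsFactors = FALSE)

# With a valid Sirius connection `srs`, the import would then follow the
# same pattern as the MGF import in this vignette:
# createDb(srs, databaseId = "suspects_custom", files = tsv_file,
#          location = getwd())
```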
From a Spectral Library (MGF)
Spectral libraries in MGF format can also be imported. An example MGF file is included in the package:
# Path to example MassBank MGF file
mgf_file <- system.file("vignettes", "MASSBANKEU.mgf", package = "RuSirius")
createDb(srs,
    databaseId = "massbank_custom",
    files = mgf_file,
    location = getwd())
#> Error in `createDb()`:
#> ! The connection to the Sirius instance is not valid.
Comparing Results: Default vs Custom Database
Let’s demonstrate how using a custom database affects structure identification.
Setup: Import Sample Data
# Load example data
dda_file <- MsDataHub::PestMix1_DDA.mzML()
sp <- Spectra(dda_file)
sp <- setBackend(sp, MsBackendMemory())
sp <- filterEmptySpectra(sp)
# Group spectra
idxs <- fragmentGroupIndex(sp)
sp$Msn_idx <- idxs
# Create project and import
srs <- Sirius(projectId = "db_comparison", path = getwd(), port = 9999)
#> Error in `Sirius()`:
#> ! unused argument (port = 9999)
sp_subset <- sp[sp$Msn_idx %in% c(421, 707)]
srs <- import(srs, spectra = sp_subset, ms_column_name = "Msn_idx")
#> Error:
#> ! object 'srs' not found
Run with Default Database (BIO)
# Run structure search with BIO database only
run(srs,
    formulaIdParams = formulaIdParam(numberOfCandidates = 5),
    predictParams = predictParam(),
    structureDbSearchParams = structureDbSearchParam(
        structureSearchDbs = c("BIO")
    ),
    recompute = TRUE,
    wait = TRUE)
#> Error:
#> ! object 'srs' not found
# Get results
results_bio <- summary(srs, result.type = "structure")
#> Error:
#> ! object 'srs' not found
results_bio[, c("alignedFeatureId", "molecularFormula",
    "structureName", "confidenceExactMatch")]
#> Error:
#> ! object 'results_bio' not found
Run with Custom Database Added
# Now include custom database in search
run(srs,
    formulaIdParams = formulaIdParam(numberOfCandidates = 5),
    predictParams = predictParam(),
    structureDbSearchParams = structureDbSearchParam(
        structureSearchDbs = c("BIO", "massbank_custom")
    ),
    recompute = TRUE,
    wait = TRUE)
#> Error:
#> ! object 'srs' not found
# Get results with custom DB
results_custom <- summary(srs, result.type = "structure")
#> Error:
#> ! object 'srs' not found
results_custom[, c("alignedFeatureId", "molecularFormula",
    "structureName", "confidenceExactMatch")]
#> Error:
#> ! object 'results_custom' not found
Compare Results
# Compare confidence scores
comparison <- merge(
    results_bio[, c("alignedFeatureId", "confidenceExactMatch")],
    results_custom[, c("alignedFeatureId", "confidenceExactMatch")],
    by = "alignedFeatureId",
    suffixes = c("_bio", "_custom")
)
#> Error in `h()`:
#> ! error in evaluating the argument 'x' in selecting a method for function 'merge': object 'results_bio' not found
comparison
#> Error:
#> ! object 'comparison' not found
Including relevant custom databases can improve identification confidence when your compounds are well-represented in the custom database.
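To make the comparison step concrete without a Sirius connection, here is a small sketch of the same `merge()` pattern on toy data. The feature IDs and confidence values are invented for illustration and are not Sirius results. The example also assumes that `confidenceExactMatch` is numeric, so that a per-feature difference can be computed.

```r
# Toy values for illustration only; these are NOT Sirius results.
results_bio_toy <- data.frame(
    alignedFeatureId = c("f1", "f2"),
    confidenceExactMatch = c(0.40, 0.75)
)
results_custom_toy <- data.frame(
    alignedFeatureId = c("f1", "f2"),
    confidenceExactMatch = c(0.85, 0.75)
)

# Join on the feature ID; the suffixes keep the two score columns apart.
cmp <- merge(results_bio_toy, results_custom_toy,
             by = "alignedFeatureId", suffixes = c("_bio", "_custom"))

# A positive delta means the score was higher with the custom database.
cmp$delta <- cmp$confidenceExactMatch_custom - cmp$confidenceExactMatch_bio

# Show the features with the largest change first.
cmp[order(-cmp$delta), ]
```

Sorting by the difference is one simple way to see which features were most affected by adding the custom database.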
Best Practices
- Targeted databases: Create focused databases with compounds relevant to your study rather than very large generic databases.
- Quality over quantity: Ensure your custom database has accurate structure information (SMILES/InChI).
- Combine strategically: Use custom databases alongside BIO for best coverage - BIO for general metabolites, custom for your specific targets.
- Spectral libraries: When available, spectral libraries (MGF) provide additional matching power through spectral similarity.
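As one possible way to act on the quality point, the sketch below flags rows of a compound table whose structure string is missing or duplicated before the table is written to a file. `flag_structure_issues()` is a hypothetical helper written for this example, not a RuSirius function, and the `smiles` column name is an assumption. It compares strings only: it does not validate SMILES chemistry and will not catch the same compound written as two different SMILES strings.

```r
# Hypothetical helper (not part of RuSirius): flag rows whose structure
# string is missing (NA or empty) or occurs more than once.
flag_structure_issues <- function(compounds, structure_col = "smiles") {
    s <- trimws(as.character(compounds[[structure_col]]))
    missing <- is.na(s) | !nzchar(s)
    # Mark every copy of a repeated string, not only the later ones.
    repeated <- duplicated(s) | duplicated(s, fromLast = TRUE)
    cbind(compounds,
          missing = missing,
          duplicated_structure = !missing & repeated)
}

# Toy compound list with one empty, one NA, and one duplicated structure.
compounds <- data.frame(
    name = c("A", "B", "C", "D"),
    smiles = c("CCO", "", "CCO", NA),
    stringsAsFactors = FALSE
)
flagged <- flag_structure_issues(compounds)
print(flagged)
```

Rows flagged in either column can then be reviewed or removed before the list is imported as a custom database.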
Clean Up
shutdown(srs)
#> Warning in value[[3L]](cond): Could not retrieve open projects: object 'srs' not found
#> Warning in doTryCatch(return(expr), name, parentenv, handler): restarting interrupted
#> promise evaluation
Session information
The R code was run on:
date()
#> [1] "Mon Mar 23 11:26:54 2026"
Information on the R session:
sessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#> LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: Europe/Rome
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MsDataHub_1.10.0 dplyr_1.2.0 RuSirius_0.2.0
#> [4] jsonlite_2.0.0 MetaboAnnotation_1.14.0 RSirius_6.3.3
#> [7] xcms_4.8.0 MsExperiment_1.12.0 ProtGenerics_1.42.0
#> [10] Spectra_1.20.1 BiocParallel_1.44.0 S4Vectors_0.48.0
#> [13] BiocGenerics_0.56.0 generics_0.1.4
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 MultiAssayExperiment_1.36.1 magrittr_2.0.4
#> [4] farver_2.1.2 MALDIquant_1.22.3 fs_1.6.6
#> [7] vctrs_0.7.1 memoise_2.0.1 RCurl_1.98-1.17
#> [10] base64enc_0.1-6 htmltools_0.5.9 S4Arrays_1.10.1
#> [13] BiocBaseUtils_1.12.0 progress_1.2.3 curl_7.0.0
#> [16] AnnotationHub_4.0.0 SparseArray_1.10.8 mzID_1.48.0
#> [19] htmlwidgets_1.6.4 plyr_1.8.9 httr2_1.2.2
#> [22] impute_1.84.0 cachem_1.1.0 igraph_2.2.1
#> [25] lifecycle_1.0.5 iterators_1.0.14 pkgconfig_2.0.3
#> [28] Matrix_1.7-4 R6_2.6.1 fastmap_1.2.0
#> [31] MatrixGenerics_1.22.0 clue_0.3-66 digest_0.6.39
#> [34] pcaMethods_2.2.0 rsvg_2.7.0 AnnotationDbi_1.72.0
#> [37] ExperimentHub_3.0.0 GenomicRanges_1.62.1 RSQLite_2.4.5
#> [40] filelock_1.0.3 httr_1.4.7 abind_1.4-8
#> [43] compiler_4.5.2 withr_3.0.2 bit64_4.6.0-1
#> [46] doParallel_1.0.17 S7_0.2.1 DBI_1.2.3
#> [49] MASS_7.3-65 ChemmineR_3.62.0 rappdirs_0.3.4
#> [52] DelayedArray_0.36.0 rjson_0.2.23 mzR_2.44.0
#> [55] tools_4.5.2 PSMatch_1.14.0 otel_0.2.0
#> [58] CompoundDb_1.14.2 glue_1.8.0 QFeatures_1.20.0
#> [61] grid_4.5.2 cluster_2.1.8.1 reshape2_1.4.5
#> [64] snow_0.4-4 gtable_0.3.6 preprocessCore_1.72.0
#> [67] tidyr_1.3.2 data.table_1.18.2.1 hms_1.1.4
#> [70] MetaboCoreUtils_1.19.2 xml2_1.5.2 XVector_0.50.0
#> [73] BiocVersion_3.22.0 foreach_1.5.2 pillar_1.11.1
#> [76] stringr_1.6.0 limma_3.66.0 BiocFileCache_3.0.0
#> [79] lattice_0.22-7 bit_4.6.0 tidyselect_1.2.1
#> [82] Biostrings_2.78.0 knitr_1.51 gridExtra_2.3
#> [85] IRanges_2.44.0 Seqinfo_1.0.0 SummarizedExperiment_1.40.0
#> [88] xfun_0.56 Biobase_2.70.0 statmod_1.5.1
#> [91] MSnbase_2.36.0 matrixStats_1.5.0 DT_0.34.0
#> [94] stringi_1.8.7 yaml_2.3.12 lazyeval_0.2.2
#> [97] evaluate_1.0.5 codetools_0.2-20 MsCoreUtils_1.22.1
#> [100] tibble_3.3.1 BiocManager_1.30.27 cli_3.6.5
#> [103] affyio_1.80.0 Rcpp_1.1.1 MassSpecWavelet_1.76.0
#> [106] dbplyr_2.5.1 png_0.1-8 XML_3.99-0.20
#> [109] parallel_4.5.2 ggplot2_4.0.2 blob_1.3.0
#> [112] prettyunits_1.2.0 AnnotationFilter_1.34.0 bitops_1.0-9
#> [115] MsFeatures_1.18.0 scales_1.4.0 affy_1.88.0
#> [118] ncdf4_1.24 purrr_1.2.1 crayon_1.5.3
#> [121] rlang_1.1.7 KEGGREST_1.50.0 vsn_3.78.1