Advanced Feature Annotation using RuSirius

Note: this vignette is pre-computed. See the session info for the packages used and the date the vignette was rendered. It requires a running Sirius instance and a Sirius account for structure database searches. The vignette was pre-rendered using the authors’ Sirius account; to reproduce the analysis, log in with your own account (see the Prerequisites section below).

Introduction

In the main end-to-end LC-MS/MS untargeted metabolomics workflow, we successfully preprocessed our data, performed statistical analysis, and identified features with significant differences in abundance. However, matching the MS1 and MS2 spectra directly against the MassBank database only confidently annotated a single feature (caffeine).

As noted previously, this low proportion of annotated signals is common, and external software tools such as SIRIUS can be excellent alternatives for structure elucidation. SIRIUS uses sophisticated algorithms to predict molecular formulas and structures de novo from isotope patterns and fragmentation trees.

This vignette will demonstrate how to seamlessly continue your metabonaut analysis by passing your unannotated significant features into Sirius using the RuSirius R package.

Prerequisites

To run this workflow interactively, you must have:

Sirius 6.3 installed and running on your system.
The RuSirius package installed and loaded.
A Sirius account (free registration at bright-giant.com) — required for structure database searches.

library(RuSirius)
library(Spectra)

Loading Previous Data

In the end-to-end workflow, we isolated the MS2 spectra for all significant features, concatenated them into a single Spectra object, and saved them to disk. We will load this exact object to begin our advanced annotation.

#' Load the Spectra object containing MS2 scans of significant features
load(system.file("extdata", "spectra_significant_fts.RData",
                 package = "Metabonaut"))

#' Verify the loaded object
ms2_ctr_fts
#> MSn data (Spectra) with 315 spectra in a MsBackendMemory backend:
#>       msLevel     rtime scanIndex
#>     <integer> <numeric> <integer>
#> 1           2   147.357      2043
#> 2           2   148.587      2061
#> 3           2   149.817      2079
#> 4           2   152.297      2115
#> 5           2   147.376      2041
#> ...       ...       ...       ...
#> 311         2   178.082      2481
#> 312         2   179.322      2499
#> 313         2   180.572      2517
#> 314         2   181.822      2535
#> 315         2   183.072      2553
#>  ... 39 more variables/columns.
#> Processing:
#>  Filter: select retention time [10..240] on MS level(s) 1 2 [Tue Mar 18 11:56:42 2025]
#>  Filter: select MS level(s) 2 [Tue Mar 18 11:56:50 2025]
#>  Remove peaks based on their intensities and a user-provided function in spectra of MS level(s) 2. [Tue Mar 18 11:56:50 2025]
#>  ...19 more processings. Use 'processingLog' to list all.

Each spectrum in this object already contains a feature_id variable that links it back to our main XcmsExperiment results. Note that multiple MS2 spectra can be available per feature — we deliberately send all of them to SIRIUS, which will merge and use them to improve annotation quality.
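To see how many MS2 spectra back each feature, the feature_id values can simply be tabulated. The sketch below uses a small made-up vector so that it runs without the data; in the real workflow the vector would presumably come from ms2_ctr_fts$feature_id.

```r
# Toy stand-in for the feature_id spectra variable (made-up values);
# with the real data this would be: feature_id <- ms2_ctr_fts$feature_id
feature_id <- c("FT0371", "FT0371", "FT0565", "FT0732", "FT0732", "FT0732")

# Number of MS2 spectra available for each feature
spectra_per_feature <- table(feature_id)
spectra_per_feature

# Features represented by more than one MS2 spectrum
multi <- names(spectra_per_feature)[spectra_per_feature > 1]
multi
```

Features listed in multi are the ones for which Sirius receives several spectra to merge.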

Connecting to Sirius and Importing Data

We start by establishing a connection to the Sirius application using the Sirius() function. This function either connects to a running Sirius instance or starts a new one and connects to that. The port parameter used below configures the Sirius application to use a particular port, but the function can generally be called without specifying one.

We will create a new project dedicated to annotating these specific features.

#' Initialize the Sirius connection and create a new project
srs <- Sirius(projectId = "metabonaut_significant_features",
              path = getwd(), port = 9999)
#> Found SIRIUS in PATH! Using this information to start the application.
#> SIRIUS was started without specifying --port (-p), trying to find the sirius.port file.

#' Check connection status
checkConnection(srs)
#> [1] TRUE

If you have not already logged in during the Sirius() call (by passing username and password arguments), you must log in before running any structure database searches:

#' Log in to your Sirius account
srs <- logIn(srs,
             username = "your_email@example.com",
             password = "your_password")
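Hard-coding credentials in a script is easy to leak by accident. One possible alternative, sketched below, reads them from environment variables. The helper get_sirius_credentials() and the variable names SIRIUS_USERNAME and SIRIUS_PASSWORD are hypothetical and not part of RuSirius.

```r
# Hypothetical helper (not part of RuSirius): read credentials from
# environment variables so they never appear in the script itself.
get_sirius_credentials <- function(user_var = "SIRIUS_USERNAME",
                                   pass_var = "SIRIUS_PASSWORD") {
    creds <- list(username = Sys.getenv(user_var, unset = ""),
                  password = Sys.getenv(pass_var, unset = ""))
    missing <- names(creds)[!nzchar(unlist(creds))]
    if (length(missing))
        stop("Missing Sirius credential(s): ",
             paste(missing, collapse = ", "),
             ". Set ", user_var, " and ", pass_var, " first.")
    creds
}

# Demonstration only: placeholder values set for the current session
Sys.setenv(SIRIUS_USERNAME = "your_email@example.com",
           SIRIUS_PASSWORD = "your_password")
creds <- get_sirius_credentials()

# The values could then be passed on, for example:
# srs <- logIn(srs, username = creds$username, password = creds$password)
```

In practice the variables would be set outside the script, for instance in a user-level .Renviron file, so that the Sys.setenv() call is not needed.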

Next, we import our Spectra object into the Sirius project. We map the ms_column_name parameter to "feature_id" so that Sirius uses our existing feature names.

#' Import the MS2 spectra into Sirius
srs <- import(
    sirius = srs,
    spectra = ms2_ctr_fts,
    ms_column_name = "feature_id",
    deleteExistingFeatures = TRUE
)

#' View summary of imported features
head(featuresInfo(srs))
#>      alignedFeatureId     compoundId           externalFeatureId
#> [1,] "818899412191263562" "818899412153514821" "FT0371"         
#> [2,] "818899412258372427" "818899412153514822" "FT0565"         
#> [3,] "818899412350647116" "818899412153514823" "FT0732"         
#> [4,] "818899412380007245" "818899412153514824" "FT0845"         
#> [5,] "818899412400978766" "818899412153514825" "FT1171"         
#>      ionMass  charge detectedAdducts hasMs1 hasMsMs computing
#> [1,] 138.0548 1      list,1          FALSE  TRUE    FALSE    
#> [2,] 161.0401 1      list,1          FALSE  TRUE    FALSE    
#> [3,] 182.0748 1      list,1          FALSE  TRUE    FALSE    
#> [4,] 195.0877 1      list,1          FALSE  TRUE    FALSE    
#> [5,] 229.1299 1      list,1          FALSE  TRUE    FALSE

Running Sirius Computations

With the data imported, we can submit a job to Sirius. We will ask Sirius to:

Identify candidate molecular formulas (formulaIdParams).
Predict the compound class (predictParams).
Search structure databases for candidate structures (structureDbSearchParams).

Because our samples are human serum/plasma analyzed in positive polarity, we expect protonated ions ([M+H]+), sodium adducts ([M+Na]+), and protonated ions that have lost ammonia ([M+H-NH3]+). We supply these as our fallback adducts.

#' Submit the annotation job
job_id <- run(
    srs,
    fallbackAdducts = c("[M+H]+", "[M+Na]+", "[M+H-NH3]+"),
    formulaIdParams = formulaIdParam(
        numberOfCandidates = 5,
        instrument = "QTOF",
        massAccuracyMS2ppm = 10
    ),
    predictParams = predictParam(),
    structureDbSearchParams = structureDbSearchParam(
        structureSearchDbs = c("BIO")
    ),
    recompute = TRUE,
    wait = TRUE
)
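To make the fallback adducts concrete, the following sketch computes the m/z each of them would produce for caffeine (C8H10N4O2), using standard monoisotopic masses. It is plain arithmetic for illustration and does not reproduce how Sirius handles adducts internally.

```r
# Monoisotopic masses (Da) of the elements involved, and the electron mass
mono <- c(C = 12, H = 1.007825032, N = 14.003074004, O = 15.994914620,
          Na = 22.989769282)
electron <- 0.000548580

# Neutral monoisotopic mass of caffeine, C8H10N4O2
caffeine <- 8 * mono[["C"]] + 10 * mono[["H"]] +
    4 * mono[["N"]] + 2 * mono[["O"]]

# Mass shift of each fallback adduct relative to the neutral molecule
proton <- mono[["H"]] - electron
shift <- c("[M+H]+" = proton,
           "[M+Na]+" = mono[["Na"]] - electron,
           "[M+H-NH3]+" = proton - (mono[["N"]] + 3 * mono[["H"]]))

# Expected m/z for each ion species
mz <- caffeine + shift
round(mz, 4)
```

The [M+H]+ value rounds to 195.0877, which agrees with the ionMass that Sirius reports for feature FT0845.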

#' Optional: Print job info if you want to verify successful completion
jobInfo(srs, job_id)
#> [1] "Job ID: 1\n\nCommand: \n--IsotopeSettings.filter=true\n--InjectSpectralLibraryMatchFormulas.minPeakMatchesToInject=6\n--FormulaSettings.enforced=HCNOP\n--InjectSpectralLibraryMatchFormulas.injectFormulas=true\n--TagStructuresByElGordo=true\n--AdductSettings.detectable=[M+H3N+H]+,[M-H4O2+H]+,[M-H2O-H]-,[M-H3N-H]-,[M+Cl]-,[2M+K]+,[M+K]+,[2M+Cl]-,[M+C2H4O2-H]-,[M+H]+,[2M+H]+,[M-CH3-H]-,[M-H]-,[M+Na]+,[M-H2O+H]+\n--RecomputeResults=true\n--UseHeuristic.useHeuristicAboveMz=300\n--IsotopeMs2Settings=IGNORE\n--MS2MassDeviation.allowedMassDeviation=10.0ppm\n--FormulaSearchSettings.applyFormulaConstraintsToDatabaseCandidates=false\n--EnforceElGordoFormula=true\n--NumberOfCandidatesPerIonization=1\n--AdductSettings.fallback=[M+H]+,[M+Na]+,[M+H-NH3]+\n--FormulaSearchSettings.performBottomUpAboveMz=0.0\n--FormulaSettings.fallback=S\n--FormulaSearchSettings.applyFormulaConstraintsToBottomUp=false\n--UseHeuristic.useOnlyHeuristicAboveMz=650\n--ExpansiveSearchConfidenceMode.confidenceScoreSimilarityMode=APPROXIMATE\n--InjectSpectralLibraryMatchFormulas.minScoreToInject=0.7\n--FormulaSearchDB=\n--FormulaResultThreshold=true\n--InjectSpectralLibraryMatchFormulas.alwaysPredict=false\n--FormulaSettings.detectable=B,S,Cl,Se,Br\n--NumberOfCandidates=5\nformulas\nfingerprints\nclasses\nstructures\n\nProgress:\n   State: DONE\n   Current Progress: 2650\n   Max Progress: 2650\n\nAffected Compound IDs:\n   818899412153514825, 818899412153514824, 818899412153514823, 818899412153514822, 818899412153514821\n\nAffected Aligned Feature IDs:\n818899412400978766\n818899412380007245\n818899412350647116\n818899412258372427\n818899412191263562\n"
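The job information comes back as a single string with embedded line breaks, which is hard to read when printed as shown above. A minimal sketch for displaying it and extracting the job state is given below. It works on a shortened copy of the string, and the helper get_field() is hypothetical.

```r
# Shortened stand-in for the string returned by jobInfo(srs, job_id)
info <- paste0("Job ID: 1\n\nProgress:\n   State: DONE\n",
               "   Current Progress: 2650\n   Max Progress: 2650\n")

# cat() interprets the embedded "\n" and prints one field per line
cat(info)

# Split into trimmed lines and look up a field by its label
lines <- trimws(strsplit(info, "\n", fixed = TRUE)[[1]])
get_field <- function(label) {
    prefix <- paste0("^", label, ": ")
    sub(prefix, "", grep(prefix, lines, value = TRUE))
}

state <- get_field("State")
progress <- as.numeric(c(get_field("Current Progress"),
                         get_field("Max Progress")))

# TRUE when the job reports DONE and progress has reached its maximum
finished <- identical(state, "DONE") && progress[1] == progress[2]
finished
```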

Tip: To explore the data visually during or after processing, use openGUI(srs) to launch the Sirius graphical interface.

Retrieving and Interpreting Results

Once the computation is complete, we can extract the structural and formula predictions back into R for downstream analysis.

High-Level Summary

The summary() function provides a compact overview of the top formulas, structures, and compound classes predicted for each feature. This includes confidence scores that indicate how reliable each annotation is.

summary_results <- summary(sirius = srs, result.type = "structure")
summary_results
#>     alignedFeatureId         compoundId externalFeatureId
#> 1 818899412191263562 818899412153514821            FT0371
#> 2 818899412258372427 818899412153514822            FT0565
#> 3 818899412350647116 818899412153514823            FT0732
#> 4 818899412380007245 818899412153514824            FT0845
#> 5 818899412400978766 818899412153514825            FT1171
#>    ionMass charge hasMs1 hasMsMs          formulaId
#> 1 138.0548      1  FALSE    TRUE 818899436262374254
#> 2 161.0401      1  FALSE    TRUE 818899436237208404
#> 3 182.0748      1  FALSE    TRUE 818899437071874932
#> 4 195.0877      1  FALSE    TRUE 818899436253985629
#> 5 229.1299      1  FALSE    TRUE 818899436258179943
#>   molecularFormula    adduct rank siriusScoreNormalized
#> 1          C7H7NO2  [M + H]+    3            0.04957077
#> 2        C4H5BFNO4  [M + H]+    1            0.50000000
#> 3       C5H9F2N3O2  [M + H]+    1            0.49968135
#> 4        C8H10N4O2  [M + H]+    1            0.48622197
#> 5        C12H18N2O [M + Na]+    2            0.08134845
#>   siriusScore isotopeScore treeScore       inchiKey
#> 1    20.13407            0  20.13407 VOCKNCWQVHJMAE
#> 2    69.24605            0  69.24605           <NA>
#> 3   112.21981            0 112.21981 GWRLHZIGYXCDKL
#> 4    68.84529            0  68.84529 RYYVLZVUVIJVGH
#> 5     5.47353            0   5.47353 HBIDZSUDZACENV
#>                             smiles
#> 1               COC1=NC=CC(=C1)C=O
#> 2                             <NA>
#> 3             C(CF)NC(=O)N(CCF)N=O
#> 4     CN1C=NC2=C1C(=O)N(C(=O)N2C)C
#> 5 CC(=CCCC(=CCC(=O)C=[N+]=[N-])C)C
#>                                   structureName      xlogP
#> 1                   2-Methoxyisonicotinaldehyde  0.9654924
#> 2                                          <NA>         NA
#> 3                   Bis(fluoroethyl)nitrosourea  0.8681960
#> 4                                         Thein -0.1085821
#> 5 (4E)-1-Diazo-5,9-dimethyl-4,8-decadiene-2-one  3.4000000
#>   rank.1   csiScore tanimotoSimilarity mcesDistToTopHit type
#> 1      1  -75.15435          0.2592593                0  NPC
#> 2     NA         NA                 NA               NA  NPC
#> 3      1 -152.63122          0.3125000                0  NPC
#> 4      1  -12.05606          0.9743590                0  NPC
#> 5      1 -125.16005          0.3018868                0  NPC
#>     level levelIndex                     name
#> 1 PATHWAY          0                Alkaloids
#> 2 PATHWAY          0 Amino acids and Peptides
#> 3 PATHWAY          0                Alkaloids
#> 4 PATHWAY          0                Alkaloids
#> 5 PATHWAY          0               Terpenoids
#>                         description id probability index type.1
#> 1                Pathway: Alkaloids  0   0.5230578     0    NPC
#> 2 Pathway: Amino acids and Peptides  1   0.8786789     1    NPC
#> 3                Pathway: Alkaloids  0   0.1652622     0    NPC
#> 4                Pathway: Alkaloids  0   0.9989780     0    NPC
#> 5               Pathway: Terpenoids  6   0.5267499     6    NPC
#>      level.1 levelIndex.1                           name.1
#> 1 SUPERCLASS            1         Nicotinic acid alkaloids
#> 2 SUPERCLASS            1                   Small peptides
#> 3 SUPERCLASS            1                   Small peptides
#> 4 SUPERCLASS            1 Pseudoalkaloids (transamidation)
#> 5 SUPERCLASS            1                 Sesquiterpenoids
#>                                  description.1 id.1
#> 1         Superclass: Nicotinic acid alkaloids   44
#> 2                   Superclass: Small peptides   63
#> 3                   Superclass: Small peptides   63
#> 4 Superclass: Pseudoalkaloids (transamidation)   59
#> 5                 Superclass: Sesquiterpenoids   61
#>   probability.1 index.1 type.2 level.2 levelIndex.2
#> 1     0.6076760      44    NPC   CLASS            2
#> 2     0.8957072      63    NPC   CLASS            2
#> 3     0.1016899      63    NPC   CLASS            2
#> 4     0.9999007      59    NPC   CLASS            2
#> 5     0.1907384      61    NPC   CLASS            2
#>                name.2              description.2 id.2
#> 1  Pyridine alkaloids  Class: Pyridine alkaloids  602
#> 2          Aminoacids          Class: Aminoacids  109
#> 3 Imidazole alkaloids Class: Imidazole alkaloids  399
#> 4    Purine alkaloids    Class: Purine alkaloids  597
#> 5 Terpenoid alkaloids Class: Terpenoid alkaloids  682
#>   probability.2 index.2 confidenceExactMatch
#> 1    0.60770983     602           0.06000814
#> 2    0.93744338     109                   NA
#> 3    0.08924162     399           0.02842510
#> 4    0.99996340     597           0.67976562
#> 5    0.15184069     682           0.03929589
#>   confidenceApproxMatch expansiveSearchState computing
#> 1            0.06000814                  OFF     FALSE
#> 2                    NA                 <NA>     FALSE
#> 3            0.02842510                  OFF     FALSE
#> 4            0.89135261                  OFF     FALSE
#> 5            0.03929589          APPROXIMATE     FALSE

The summary() output contains the top-ranked molecular formula, predicted structure (with SMILES, InChIKey, and CSI:FingerID score), compound class predictions (NPC pathway, superclass, class), and — critically — the confidence scores (confidenceExactMatch and confidenceApproxMatch). These confidence scores are the most important indicator for evaluating the reliability of the prediction.
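As an illustration of how the confidence columns might be used, the sketch below labels each feature using an illustrative cut-off of 0.5 on confidenceApproxMatch. The values are copied from the summary output above. The cut-off and the status labels are assumptions made for this example, not recommendations from Sirius.

```r
# Confidence columns copied from the summary() output shown above
conf <- data.frame(
    externalFeatureId = c("FT0371", "FT0565", "FT0732", "FT0845", "FT1171"),
    confidenceExactMatch = c(0.06000814, NA, 0.02842510, 0.67976562,
                             0.03929589),
    confidenceApproxMatch = c(0.06000814, NA, 0.02842510, 0.89135261,
                              0.03929589)
)

# Illustrative cut-off; choose a value appropriate for your own study
threshold <- 0.5

# Label every feature; NA means no structure candidate was returned
conf$status <- ifelse(
    is.na(conf$confidenceApproxMatch), "no structure hit",
    ifelse(conf$confidenceApproxMatch >= threshold,
           "confident", "low confidence"))

# Most confident annotations first, features without a hit last
conf[order(conf$confidenceApproxMatch, decreasing = TRUE, na.last = TRUE), ]
```

With these values only FT0845, the feature assigned C8H10N4O2, passes the cut-off.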

You can seamlessly merge these results back into your rowData(res) from the main workflow using the feature identifiers to complement your differential abundance statistics with structural predictions.
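Such a merge can be sketched with base R, assuming the feature identifiers are available as a column. The feature_stats table below is a toy stand-in for rowData(res); its column names and p-values are made up, and FT9999 is a fictitious feature without annotation.

```r
# Toy stand-in for rowData(res); column names and values are made up
feature_stats <- data.frame(
    feature_id = c("FT0371", "FT0565", "FT0845", "FT9999"),
    p_adj_example = c(0.01, 0.03, 0.002, 0.04)
)

# A few columns of the summary() output shown above
annotations <- data.frame(
    externalFeatureId = c("FT0371", "FT0565", "FT0845"),
    molecularFormula = c("C7H7NO2", "C4H5BFNO4", "C8H10N4O2"),
    confidenceApproxMatch = c(0.06000814, NA, 0.89135261)
)

# Left join: keep every feature and add annotations where available
merged <- merge(feature_stats, annotations,
                by.x = "feature_id", by.y = "externalFeatureId",
                all.x = TRUE)
merged
```

Using all.x = TRUE keeps features that Sirius did not annotate, with NA in the annotation columns, so the statistics table never loses rows.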

For more detailed results (e.g., multiple structure candidates per formula, compound class predictions, fragmentation trees), see the results() function documented in ?results.

De Novo Structure Annotation for Low-Confidence Features

De novo structure annotation using MSNovelist is particularly useful for features where the database search did not yield high-confidence results. MSNovelist generates molecular structures directly from MS/MS data without relying on any database, making it valuable for novel or poorly characterized compounds.

We select features whose approximate-match confidence (confidenceApproxMatch) is below 0.5 or missing (no structure match) and run de novo annotation on those.

#' Identify features with low confidence or no structure match
fts_denovo <- summary_results$alignedFeatureId[which(
    is.na(summary_results$confidenceApproxMatch) |
        summary_results$confidenceApproxMatch < 0.5)]
fts_denovo
#> [1] "818899412191263562" "818899412258372427"
#> [3] "818899412350647116" "818899412400978766"

We submit a de novo annotation job for these features. It is recommended to also use ZODIAC re-ranking when running MSNovelist; for brevity, the call below does not set any ZODIAC parameters.

#' Run de novo structure annotation for low-confidence features
job_id_denovo <- tryCatch(
    run(
        srs,
        formulaIdParams = formulaIdParam(
            numberOfCandidates = 5,
            instrument = "QTOF",
            massAccuracyMS2ppm = 10
        ),
        msNovelistParams = deNovoStructureParam(numberOfCandidateToPredict = 5),
        alignedFeaturesIds = fts_denovo,
        recompute = FALSE,
        wait = TRUE
    ),
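    #' Report the failure instead of silently returning NULL
    error = function(e) {
        message("De novo annotation failed: ", conditionMessage(e))
        NULL
    }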
)

We can now retrieve the de novo results:

#' Get de novo summary
summary_denovo <- summary(srs, result.type = "deNovo")
summary_denovo
#>     alignedFeatureId         compoundId externalFeatureId
#> 1 818899412191263562 818899412153514821            FT0371
#> 2 818899412258372427 818899412153514822            FT0565
#> 3 818899412350647116 818899412153514823            FT0732
#> 4 818899412380007245 818899412153514824            FT0845
#> 5 818899412400978766 818899412153514825            FT1171
#>    ionMass charge hasMs1 hasMsMs          formulaId
#> 1 138.0548      1  FALSE    TRUE 818899436262374254
#> 2 161.0401      1  FALSE    TRUE 818899436237208404
#> 3 182.0748      1  FALSE    TRUE 818899437071874933
#> 4 195.0877      1  FALSE    TRUE 818899436253985629
#> 5 229.1299      1  FALSE    TRUE 818899436258179946
#>   molecularFormula         adduct rank siriusScoreNormalized
#> 1          C7H7NO2       [M + H]+    3           0.049570774
#> 2        C4H5BFNO4       [M + H]+    1           0.500000000
#> 3      C5H12F2N4O2 [M - H3N + H]+    2           0.499681353
#> 4        C8H10N4O2       [M + H]+    1           0.486221973
#> 5         C6H19N6P      [M + Na]+    5           0.007302549
#>   siriusScore isotopeScore  treeScore       inchiKey
#> 1   20.134072            0  20.134072 WDWLOWADVFWUQE
#> 2   69.246054            0  69.246054           <NA>
#> 3  112.219809            0 112.219809 PTOGTCVHRNJTSH
#> 4   68.845292            0  68.845292 RYYVLZVUVIJVGH
#> 5    3.063011            0   3.063011 LHYCAMVLJXLUBB
#>                         smiles      xlogP rank.1  csiScore
#> 1             C=Cc1ncoc1C(C)=O  0.0000000      1 -54.11018
#> 2                         <NA>         NA      1        NA
#> 3           NCCNOC(=O)NNCC(F)F  0.0000000      1 -70.82645
#> 4 CN1C=NC2=C1C(=O)N(C(=O)N2C)C -0.1085821      1 -12.05606
#> 5      CCN(CC)P(N)(N)=NN=C(C)N  0.0000000      1 -61.81776
#>   tanimotoSimilarity type   level levelIndex
#> 1          0.5641026  NPC PATHWAY          0
#> 2                 NA  NPC PATHWAY          0
#> 3          0.5094340  NPC PATHWAY          0
#> 4          0.9743590  NPC PATHWAY          0
#> 5          0.2790698  NPC PATHWAY          0
#>                       name                       description id
#> 1                Alkaloids                Pathway: Alkaloids  0
#> 2 Amino acids and Peptides Pathway: Amino acids and Peptides  1
#> 3                Alkaloids                Pathway: Alkaloids  0
#> 4                Alkaloids                Pathway: Alkaloids  0
#> 5                Alkaloids                Pathway: Alkaloids  0
#>   probability index type.1    level.1 levelIndex.1
#> 1   0.5230578     0    NPC SUPERCLASS            1
#> 2   0.8786789     1    NPC SUPERCLASS            1
#> 3   0.2048894     0    NPC SUPERCLASS            1
#> 4   0.9989780     0    NPC SUPERCLASS            1
#> 5   0.8715882     0    NPC SUPERCLASS            1
#>                             name.1
#> 1         Nicotinic acid alkaloids
#> 2                   Small peptides
#> 3                   Small peptides
#> 4 Pseudoalkaloids (transamidation)
#> 5              Histidine alkaloids
#>                                  description.1 id.1
#> 1         Superclass: Nicotinic acid alkaloids   44
#> 2                   Superclass: Small peptides   63
#> 3                   Superclass: Small peptides   63
#> 4 Superclass: Pseudoalkaloids (transamidation)   59
#> 5              Superclass: Histidine alkaloids   34
#>   probability.1 index.1 type.2 level.2 levelIndex.2
#> 1    0.60767603      44    NPC   CLASS            2
#> 2    0.89570719      63    NPC   CLASS            2
#> 3    0.34776333      63    NPC   CLASS            2
#> 4    0.99990070      59    NPC   CLASS            2
#> 5    0.08852576      34    NPC   CLASS            2
#>               name.2             description.2 id.2
#> 1 Pyridine alkaloids Class: Pyridine alkaloids  602
#> 2         Aminoacids         Class: Aminoacids  109
#> 3         Aminoacids         Class: Aminoacids  109
#> 4   Purine alkaloids   Class: Purine alkaloids  597
#> 5         Polyamines         Class: Polyamines  571
#>   probability.2 index.2        formulaId.1 molecularFormula.1
#> 1     0.6077098     602 818899436262374254            C7H7NO2
#> 2     0.9374434     109 818899436237208404          C4H5BFNO4
#> 3     0.1858719     109 818899437071874933        C5H12F2N4O2
#> 4     0.9999634     597 818899436253985629          C8H10N4O2
#> 5     0.1634796     571 818899436258179946           C6H19N6P
#>         adduct.1 rank.2 siriusScoreNormalized.1 siriusScore.1
#> 1       [M + H]+      3             0.049570774     20.134072
#> 2       [M + H]+     NA             0.500000000     69.246054
#> 3 [M - H3N + H]+      2             0.499681353    112.219809
#> 4       [M + H]+      1             0.486221973     68.845292
#> 5      [M + Na]+      5             0.007302549      3.063011
#>   isotopeScore.1 treeScore.1     inchiKey.1
#> 1              0   20.134072 WDWLOWADVFWUQE
#> 2              0   69.246054           <NA>
#> 3              0  112.219809 PTOGTCVHRNJTSH
#> 4              0   68.845292 RYYVLZVUVIJVGH
#> 5              0    3.063011 LHYCAMVLJXLUBB
#>                       smiles.1    xlogP.1 rank.3 csiScore.1
#> 1             C=Cc1ncoc1C(C)=O  0.0000000      1  -54.11018
#> 2                         <NA>         NA     NA         NA
#> 3           NCCNOC(=O)NNCC(F)F  0.0000000      1  -70.82645
#> 4 CN1C=NC2=C1C(=O)N(C(=O)N2C)C -0.1085821      1  -12.05606
#> 5      CCN(CC)P(N)(N)=NN=C(C)N  0.0000000      1  -61.81776
#>   tanimotoSimilarity.1 type.3 level.3 levelIndex.3
#> 1            0.5641026    NPC PATHWAY            0
#> 2                   NA    NPC PATHWAY            0
#> 3            0.5094340    NPC PATHWAY            0
#> 4            0.9743590    NPC PATHWAY            0
#> 5            0.2790698    NPC PATHWAY            0
#>                     name.3                     description.3
#> 1                Alkaloids                Pathway: Alkaloids
#> 2 Amino acids and Peptides Pathway: Amino acids and Peptides
#> 3                Alkaloids                Pathway: Alkaloids
#> 4                Alkaloids                Pathway: Alkaloids
#> 5                Alkaloids                Pathway: Alkaloids
#>   id.3 probability.3 index.3 type.4    level.4 levelIndex.4
#> 1    0     0.5230578       0    NPC SUPERCLASS            1
#> 2    1     0.8786789       1    NPC SUPERCLASS            1
#> 3    0     0.2048894       0    NPC SUPERCLASS            1
#> 4    0     0.9989780       0    NPC SUPERCLASS            1
#> 5    0     0.8715882       0    NPC SUPERCLASS            1
#>                             name.4
#> 1         Nicotinic acid alkaloids
#> 2                   Small peptides
#> 3                   Small peptides
#> 4 Pseudoalkaloids (transamidation)
#> 5              Histidine alkaloids
#>                                  description.4 id.4
#> 1         Superclass: Nicotinic acid alkaloids   44
#> 2                   Superclass: Small peptides   63
#> 3                   Superclass: Small peptides   63
#> 4 Superclass: Pseudoalkaloids (transamidation)   59
#> 5              Superclass: Histidine alkaloids   34
#>   probability.4 index.4 type.5 level.5 levelIndex.5
#> 1    0.60767603      44    NPC   CLASS            2
#> 2    0.89570719      63    NPC   CLASS            2
#> 3    0.34776333      63    NPC   CLASS            2
#> 4    0.99990070      59    NPC   CLASS            2
#> 5    0.08852576      34    NPC   CLASS            2
#>               name.5             description.5 id.5
#> 1 Pyridine alkaloids Class: Pyridine alkaloids  602
#> 2         Aminoacids         Class: Aminoacids  109
#> 3         Aminoacids         Class: Aminoacids  109
#> 4   Purine alkaloids   Class: Purine alkaloids  597
#> 5         Polyamines         Class: Polyamines  571
#>   probability.5 index.5 computing structureName structureName.1
#> 1     0.6077098     602     FALSE          <NA>            <NA>
#> 2     0.9374434     109     FALSE          <NA>            <NA>
#> 3     0.1858719     109     FALSE          <NA>            <NA>
#> 4     0.9999634     597     FALSE         Thein           Thein
#> 5     0.1634796     571     FALSE          <NA>            <NA>
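One way to relate the two result sets is to check, per feature, whether the database search and the de novo run report the same top structure. The sketch below compares the InChIKey first blocks copied from the two summaries above. It is an illustrative comparison, not a function of RuSirius.

```r
ids <- c("FT0371", "FT0565", "FT0732", "FT0845", "FT1171")

# Top structure per feature from the database search (values from above)
db <- data.frame(
    externalFeatureId = ids,
    inchiKey = c("VOCKNCWQVHJMAE", NA, "GWRLHZIGYXCDKL",
                 "RYYVLZVUVIJVGH", "HBIDZSUDZACENV")
)

# Top structure per feature from the de novo summary (values from above)
denovo <- data.frame(
    externalFeatureId = ids,
    inchiKey = c("WDWLOWADVFWUQE", NA, "PTOGTCVHRNJTSH",
                 "RYYVLZVUVIJVGH", "LHYCAMVLJXLUBB")
)

# Put both results side by side, one row per feature
cmp <- merge(db, denovo, by = "externalFeatureId",
             suffixes = c("_db", "_denovo"))

# TRUE only when both keys are present and identical
cmp$same_structure <- !is.na(cmp$inchiKey_db) &
    !is.na(cmp$inchiKey_denovo) &
    cmp$inchiKey_db == cmp$inchiKey_denovo
cmp
```

Here only FT0845 carries the same key in both summaries. For the other features, the de novo candidates differ from the low-confidence database hits.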

Clean Up

Once you have saved your results to your R environment, it is good practice to cleanly shut down the Sirius connection.

#' Close the project and shut down the Sirius connection
shutdown(srs)
#> Sirius was shut down successfully
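The data frames returned by summary() stay in the R session after the shutdown, so they can be written to disk at any point. A minimal sketch follows, using a toy data frame and a temporary directory; the file names are hypothetical.

```r
# Toy stand-in for a summary() result
summary_toy <- data.frame(
    externalFeatureId = c("FT0371", "FT0845"),
    molecularFormula = c("C7H7NO2", "C8H10N4O2")
)

# Hypothetical output location; replace tempdir() with a project folder
out_dir <- tempdir()
rds_file <- file.path(out_dir, "sirius_summary.rds")
csv_file <- file.path(out_dir, "sirius_summary.csv")

# RDS keeps the R object as is; CSV is convenient for sharing
saveRDS(summary_toy, rds_file)
write.csv(summary_toy, csv_file, row.names = FALSE)

# Check that the object survives a round trip
restored <- readRDS(rds_file)
identical(restored, summary_toy)
```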

Summary

By integrating RuSirius into the metabonaut workflow, we transitioned from unresolved MS2 spectra to structural predictions using SIRIUS’s formula identification, CSI:FingerID structure database search, and MSNovelist de novo structure generation. Importantly, we pass all available MS2 spectra per feature to SIRIUS (not just a single consensus spectrum), allowing it to leverage multiple fragmentation patterns for improved annotation. For features where database searches yield low-confidence results, de novo structure generation provides an alternative path to structural elucidation. This bridges the gap between raw statistical feature discovery and biological interpretation.

References and Acknowledgements

This vignette relies on the SIRIUS software suite developed by the Böcker Lab at Friedrich-Schiller-Universität Jena and Bright Giant GmbH. SIRIUS integrates several algorithms for metabolite annotation from high-resolution mass spectrometry data. When using SIRIUS and the tools accessed through this workflow, please cite the following references:

SIRIUS — fragmentation tree computation and molecular formula identification (Dührkop et al. 2019).
CSI:FingerID — molecular structure database search (Dührkop et al. 2015).
COSMIC — confidence scoring for structural annotations (Hoffmann et al. 2022).
CANOPUS — compound class prediction from fragmentation spectra (Dührkop et al. 2021).
ZODIAC — molecular formula re-ranking using Gibbs sampling (Ludwig et al. 2020).
MSNovelist — de novo structure generation from mass spectra (Stravs et al. 2022).

The R interface to SIRIUS is provided by the RuSirius package (Louail et al.), which is built upon the RSirius REST API library. Special thanks to Markus Fleischauer for his work on the Sirius SDKs, Jonas Emmert for making the R API usable, and Marcus Ludwig for support in implementing RuSirius.

Session information

The R code was run on:

date()
#> [1] "Mon Mar  9 18:34:51 2026"

Information on the R session:

sessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26200)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.utf8 
#> [2] LC_CTYPE=English_United Kingdom.utf8   
#> [3] LC_MONETARY=English_United Kingdom.utf8
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.utf8    
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets 
#> [7] methods   base     
#> 
#> other attached packages:
#> [1] RuSirius_0.2.5      jsonlite_2.0.0      Spectra_1.20.1     
#> [4] BiocParallel_1.44.0 S4Vectors_0.48.0    BiocGenerics_0.56.0
#> [7] generics_0.1.4      RSirius_6.3.3       ProtGenerics_1.42.0
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.2.3                   bitops_1.0-9               
#>   [3] MetaboAnnotation_1.14.0     gridExtra_2.3              
#>   [5] httr2_1.2.2                 remotes_2.5.0              
#>   [7] rlang_1.1.7                 magrittr_2.0.4             
#>   [9] clue_0.3-67                 otel_0.2.0                 
#>  [11] matrixStats_1.5.0           compiler_4.5.2             
#>  [13] RSQLite_2.4.6               png_0.1-8                  
#>  [15] callr_3.7.6                 vctrs_0.7.1                
#>  [17] reshape2_1.4.5              stringr_1.6.0              
#>  [19] crayon_1.5.3                pkgconfig_2.0.3            
#>  [21] MetaboCoreUtils_1.19.2      fastmap_1.2.0              
#>  [23] dbplyr_2.5.2                XVector_0.50.0             
#>  [25] ps_1.9.1                    purrr_1.2.1                
#>  [27] bit_4.6.0                   xfun_0.56                  
#>  [29] MultiAssayExperiment_1.36.1 cachem_1.1.0               
#>  [31] ChemmineR_3.62.0            blob_1.3.0                 
#>  [33] DelayedArray_0.36.0         parallel_4.5.2             
#>  [35] cluster_2.1.8.1             R6_2.6.1                   
#>  [37] stringi_1.8.7               RColorBrewer_1.1-3         
#>  [39] GenomicRanges_1.62.1        Rcpp_1.1.1                 
#>  [41] Seqinfo_1.0.0               SummarizedExperiment_1.40.0
#>  [43] knitr_1.51                  base64enc_0.1-6            
#>  [45] IRanges_2.44.0              BiocBaseUtils_1.12.0       
#>  [47] Matrix_1.7-4                igraph_2.2.2               
#>  [49] tidyselect_1.2.1            abind_1.4-8                
#>  [51] yaml_2.3.12                 codetools_0.2-20           
#>  [53] curl_7.0.0                  processx_3.8.6             
#>  [55] pkgbuild_1.4.8              lattice_0.22-7             
#>  [57] tibble_3.3.1                plyr_1.8.9                 
#>  [59] Biobase_2.70.0              KEGGREST_1.50.0            
#>  [61] S7_0.2.1                    evaluate_1.0.5             
#>  [63] desc_1.4.3                  BiocFileCache_3.0.0        
#>  [65] xml2_1.5.2                  Biostrings_2.78.0          
#>  [67] pillar_1.11.1               BiocManager_1.30.27        
#>  [69] filelock_1.0.3              MatrixGenerics_1.22.0      
#>  [71] DT_0.34.0                   RCurl_1.98-1.17            
#>  [73] BiocVersion_3.22.0          ggplot2_4.0.2              
#>  [75] scales_1.4.0                glue_1.8.0                 
#>  [77] lazyeval_0.2.2              tools_4.5.2                
#>  [79] AnnotationHub_4.0.0         QFeatures_1.20.0           
#>  [81] fs_1.6.6                    grid_4.5.2                 
#>  [83] tidyr_1.3.2                 MsCoreUtils_1.22.1         
#>  [85] AnnotationDbi_1.72.0        cli_3.6.5                  
#>  [87] rappdirs_0.3.4              rsvg_2.7.0                 
#>  [89] S4Arrays_1.10.1             dplyr_1.2.0                
#>  [91] AnnotationFilter_1.34.0     gtable_0.3.6               
#>  [93] digest_0.6.39               SparseArray_1.10.8         
#>  [95] rjson_0.2.23                htmlwidgets_1.6.4          
#>  [97] farver_2.1.2                memoise_2.0.1              
#>  [99] htmltools_0.5.9             lifecycle_1.0.5            
#> [101] httr_1.4.8                  CompoundDb_1.14.2          
#> [103] bit64_4.6.0-1               MASS_7.3-65

Dührkop, Kai, Markus Fleischauer, Marcus Ludwig, et al. 2019. “SIRIUS 4: A Rapid Tool for Turning Tandem Mass Spectra into Metabolite Structure Information.” Nature Methods 16: 299–302. https://doi.org/10.1038/s41592-019-0344-8.

Dührkop, Kai, Louis-Félix Nothias, Markus Fleischauer, et al. 2021. “Systematic Classification of Unknown Metabolites Using High-Resolution Fragmentation Mass Spectra.” Nature Biotechnology 39: 462–71. https://doi.org/10.1038/s41587-020-0740-8.

Dührkop, Kai, Huibin Shen, Marvin Meusel, Juho Rousu, and Sebastian Böcker. 2015. “Searching Molecular Structure Databases with Tandem Mass Spectra Using CSI:FingerID.” Proceedings of the National Academy of Sciences 112 (41): 12580–85. https://doi.org/10.1073/pnas.1509788112.

Hoffmann, Martin A., Louis-Félix Nothias, Marcus Ludwig, et al. 2022. “High-Confidence Structural Annotation of Metabolites Absent from Spectral Libraries.” Nature Biotechnology 40: 411–21. https://doi.org/10.1038/s41587-021-01045-9.

Ludwig, Marcus, Louis-Félix Nothias, Kai Dührkop, et al. 2020. “Database-Independent Molecular Formula Annotation Using Gibbs Sampling Through ZODIAC.” Nature Machine Intelligence 2: 629–41. https://doi.org/10.1038/s42256-020-00234-6.

Stravs, Michael A., Kai Dührkop, Sebastian Böcker, and Nicola Zamboni. 2022. “MSNovelist: De Novo Structure Generation from Mass Spectra.” Nature Methods 19: 865–70. https://doi.org/10.1038/s41592-022-01486-3.

Metabonaut Developers