Mass Spectrometry Data on ExperimentHub

Introduction

The MsDataHub package provides example mass spectrometry data, peptide spectrum matches and quantitative data from proteomics and metabolomics experiments. The data are served through the ExperimentHub infrastructure, which downloads each file only once and caches it for further use. Currently available data are summarised in the table below and described in detail in the next section.

library("MsDataHub")
DT::datatable(MsDataHub())
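
As a minimal illustration of the caching behaviour, the sketch below calls the same accessor twice (ko15.CDF() is one of the accessors documented in the next section). It assumes a working internet connection for the first call and a writable ExperimentHub cache.

```r
library("MsDataHub")

f1 <- ko15.CDF()   # first call on a machine downloads the file to the cache
f2 <- ko15.CDF()   # later calls report "loading from cache"

file.exists(f1)                      # the accessor returns a local file path
identical(unname(f1), unname(f2))    # both calls point to the same cached file
```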

Installation

To install the package:

if (!require("BiocManager"))
    install.packages("BiocManager")

BiocManager::install("MsDataHub")

Available data

TripleTOF

Type: Raw MS data
Files: PestMix1_DDA.mzML and PestMix1_SWATH.mzML
More details: ?TripleTOF

Load with

f <- PestMix1_DDA.mzML()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

library(Spectra)
Spectra(f)

## MSn data (Spectra) with 7602 spectra in a MsBackendMzR backend:
##        msLevel     rtime scanIndex
##      <integer> <numeric> <integer>
## 1            1     0.231         1
## 2            1     0.351         2
## 3            1     0.471         3
## 4            1     0.591         4
## 5            1     0.711         5
## ...        ...       ...       ...
## 7598         1   899.491      7598
## 7599         1   899.613      7599
## 7600         1   899.747      7600
## 7601         1   899.872      7601
## 7602         1   899.993      7602
##  ... 34 more variables/columns.
## 
## file(s):
## 31374e08f187_7861

f <- PestMix1_SWATH.mzML()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

Spectra(f)

## MSn data (Spectra) with 8999 spectra in a MsBackendMzR backend:
##        msLevel     rtime scanIndex
##      <integer> <numeric> <integer>
## 1            2     0.203         1
## 2            2     0.300         2
## 3            2     0.397         3
## 4            2     0.494         4
## 5            2     0.591         5
## ...        ...       ...       ...
## 8995         2   899.527      8995
## 8996         2   899.624      8996
## 8997         2   899.721      8997
## 8998         2   899.818      8998
## 8999         2   899.915      8999
##  ... 34 more variables/columns.
## 
## file(s):
## 31372391b261_7862

sciex

Type: Raw MS data
Files: 20171016_POOL_POS_1_105-134.mzML and 20171016_POOL_POS_3_105-134.mzML
More details: ?sciex

Load with

f <- X20171016_POOL_POS_1_105.134.mzML()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

Spectra(f)

## MSn data (Spectra) with 931 spectra in a MsBackendMzR backend:
##       msLevel     rtime scanIndex
##     <integer> <numeric> <integer>
## 1           1     0.280         1
## 2           1     0.559         2
## 3           1     0.838         3
## 4           1     1.117         4
## 5           1     1.396         5
## ...       ...       ...       ...
## 927         1   258.641       927
## 928         1   258.920       928
## 929         1   259.199       929
## 930         1   259.478       930
## 931         1   259.757       931
##  ... 34 more variables/columns.
## 
## file(s):
## 31373f2a4cfc_7859

f <- X20171016_POOL_POS_3_105.134.mzML()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

Spectra(f)

## MSn data (Spectra) with 931 spectra in a MsBackendMzR backend:
##       msLevel     rtime scanIndex
##     <integer> <numeric> <integer>
## 1           1     0.275         1
## 2           1     0.554         2
## 3           1     0.833         3
## 4           1     1.112         4
## 5           1     1.391         5
## ...       ...       ...       ...
## 927         1   258.636       927
## 928         1   258.915       928
## 929         1   259.194       929
## 930         1   259.473       930
## 931         1   259.752       931
##  ... 34 more variables/columns.
## 
## file(s):
## 31374f3b8793_7860
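
The accessor functions used above do not always match the file names exactly: a leading digit gains an X prefix and hyphens become dots. This appears consistent with base R's make.names(), which turns arbitrary strings into syntactically valid names. The sketch below assumes that convention (it is inferred from the names shown in this document, not taken from the package documentation) and uses it to predict the accessor name for a given file.

```r
# File names as listed in this document
files <- c("PestMix1_DDA.mzML",
           "20171016_POOL_POS_1_105-134.mzML",
           "MRM-standmix-5.mzML")

# make.names() prefixes names starting with a digit with "X" and
# replaces characters that are not valid in R names (here "-") with "."
accessors <- make.names(files)
accessors
```

If an accessor cannot be found under the predicted name, the table returned by MsDataHub() remains the reference for what is available.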

PXD000001

Type: Raw MS data and peptide spectrum matches
Files: TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz and TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid
More details: ?PXD000001

Load with

f <- TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.20141210.mzML.gz()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

Spectra(f)

## MSn data (Spectra) with 7534 spectra in a MsBackendMzR backend:
##        msLevel     rtime scanIndex
##      <integer> <numeric> <integer>
## 1            1    0.4584         1
## 2            1    0.9725         2
## 3            1    1.8524         3
## 4            1    2.7424         4
## 5            1    3.6124         5
## ...        ...       ...       ...
## 7530         2   3600.47      7530
## 7531         2   3600.83      7531
## 7532         2   3601.18      7532
## 7533         2   3601.57      7533
## 7534         2   3601.98      7534
##  ... 34 more variables/columns.
## 
## file(s):
## 313779ca41fc_7858

f <- TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.20141210.mzid()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

library(PSMatch)
PSM(f)

## PSM with 5802 rows and 35 columns.
## names(35): sequence spectrumID ... subReplacementResidue subLocation

CPTAC

Type: tab-delimited quantitative proteomics data tables (as produced by MaxQuant)
Files: cptac_a_b_c_peptides.txt, cptac_a_b_peptides.txt and cptac_peptides.txt
More details: ?cptac

Load with

library(QFeatures)
f <- cptac_peptides.txt()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

ecols <- grep("Intensity\\.", names(read.delim(f)))
readSummarizedExperiment(f, ecols, sep = "\t")

## class: SummarizedExperiment 
## dim: 11466 45 
## metadata(0):
## assays(1): ''
## rownames(11466): 1 2 ... 11465 11466
## rowData names(143): Sequence N.term.cleavage.window ...
##   Oxidation..M..site.IDs MS.MS.Count
## colnames(45): Intensity.6A_1 Intensity.6A_2 ... Intensity.6E_8
##   Intensity.6E_9
## colData names(0):

cptac_a_b_c_peptides.txt()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

##                                                  EH7804 
## "/github/home/.cache/R/ExperimentHub/31374bf904b6_7854"

cptac_a_b_peptides.txt()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation
## loading from cache

##                                                  EH7805 
## "/github/home/.cache/R/ExperimentHub/31373647d185_7855"
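
The grep() call above picks the quantitative columns by name: in these MaxQuant tables, per-sample intensities are stored in columns whose names start with Intensity followed by a dot. A minimal sketch on a made-up data frame (the column names and values are hypothetical, chosen only to mimic the layout) shows which indices are selected and why the escaped dot matters.

```r
# Hypothetical miniature of a MaxQuant peptides table
toy <- data.frame(Sequence = c("AAK", "LLR"),
                  Intensity = c(30, 70),        # summed intensity, not per sample
                  Intensity.6A_1 = c(10, 30),
                  Intensity.6A_2 = c(20, 40))

# "\\." matches a literal dot, so the plain "Intensity" column is skipped
ecols <- grep("Intensity\\.", names(toy))
ecols
names(toy)[ecols]
```

The same pattern should apply to the other two CPTAC files, assuming they share this column layout.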

FAAH KO

Type: Raw MS data, in netCDF format.
File: ko15.CDF
More details: ?cdf

Load with

f <- ko15.CDF()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

Spectra(f)

## MSn data (Spectra) with 1278 spectra in a MsBackendMzR backend:
##        msLevel     rtime scanIndex
##      <integer> <numeric> <integer>
## 1            1   2501.38         1
## 2            1   2502.94         2
## 3            1   2504.51         3
## 4            1   2506.07         4
## 5            1   2507.64         5
## ...        ...       ...       ...
## 1274         1   4493.56      1274
## 1275         1   4495.13      1275
## 1276         1   4496.69      1276
## 1277         1   4498.26      1277
## 1278         1   4499.82      1278
##  ... 34 more variables/columns.
## 
## file(s):
## 3137199fbf06_7853

DIA-NN software outputs

Type: tab-delimited DIA quantitative proteomics data tables produced by DIA-NN.
Files:
- Label-free DIA: benchmarkingDIA.tsv
- mTRAQ plexDIA: Report.Derks2022.plexDIA.tsv
More details: ?benchmarkingDIA.tsv and ?Report.Derks2022.plexDIA.tsv

Load with

library(QFeatures)
lfdia <- read.delim(MsDataHub::benchmarkingDIA.tsv())
readQFeaturesFromDIANN(lfdia)

## An instance of class QFeatures (type: bulk) with 24 sets:
## 
##  [1] U:\712006-Proteomics\Issues\Issue 253\DIANN\raw-data\RD139_Overlap_UPS1_0_1fmol_inj1.mzML: SummarizedExperiment with 28980 rows and 1 columns 
##  [2] U:\712006-Proteomics\Issues\Issue 253\DIANN\raw-data\RD139_Overlap_UPS1_0_1fmol_inj2.mzML: SummarizedExperiment with 29495 rows and 1 columns 
##  [3] U:\712006-Proteomics\Issues\Issue 253\DIANN\raw-data\RD139_Overlap_UPS1_0_1fmol_inj3.mzML: SummarizedExperiment with 29210 rows and 1 columns 
##  ...
##  [22] U:\712006-Proteomics\Issues\Issue 253\DIANN\raw-data\RD139_Overlap_UPS1_5fmol_inj1.mzML: SummarizedExperiment with 30941 rows and 1 columns 
##  [23] U:\712006-Proteomics\Issues\Issue 253\DIANN\raw-data\RD139_Overlap_UPS1_5fmol_inj2.mzML: SummarizedExperiment with 30321 rows and 1 columns 
##  [24] U:\712006-Proteomics\Issues\Issue 253\DIANN\raw-data\RD139_Overlap_UPS1_5fmol_inj3.mzML: SummarizedExperiment with 24168 rows and 1 columns

plexdia <- read.delim(MsDataHub::Report.Derks2022.plexDIA.tsv())
readQFeaturesFromDIANN(plexdia, multiplexing = "mTRAQ")

## An instance of class QFeatures (type: bulk) with 54 sets:
## 
##  [1] F:\JD\plexDIA\nPOP\wJD1146.raw: SummarizedExperiment with 2635 rows and 3 columns 
##  [2] F:\JD\plexDIA\nPOP\wJD1147.raw: SummarizedExperiment with 3000 rows and 3 columns 
##  [3] F:\JD\plexDIA\nPOP\wJD1148.raw: SummarizedExperiment with 2676 rows and 3 columns 
##  ...
##  [52] F:\JD\plexDIA\nPOP\wJD1203.raw: SummarizedExperiment with 4441 rows and 3 columns 
##  [53] F:\JD\plexDIA\nPOP\wJD1204.raw: SummarizedExperiment with 4416 rows and 3 columns 
##  [54] F:\JD\plexDIA\nPOP\wJD1205.raw: SummarizedExperiment with 4492 rows and 3 columns

DIA-NN single-cell proteomics reports

Type: tab-delimited DIA quantitative proteomics data tables produced by DIA-NN.
Files:
- Single-cell label-free: Ai2025_aCMs_report.tsv
- Single-cell label-free: Ai2025_iCMs_report.tsv
More details: ?Ai2025.
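
No loading example is given for these reports. As a sketch, assuming the accessor functions are named after the files as for the other datasets (Ai2025_aCMs_report.tsv() is therefore an assumed name), a report could be read into a data frame and inspected as follows; see ?Ai2025 for the authoritative instructions.

```r
## Assumed accessor name, following the file-name convention used elsewhere
acms <- read.delim(MsDataHub::Ai2025_aCMs_report.tsv())

## Inspect the long-format report: dimensions and first column names
dim(acms)
head(names(acms))
```

From there, QFeatures::readQFeaturesFromDIANN(), used for the bulk DIA-NN outputs above, may be applicable, possibly with additional arguments for these data.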

Proteomics contaminant databases

Type: fasta files, as documented in camprotR’s cRAP databases vignette.
Files:
- crap_gpm.fasta: the common Repository of Adventitious Proteins (cRAP) from the Global Proteome Machine (GPM) organisation.
- crap_ccp.fasta: Cambridge Centre for Proteomics’ own cRAP fasta database.
- crap_maxquant.fasta.gz: MaxQuant’s contaminant database.
More details: ?cRAP.
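
As for the other datasets, the accessors are expected to return the path to a cached file, which can then be read with any FASTA parser (for example Biostrings::readAAStringSet()). As a dependency-free sketch, the helper below (fasta_headers() is a hypothetical name, not part of MsDataHub) extracts the header lines from a FASTA file; it is demonstrated on a small made-up file.

```r
# Hypothetical helper: return the header lines (without ">") of a FASTA file
fasta_headers <- function(path) {
    lines <- readLines(path)
    sub("^>", "", lines[startsWith(lines, ">")])
}

# Made-up two-entry FASTA file, for illustration only
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">contaminant_1 example entry", "MKWVTFISLL",
             ">contaminant_2 example entry", "GSHMAAK"), tmp)

hdrs <- fasta_headers(tmp)
length(hdrs)
```

With MsDataHub installed, fasta_headers(MsDataHub::crap_gpm.fasta()) should list the entries of the GPM database, assuming the accessor is named after the file.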

FTICR-MS direct injection MS data

Example files for direct injection Fourier-transform ion cyclotron resonance (FTICR) mass spectrometry data.

Type: raw MS data in mzML file format.
Files: 5 replicates from sample HAM004, 5 replicates from sample HAM005, i.e., 10 mzML files.
More details: ?FTICR.

Example of how to load one of the available files:

f <- MsDataHub::HAM004_641fE_14.11.07..Exp1.extracted.mzML()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

Spectra(f)

## MSn data (Spectra) with 1 spectra in a MsBackendMzR backend:
##     msLevel     rtime scanIndex
##   <integer> <numeric> <integer>
## 1         1        -1         1
##  ... 34 more variables/columns.
## 
## file(s):
## 3137e25dd52_10386

MRM data file

Example file in mzML format for multiple reaction monitoring (MRM) data. The file does not contain mass spectra, but chromatographic data. The data can be imported and represented with the Chromatograms Bioconductor package.

Type: raw (chromatographic) MS data in mzML file format.
Files:
- MRM-standmix-5.mzML: sample from mouse brain acquired by HILIC ESI-QqQ/MS in dynamic multiple reaction monitoring (MRM) mode. The HPLC system was a 1290 Infinity (Agilent Technologies) coupled to an ion-Funnel Triple quadrupole 6490 mass spectrometer (Agilent Technologies). This file was contributed by Xavi Domingo-Almenara from The Scripps Research Institute, San Diego, CA.
More details: ?MRM.

Load with

f <- MsDataHub::MRM.standmix.5.mzML()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache
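
Since the file holds chromatograms rather than spectra, Spectra(f) is not the appropriate reader here. As a sketch that uses the lower-level mzR package instead of the Chromatograms package mentioned above (a substitution made only to keep the example to a small, stable set of functions), the chromatographic data can be inspected as follows.

```r
library(mzR)

f <- MsDataHub::MRM.standmix.5.mzML()

ms <- openMSfile(f)
nChrom(ms)                        # number of chromatograms in the file
hdr <- chromatogramHeader(ms)     # one row of metadata per chromatogram
head(hdr)
chr <- chromatogram(ms, 1L)       # retention time and intensity of the first one
head(chr)
close(ms)
```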

CE-MS data

The CE-MS test data consist of two files, "CEMS_10ppm.mzML" and "CEMS_25ppm.mzML". The data contain CE-MS runs of a standard mixture that includes, among other compounds, lysine (at 10 ppm and 25 ppm, respectively) and the neutral EOF marker paracetamol (50 ppm). The data were acquired on a 7100 capillary electrophoresis system from Agilent Technologies, coupled to an Agilent 6560 IM-QToF-MS. CE separation was performed using an 80 cm fused silica capillary with an internal diameter of 50 µm and an external diameter of 365 µm. The background electrolyte was 10 % acetic acid and separation was performed at +30 kV and a constant pressure of 50 mbar. MS detection was performed in positive ionization mode.

The raw data were then converted to the open .mzML format with ProteoWizard. To reduce data size, the test data were subset to a retention time range from 400 to 900 seconds and an m/z range from 147.1 to 152.0.

Type: raw MS data in mzML file format.
Files:
- CEMS_10ppm.mzML: sample with lysine added at 10 ppm.
- CEMS_25ppm.mzML: sample with lysine added at 25 ppm.
More details: ?CEMS.

Load with

f <- MsDataHub::CEMS_25ppm.mzML()

## see ?MsDataHub and browseVignettes('MsDataHub') for documentation

## loading from cache

s <- Spectra(f)
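
The subsetting described above can be checked on the loaded object. The following sketch uses the rtime() and mz() accessors from Spectra; no expected values are shown, as they depend on the file contents.

```r
library(Spectra)

f <- MsDataHub::CEMS_25ppm.mzML()
s <- Spectra(f)

## Retention time range in seconds (described above as 400 to 900)
range(rtime(s))

## m/z range over all spectra (described above as 147.1 to 152.0)
range(unlist(mz(s)))
```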

TMT MS3 SPS data

Example MS3 SPS TMT data.

MS3TMT10_01022016_32917-33481.mzML.gz is an mzML file containing 565 spectra from an MS3 SPS TMT 10-plex experiment.
MS3TMT11.mzML is an mzML file containing 994 scans from an MS3 SPS TMT 11-plex experiment.
fdms3tmt11.rda contains a data.frame with identification data for MS3TMT11.mzML.
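
A possible way to load the 11-plex file and tabulate its spectra per MS level is sketched below. It assumes the accessor is named after the file, as for the other datasets (MS3TMT11.mzML() is an assumed name).

```r
library(Spectra)

## Assumed accessor name, following the file-name convention used elsewhere
f <- MsDataHub::MS3TMT11.mzML()
s <- Spectra(f)

## Number of spectra per MS level; MS3 spectra should be present
## in data from an SPS MS3 acquisition
table(msLevel(s))
```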

Adding data to `MsDataHub`

If you would like to add a dataset to MsDataHub, start by opening an issue in the package’s GitHub repository and describe the new data. In particular, provide information about its provenance, its use and its format(s), and acknowledge that the data may be shared freely with the community without any restrictions. You may provide an open licence specifying the terms under which the data can be re-used, typically a CC-BY-SA licence.
By contributing to the package, you acknowledge that you will comply with the R for Mass Spectrometry project code of conduct.
A maintainer of the package will reply to your issue, confirming that the data can be added.
At this point, if you are familiar with the development of ExperimentHub packages and GitHub pull requests, you may directly send one that adds your data to the package. Make sure to (1) add appropriate references in the manual page and (2) add yourself as a contributor of the package in the DESCRIPTION file.
Alternatively, a maintainer will add the dataset to the package and may require your input to make sure the documentation file is complete.

Session information

## R Under development (unstable) (2026-04-19 r89916)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] MsDataHub_1.11.5            QFeatures_1.21.3           
##  [3] MultiAssayExperiment_1.37.4 SummarizedExperiment_1.41.1
##  [5] Biobase_2.71.0              GenomicRanges_1.63.2       
##  [7] Seqinfo_1.1.0               IRanges_2.45.0             
##  [9] MatrixGenerics_1.23.0       matrixStats_1.5.0          
## [11] PSMatch_1.15.3              PTMods_0.99.6              
## [13] Spectra_1.21.7              BiocParallel_1.45.0        
## [15] S4Vectors_0.49.2            BiocGenerics_0.57.1        
## [17] generics_0.1.4              BiocStyle_2.39.0           
## 
## loaded via a namespace (and not attached):
##  [1] DBI_1.3.0               httr2_1.2.2             rlang_1.2.0            
##  [4] magrittr_2.0.5          clue_0.3-68             otel_0.2.0             
##  [7] compiler_4.7.0          RSQLite_2.4.6           png_0.1-9              
## [10] systemfonts_1.3.2       vctrs_0.7.3             reshape2_1.4.5         
## [13] stringr_1.6.0           ProtGenerics_1.43.0     crayon_1.5.3           
## [16] pkgconfig_2.0.3         MetaboCoreUtils_1.19.3  fastmap_1.2.0          
## [19] dbplyr_2.5.2            XVector_0.51.0          rmarkdown_2.31         
## [22] ragg_1.5.2              purrr_1.2.2             bit_4.6.0              
## [25] xfun_0.57               cachem_1.1.0            jsonlite_2.0.0         
## [28] blob_1.3.0              DelayedArray_0.37.1     parallel_4.7.0         
## [31] cluster_2.1.8.2         R6_2.6.1                bslib_0.10.0           
## [34] stringi_1.8.7           jquerylib_0.1.4         Rcpp_1.1.1-1           
## [37] bookdown_0.46           knitr_1.51              Matrix_1.7-5           
## [40] igraph_2.3.0            tidyselect_1.2.1        abind_1.4-8            
## [43] yaml_2.3.12             codetools_0.2-20        curl_7.1.0             
## [46] lattice_0.22-9          tibble_3.3.1            plyr_1.8.9             
## [49] withr_3.0.2             KEGGREST_1.51.1         evaluate_1.0.5         
## [52] desc_1.4.3              BiocFileCache_3.1.0     ExperimentHub_3.1.0    
## [55] Biostrings_2.79.5       pillar_1.11.1           BiocManager_1.30.27    
## [58] filelock_1.0.3          DT_0.34.0               ncdf4_1.24             
## [61] BiocVersion_3.23.1      glue_1.8.1              lazyeval_0.2.3         
## [64] tools_4.7.0             AnnotationHub_4.1.0     data.table_1.18.2.1    
## [67] mzR_2.45.1              fs_2.1.0                grid_4.7.0             
## [70] tidyr_1.3.2             crosstalk_1.2.2         MsCoreUtils_1.23.9     
## [73] AnnotationDbi_1.73.1    cli_3.6.6               rappdirs_0.3.4         
## [76] textshaping_1.0.5       S4Arrays_1.11.1         dplyr_1.2.1            
## [79] AnnotationFilter_1.35.0 sass_0.4.10             digest_0.6.39          
## [82] SparseArray_1.11.13     htmlwidgets_1.6.4       memoise_2.0.1          
## [85] htmltools_0.5.9         pkgdown_2.2.0.9000      lifecycle_1.0.5        
## [88] httr_1.4.8              bit64_4.8.0             MASS_7.3-65

Laurent Gatto
