The MsBackendMgf
class supports import and export of MS/MS spectra data
from/to files in Mascot Generic Format
(mgf)
files. After initial import, the full MS data is kept in
memory. MsBackendMgf
extends the MsBackendDataFrame()
backend
directly and supports thus the applyProcessing()
function to make
data manipulations persistent.
New objects are created with the MsBackendMgf
function. The
backendInitialize
method has to be subsequently called to
initialize the object and import MS/MS data from (one or more) mgf
files.
The MsBackendMgf
backend provides an export
method that allows to export
the data from the Spectra
object (parameter x
) to a file in mgf format.
See the package vignette for details and examples.
Default mappings from fields in the MGF file to spectra variable names are
provided by the spectraVariableMapping
function. This function returns a
named character vector were names are the spectra variable names and the
values the respective field names in the MGF files. This named character
vector is submitted to the import and export function with parameter
mapping
. It is also possible to pass own mappings (e.g. for special
MGF dialects) with the mapping
parameter.
Usage
# S4 method for class 'MsBackendMgf'
backendInitialize(
object,
files,
mapping = spectraVariableMapping(object),
nlines = -1L,
...,
BPPARAM = SerialParam()
)
MsBackendMgf()
# S4 method for class 'MsBackendMgf'
spectraVariableMapping(object, format = c("mgf"))
# S4 method for class 'MsBackendMgf'
export(
object,
x,
file = tempfile(),
mapping = spectraVariableMapping(object),
exportTitle = TRUE,
...
)
Arguments
- object
Instance of
MsBackendMgf
class.- files
character
with the (full) file name(s) of the mgf file(s) from which MS/MS data should be imported.- mapping
for
backendInitialize
andexport
: namedcharacter
vector allowing to specify how fields from the MGF file should be renamed. Names are supposed to be the spectra variable name and values of the vector the field names in the MGF file. See output ofspectraVariableMapping()
for the expected format and examples below or description above for details.- nlines
for
backendInitialize
:integer(1)
defining the number of lines that should be imported and processed from the MGF file(s). By default (nlines = -1L
) the full file is imported and processed at once. If set to a positive integer, the data is imported and processed chunk-wise usingreadMgfSplit()
.- ...
Currently ignored.
- BPPARAM
Parameter object defining the parallel processing setup. If parallel processing is enabled (with
BPPARAM
different thanSerialParam()
, the default) and length offiles
is larger than one, import is performed in parallel on a per-file basis. If data is to be imported from a single file (i.e., length offiles
is one), parsing of the imported file is performed in parallel. See alsoSerialParam()
for information on available parallel processing setup options.- format
for
spectraVariableMapping
:character(1)
defining the format to be used. Currently onlyformat = "mgf"
is supported.- x
for
export
: an instance ofSpectra()
class with the data that should be exported.- file
character(1)
with the (full) file name to which the data should be exported.- exportTitle
logical(1)
whether the TITLE field should be included in the exported MGF file. IfTRUE
(the default) aspectraVariable
called"TITLE"
will be used, if no such variable is present either thespectraNames(object)
will be used or, if they are empty, a title will be generated including the MS level, retention time and acquisition number of the spectrum.
Examples
library(BiocParallel)
fls <- dir(system.file("extdata", package = "MsBackendMgf"),
full.names = TRUE, pattern = "mgf$")
## Create an MsBackendMgf backend and import data from test mgf files.
be <- backendInitialize(MsBackendMgf(), fls)
#> Start data import from 4 files ...
#> done
be
#> MsBackendMgf with 10 spectra
#> msLevel rtime scanIndex
#> <integer> <numeric> <integer>
#> 1 2 1028.00 NA
#> 2 2 1117.00 NA
#> 3 2 1127.00 NA
#> 4 2 2678.94 NA
#> 5 2 2373.51 NA
#> 6 2 2511.03 NA
#> 7 2 NA NA
#> 8 2 162.07 NA
#> 9 2 1028.00 NA
#> 10 2 1028.00 NA
#> ... 20 more variables/columns.
be$msLevel
#> [1] 2 2 2 2 2 2 2 2 2 2
be$intensity
#> NumericList of length 10
#> [[1]] 753.738 385.376 315.441 413.206 ... 3038.73 2016.43 1146.04 704.175
#> [[2]] 1228.93 1424.66 1550.9 1455.45 ... 7380.41 4960.92 5743.83 1780.76
#> [[3]] 1340.44 1714.76 1938.82 1450.36 2019 ... 5323.02 2265.43 4768.14 1532.12
#> [[4]] 81011.57 4123.349 4006.9321 66933.17 ... 22042.248 18096.48 12666.438
#> [[5]] 14.1766004562378 18.5806427001953 ... 22.7096385955811 14.864013671875
#> [[6]] 1748.57495117188 8689.9951171875 ... 2907.08422851562 2663.30908203125
#> [[7]] 65.219513 178.758606 13.01786 119.898499 ... 22.05921 30.57095 14.11111
#> [[8]] numeric(0)
#> [[9]] 753.738 385.376 315.441 413.206 ... 3038.73 2016.43 1146.04 704.175
#> [[10]] 753.738 385.376 315.441 413.206 ... 3038.73 2016.43 1146.04 704.175
be$mz
#> NumericList of length 10
#> [[1]] 102.0548 103.00494 103.03531 ... 1388.58691 1405.59729 1406.57666
#> [[2]] 101.07074 102.05486 103.00227 ... 1331.56726 1348.58496 1349.59241
#> [[3]] 102.05556 103.00014 115.05058 ... 1333.599 1334.61304 1335.64368
#> [[4]] 101.07122 109.68925 115.86999 120.0811 ... 1260.6073 1261.614 1272.6572
#> [[5]] 130.164459228516 144.150299072266 ... 1019.23852539062 1020.52404785156
#> [[6]] 110.070594787598 120.080627441406 ... 887.756652832031 998.447387695312
#> [[7]] 51.022404 57.033543 57.060638 ... 636.130188 660.481445 753.358521
#> [[8]] numeric(0)
#> [[9]] 102.0548 103.00494 103.03531 ... 1388.58691 1405.59729 1406.57666
#> [[10]] 102.0548 103.00494 103.03531 ... 1388.58691 1405.59729 1406.57666
## The spectra variables that are available; note that not all of them
## have been imported from the MGF files.
spectraVariables(be)
#> [1] "msLevel" "rtime"
#> [3] "acquisitionNum" "scanIndex"
#> [5] "mz" "intensity"
#> [7] "dataStorage" "dataOrigin"
#> [9] "centroided" "smoothed"
#> [11] "polarity" "precScanNum"
#> [13] "precursorMz" "precursorIntensity"
#> [15] "precursorCharge" "collisionEnergy"
#> [17] "isolationWindowLowerMz" "isolationWindowTargetMz"
#> [19] "isolationWindowUpperMz" "TITLE"
#> [21] "RAWFILE" "CLUSTER_ID"
#> [23] "MSLEVEL"
## The variable "TITLE" represents the title of the spectrum defined in the
## MGF file
be$TITLE
#> [1] "File193 Spectrum1719 scans: 2162"
#> [2] "File193 Spectrum1944 scans: 2406"
#> [3] "File193 Spectrum1968 scans: 2432"
#> [4] "mzspec:PXD004732:01650b_BC2-TUM_first_pool_53_01_01-3xHCD-1h-R2:scan:41840:WNQLQAFWGTGK/2"
#> [5] "mzspec:PXD002084:TCGA-AA-A01D-01A-23_W_VU_20121106_A0218_5I_R_FR15:scan:5209:DLTDYLMK/2"
#> [6] "mzspec:MSV000080679:j11962_C1orf144:scan:10671:DLTDYLMK/2"
#> [7] "CCMSLIB00000840351"
#> [8] "blank_2-A,1_01_29559.812.812.1"
#> [9] "File193 Spectrum1719 scans: 2162"
#> [10] "File193 Spectrum1719 scans: 2162"
## The default mapping of MGF fields to spectra variables is provided by
## the spectraVariableMapping function
spectraVariableMapping(MsBackendMgf())
#> rtime acquisitionNum precursorMz precursorIntensity
#> "RTINSECONDS" "SCANS" "PEPMASS" "PEPMASSINT"
#> precursorCharge
#> "CHARGE"
## We can provide our own mapping e.g. to map the MGF field "TITLE" to a
## variable named "spectrumName":
map <- c(spectrumName = "TITLE", spectraVariableMapping(MsBackendMgf()))
map
#> spectrumName rtime acquisitionNum precursorMz
#> "TITLE" "RTINSECONDS" "SCANS" "PEPMASS"
#> precursorIntensity precursorCharge
#> "PEPMASSINT" "CHARGE"
## We can then pass this mapping with parameter `mapping` to the
## backendInitialize method:
be <- backendInitialize(MsBackendMgf(), fls, mapping = map)
#> Start data import from 4 files ...
#> done
## The title is now available as variable named spectrumName
be$spectrumName
#> [1] "File193 Spectrum1719 scans: 2162"
#> [2] "File193 Spectrum1944 scans: 2406"
#> [3] "File193 Spectrum1968 scans: 2432"
#> [4] "mzspec:PXD004732:01650b_BC2-TUM_first_pool_53_01_01-3xHCD-1h-R2:scan:41840:WNQLQAFWGTGK/2"
#> [5] "mzspec:PXD002084:TCGA-AA-A01D-01A-23_W_VU_20121106_A0218_5I_R_FR15:scan:5209:DLTDYLMK/2"
#> [6] "mzspec:MSV000080679:j11962_C1orf144:scan:10671:DLTDYLMK/2"
#> [7] "CCMSLIB00000840351"
#> [8] "blank_2-A,1_01_29559.812.812.1"
#> [9] "File193 Spectrum1719 scans: 2162"
#> [10] "File193 Spectrum1719 scans: 2162"
## Next we create a Spectra object with this data
sps <- Spectra(be)
## We can use the 'MsBackendMgf' also to export spectra data in mgf format.
out_file <- tempfile()
export(sps, backend = MsBackendMgf(), file = out_file, map = map)
## The first 20 lines of the generated file:
readLines(out_file, n = 20)
#> [1] "BEGIN IONS"
#> [2] "msLevel=2"
#> [3] "RTINSECONDS=1028"
#> [4] "SCANS=2162"
#> [5] "centroided=TRUE"
#> [6] "PEPMASS=816.33826"
#> [7] "CHARGE=2+"
#> [8] "TITLE=File193 Spectrum1719 scans: 2162"
#> [9] "102.0548 753.738"
#> [10] "103.00494 385.376"
#> [11] "103.03531 315.441"
#> [12] "115.05001 413.206"
#> [13] "115.08686 588.273"
#> [14] "120.08063 800.016"
#> [15] "124.10555 526.761"
#> [16] "124.20652 326.013"
#> [17] "1369.57385 1485.17"
#> [18] "1370.57495 1280.2"
#> [19] "1387.59021 3038.73"
#> [20] "1388.58691 2016.43"
## Next we add a new spectra variable to each spectrum
sps$spectrum_idx <- seq_along(sps)
## This new spectra variable will also be exported to the mgf file:
export(sps, backend = MsBackendMgf(), file = out_file, map = map)
readLines(out_file, n = 20)
#> [1] "BEGIN IONS"
#> [2] "msLevel=2"
#> [3] "RTINSECONDS=1028"
#> [4] "SCANS=2162"
#> [5] "centroided=TRUE"
#> [6] "PEPMASS=816.33826"
#> [7] "CHARGE=2+"
#> [8] "TITLE=File193 Spectrum1719 scans: 2162"
#> [9] "spectrum_idx=1"
#> [10] "102.0548 753.738"
#> [11] "103.00494 385.376"
#> [12] "103.03531 315.441"
#> [13] "115.05001 413.206"
#> [14] "115.08686 588.273"
#> [15] "120.08063 800.016"
#> [16] "124.10555 526.761"
#> [17] "124.20652 326.013"
#> [18] "1369.57385 1485.17"
#> [19] "1370.57495 1280.2"
#> [20] "1387.59021 3038.73"