Download the first 3 mzML and mzID files from the PXD022816 project (Morgenstern, Barzilay, and Levin 2021. “RawBeans: A Simple, Vendor-Independent, Raw-Data Quality-Control Tool.” Journal of Proteome Research. https://doi.org/10.1021/acs.jproteome.0c00956).
Generate a `Spectra` object and a table of filtered PSMs. Visualise the total ion chromatograms and check the quality of the identification data by comparing the density of the decoy and target PSM id scores for each file.
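As a starting point for the target/decoy comparison, here is a minimal base R sketch on simulated data. The column names `file`, `isDecoy` and `score` and the simulated score distributions are assumptions made for illustration; with real data, use the corresponding columns of your filtered PSM table.

```r
set.seed(123)

## Simulated PSM table (hypothetical columns: file, isDecoy, score).
psms <- data.frame(
    file = rep(c("file1", "file2"), each = 500),
    isDecoy = rep(rep(c(FALSE, TRUE), times = c(400, 100)), times = 2))

## In this simulation, targets score higher on average than decoys.
psms$score <- rnorm(nrow(psms),
                    mean = ifelse(psms$isDecoy, 10, 30), sd = 6)

## One density estimate per file and per target/decoy group. The list
## names are of the form "file1.FALSE" (targets) and "file1.TRUE" (decoys).
groups <- split(psms$score, list(psms$file, psms$isDecoy))
dens <- lapply(groups, density)

## One panel per file: targets as a solid line, decoys as a dashed line.
op <- par(mfrow = c(1, 2))
for (f in unique(psms$file)) {
    target <- dens[[paste0(f, ".FALSE")]]
    decoy <- dens[[paste0(f, ".TRUE")]]
    plot(target, main = f, xlab = "PSM score",
         xlim = range(target$x, decoy$x),
         ylim = range(0, target$y, decoy$y))
    lines(decoy, lty = 2)
    legend("topright", legend = c("target", "decoy"), lty = 1:2)
}
par(op)
```

Well separated target and decoy densities suggest that the score discriminates between correct and random matches. Strongly overlapping densities in one file would point to a problem with that acquisition or its search.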
Join the raw and identification data. Beware though that the joining must now be performed by spectrum ids and by files.
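The reason for joining by both keys is that spectrum ids are only unique within a file. The base R sketch below illustrates the idea with a composite key. The toy tables and their column names (`dataOrigin`, `spectrumId`, `file`, `spectrumID`, `pkey`) are assumptions for illustration, not the actual variables of your data.

```r
## Toy spectra-level metadata: the same spectrum ids occur in both files.
sp <- data.frame(
    dataOrigin = rep(c("file1.mzML", "file2.mzML"), each = 3),
    spectrumId = rep(c("scan=1", "scan=2", "scan=3"), times = 2),
    msLevel = 2L)

## Toy PSM table: one row per identified spectrum.
id <- data.frame(
    file = c("file1.mzML", "file2.mzML", "file2.mzML"),
    spectrumID = c("scan=2", "scan=2", "scan=3"),
    sequence = c("PEPTIDEK", "SAMPLER", "ANOTHERK"))

## Build a composite key from the file and the spectrum id on both sides.
sp$pkey <- paste(sp$dataOrigin, sp$spectrumId, sep = "::")
id$pkey <- paste(id$file, id$spectrumID, sep = "::")

## Left join: keep all spectra and add the PSM annotation where available.
joined <- merge(sp, id[, c("pkey", "sequence")], by = "pkey", all.x = TRUE)
joined
```

Joining on `spectrumId` alone would wrongly annotate `scan=2` of both files with both peptides. With a `Spectra` object, one possible approach is to add such a key as a spectra variable and to the PSM table, and then pass it to `joinSpectraData()` through its `by.x` and `by.y` arguments.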
Extract the PSMs that have been matched to peptides from protein O43175 and compare and cluster the scans. Hint: once you have created the smaller `Spectra` object with the scans of interest, switch to an in-memory backend to speed up the calculations.
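To make the compare-and-cluster step concrete, here is a base R sketch of one common approach: bin the peaks, compute pairwise cosine similarities, and cluster on the resulting distances. The toy scans, the bin width and the choice of cosine similarity with average linkage are assumptions for illustration, not necessarily what the dedicated package functions use by default.

```r
## Toy MS2 scans as two-column (mz, intensity) matrices.
scans <- list(
    a = cbind(mz = c(100.1, 200.2, 300.3), intensity = c(10, 50, 20)),
    b = cbind(mz = c(100.1, 200.2, 300.3), intensity = c(12, 45, 25)),
    c = cbind(mz = c(150.5, 250.6, 350.7), intensity = c(30, 5, 40)))

## Bin the peaks on a common m/z grid (bins of width 1) and sum the
## intensities in each bin; empty bins get an intensity of 0.
breaks <- seq(100, 400, by = 1)
binned <- t(sapply(scans, function(p) {
    bins <- cut(p[, "mz"], breaks, right = FALSE)
    as.numeric(tapply(p[, "intensity"], bins, sum, default = 0))
}))

## Pairwise cosine similarity between the binned scans.
norms <- sqrt(rowSums(binned^2))
sim <- (binned %*% t(binned)) / (norms %o% norms)

## Hierarchical clustering on the dissimilarities (1 - similarity).
hc <- hclust(as.dist(1 - sim), method = "average")
clusters <- cutree(hc, k = 2)
clusters
```

Scans `a` and `b` share all their peaks and end up in the same cluster, while `c` is on its own. With a `Spectra` object, `compareSpectra()` returns such a similarity matrix directly, and `setBackend()` can be used to move the subset of interest to an in-memory backend before comparing.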
Generate total ion chromatograms for each acquisition and annotate the MS1 scans with the number of PSMs using the `countIdentifications()` function, as shown above. The function will automatically perform the counts in parallel for each acquisition.
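To clarify what such a count represents, the sketch below is a rough base R analogue on a toy table. It is not the `countIdentifications()` implementation. The helper `count_ids()`, the column `nIds` and the toy data are hypothetical, and the sketch assumes that each MS2 scan points to its parent MS1 scan through `precScanNum`.

```r
## Toy scan table for two acquisitions. MS2 scans refer to their parent
## MS1 scan via precScanNum; identified scans have a peptide sequence.
sp <- data.frame(
    dataOrigin = rep(c("f1", "f2"), times = c(4, 3)),
    acquisitionNum = c(1:4, 1:3),
    msLevel = c(1L, 2L, 2L, 2L, 1L, 2L, 2L),
    precScanNum = c(NA, 1L, 1L, 1L, NA, 1L, 1L),
    sequence = c(NA, "PEPK", NA, "TIDER", NA, NA, "SAMPLEK"))

## Hypothetical helper, applied to the scans of a single acquisition:
## MS2 scans get 1 if identified and 0 otherwise; MS1 scans get the
## number of identified MS2 scans that originate from them.
count_ids <- function(x) {
    identified <- x$msLevel == 2L & !is.na(x$sequence)
    n <- as.integer(identified)
    ms1 <- which(x$msLevel == 1L)
    n[ms1] <- vapply(x$acquisitionNum[ms1], function(a)
        sum(identified & x$precScanNum %in% a), integer(1))
    n
}

## Count within each acquisition, then restore the original row order.
sp$nIds <- unsplit(lapply(split(sp, sp$dataOrigin), count_ids),
                   sp$dataOrigin)
sp[, c("dataOrigin", "acquisitionNum", "msLevel", "nIds")]
```

The counts are computed separately for each acquisition because scan numbers restart in every file. The MS1 values can then be used to annotate the total ion chromatogram of each acquisition.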
Download the spectra and protein database needed for the exercise (here is a direct link). The protein database is in fasta format and can be processed as described in section 4.8, Reading and processing protein sequences. The MS2 spectra are provided in the Mascot Generic Format (MGF), which can be loaded as `Spectra` objects using the dedicated `MsBackendMgf` backend.
You are asked to write code to identify the spectra, following the principles defined in the Identification data chapter. Include ways to provide confidence in your identification results, beyond a single identification score.
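One building block of such an identification is to shortlist, for each spectrum, the candidate peptides whose theoretical mass matches the neutral mass of the precursor. The base R sketch below illustrates this under stated assumptions: the hard-coded monoisotopic masses cover only a few amino acids, the helper names (`peptide_mass()`, `precursor_mass()`, `match_candidates()`) and the 10 ppm tolerance are hypothetical, and no modifications are considered.

```r
## Assumed monoisotopic masses in Da. In practice, these could be taken
## from PSMatch::getAminoAcids() and PSMatch::getAtomicMass().
proton_mass <- 1.007276
water_mass <- 18.010565
residue_mass <- c(G = 57.02146, A = 71.03711, S = 87.03203,
                  P = 97.05276, V = 99.06841, T = 101.04768,
                  L = 113.08406, K = 128.09496, E = 129.04259,
                  R = 156.10111)

## Theoretical mass of an unmodified peptide: residues plus one water.
peptide_mass <- function(sequence) {
    aa <- strsplit(sequence, "")[[1]]
    stopifnot(all(aa %in% names(residue_mass)))
    sum(residue_mass[aa]) + water_mass
}

## Neutral precursor mass from the measured m/z and charge.
precursor_mass <- function(mz, charge) {
    mz * charge - proton_mass * charge
}

## Candidate peptides within a ppm tolerance of the precursor mass.
match_candidates <- function(mz, charge, peptides, ppm = 10) {
    obs <- precursor_mass(mz, charge)
    theo <- vapply(peptides, peptide_mass, numeric(1))
    peptides[abs(theo - obs) / theo * 1e6 <= ppm]
}

## Example: a doubly charged precursor of the peptide PEPTLK.
mz <- (peptide_mass("PEPTLK") + 2 * proton_mass) / 2
match_candidates(mz, charge = 2, peptides = c("PEPTLK", "GASPVK", "SAVER"))
```

A precursor mass match only narrows down the candidates. Scoring the fragment ions of each candidate against the MS2 peaks, and repeating the search against decoy sequences, are possible ways to add confidence to the final assignments.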
Hints:
- The neutral mass of the precursor can be calculated as `m/z * c - proton_mass * c`, where `m/z` and `c` are the mass-over-charge and the charge of the precursor, and `proton_mass` is the mass of a proton (available with `PSMatch::getAtomicMass()[["p"]]`).
- The `PSMatch::getAminoAcids()` function returns a `data.frame` of amino acid properties.
- Use `spectrapply` to iterate over the individual scans of a `Spectra` object.

Following up on the quantitative data analysis seen in chapter 5, the following file includes a third condition (6C) and two additional labs, for a total of 27 samples.
f <- MsDataHub::cptac_a_b_c_peptides.txt()
condition | LTQ-Orbitrap_86 | LTQ-OrbitrapO_65 | LTQ-OrbitrapW_56 |
---|---|---|---|
6A | 3 | 3 | 3 |
6B | 3 | 3 | 3 |
6C | 3 | 3 | 3 |
The full design is shown below.
sample | id | condition | lab | previous |
---|---|---|---|---|
6A_1 | 1 | 6A | LTQ-Orbitrap_86 | new |
6A_2 | 2 | 6A | LTQ-Orbitrap_86 | new |
6A_3 | 3 | 6A | LTQ-Orbitrap_86 | new |
6A_4 | 4 | 6A | LTQ-OrbitrapO_65 | new |
6A_5 | 5 | 6A | LTQ-OrbitrapO_65 | new |
6A_6 | 6 | 6A | LTQ-OrbitrapO_65 | new |
6A_7 | 7 | 6A | LTQ-OrbitrapW_56 | |
6A_8 | 8 | 6A | LTQ-OrbitrapW_56 | |
6A_9 | 9 | 6A | LTQ-OrbitrapW_56 | |
6B_1 | 1 | 6B | LTQ-Orbitrap_86 | new |
6B_2 | 2 | 6B | LTQ-Orbitrap_86 | new |
6B_3 | 3 | 6B | LTQ-Orbitrap_86 | new |
6B_4 | 4 | 6B | LTQ-OrbitrapO_65 | new |
6B_5 | 5 | 6B | LTQ-OrbitrapO_65 | new |
6B_6 | 6 | 6B | LTQ-OrbitrapO_65 | new |
6B_7 | 7 | 6B | LTQ-OrbitrapW_56 | |
6B_8 | 8 | 6B | LTQ-OrbitrapW_56 | |
6B_9 | 9 | 6B | LTQ-OrbitrapW_56 | |
6C_1 | 1 | 6C | LTQ-Orbitrap_86 | new |
6C_2 | 2 | 6C | LTQ-Orbitrap_86 | new |
6C_3 | 3 | 6C | LTQ-Orbitrap_86 | new |
6C_4 | 4 | 6C | LTQ-OrbitrapO_65 | new |
6C_5 | 5 | 6C | LTQ-OrbitrapO_65 | new |
6C_6 | 6 | 6C | LTQ-OrbitrapO_65 | new |
6C_7 | 7 | 6C | LTQ-OrbitrapW_56 | new |
6C_8 | 8 | 6C | LTQ-OrbitrapW_56 | new |
6C_9 | 9 | 6C | LTQ-OrbitrapW_56 | new |
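If you need this design as a table in R, for example to annotate the samples, the following base R sketch rebuilds it from its regular structure. It assumes that replicates 1 to 3, 4 to 6 and 7 to 9 of each condition map to the three labs in the order shown above. The `previous` column is left out, since its values are specific to the table.

```r
labs <- c("LTQ-Orbitrap_86", "LTQ-OrbitrapO_65", "LTQ-OrbitrapW_56")

## 9 replicates for each of the 3 conditions; id varies fastest.
design <- expand.grid(id = 1:9, condition = c("6A", "6B", "6C"),
                      stringsAsFactors = FALSE)

## Replicates 1-3, 4-6 and 7-9 come from the first, second and third lab.
design$lab <- labs[(design$id - 1) %/% 3 + 1]

## Sample names such as 6A_1, as used in the table above.
rownames(design) <- paste(design$condition, design$id, sep = "_")

## Cross-tabulate conditions and labs: 3 samples in each combination.
tab <- table(design$condition, design$lab)
tab
```

The cross-tabulation is a quick check that the design is balanced, with 3 samples for each combination of condition and lab and 27 samples in total.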
Page built: 2025-06-20 using R version 4.5.0 (2025-04-11)