Download the first three mzML and mzID files from the PXD022816 project (Morgenstern, Barzilay, and Levin 2021).
Reference: Morgenstern, David, Rotem Barzilay, and Yishai Levin. 2021. “RawBeans: A Simple, Vendor-Independent, Raw-Data Quality-Control Tool.” Journal of Proteome Research. https://doi.org/10.1021/acs.jproteome.0c00956.
Generate a Spectra object and a table of filtered PSMs. Visualise
the total ion chromatograms and check the quality of the
identification data by comparing the densities of the decoy and target
PSM identification scores for each file.
Join the raw and identification data. Beware though that the joining must now be performed by spectrum ids and by files.
Extract the PSMs that have been matched to peptides from protein
O43175 and compare and cluster the scans. Hint: once you have
created the smaller Spectra object with the scans of interest,
switch to an in-memory backend to speed up the calculations.
Generate total ion chromatograms for each acquisition and annotate
the MS1 scans with the number of PSMs using the
countIdentifications() function, as shown above. The function will
automatically perform the counts in parallel for each acquisition.
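The join step in this exercise is the one that differs most from the single-file case, because spectrum ids restart in every file. The sketch below illustrates one possible approach on toy data frames: build a composite key from the file name and the spectrum id, then join on that key. All values, the column names `file`, `spectrumId`, `spectrumID` and `sequence`, and the `pkey` variable are assumptions made for illustration; they are not taken from the PXD022816 files.

```r
# Toy stand-ins for the spectra metadata and the PSM table of two files
# (hypothetical values, for illustration only).
spectra_df <- data.frame(
    file = rep(c("file1.mzML", "file2.mzML"), each = 3),
    spectrumId = rep(c("scan=1", "scan=2", "scan=3"), times = 2),
    rtime = c(10.1, 10.5, 11.0, 9.8, 10.4, 10.9)
)
psm_df <- data.frame(
    file = c("file1.mzML", "file2.mzML", "file2.mzML"),
    spectrumID = c("scan=2", "scan=2", "scan=3"),
    sequence = c("PEPTIDEK", "LVNELTEFAK", "AGLQFPVGR")
)

# Spectrum ids repeat across files, so they are not unique on their own.
anyDuplicated(spectra_df$spectrumId) > 0

# Build a composite key from file and spectrum id in both tables.
spectra_df$pkey <- paste(spectra_df$file, spectra_df$spectrumId, sep = "::")
psm_df$pkey <- paste(psm_df$file, psm_df$spectrumID, sep = "::")

# Left join: keep every scan and add the PSM column where a match exists.
joined <- merge(spectra_df, psm_df[, c("pkey", "sequence")],
                by = "pkey", all.x = TRUE)
joined[, c("file", "spectrumId", "rtime", "sequence")]
```

With real data, a key of this kind would presumably be added both as a spectra variable and as a column of the PSM table before joining them; that step depends on how the file names are recorded in each object and is not shown here.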
Download the spectra and protein database needed for the exercise
(here is a direct link). The protein database is in FASTA format and
can be processed as described in section 4.8, Reading and processing
protein sequences. The MS2 spectra are provided in the Mascot Generic
Format (MGF), which can be loaded as Spectra objects using the
dedicated MsBackendMgf backend.
You are asked to write code to identify the spectra, following the principles defined in the Identification data chapter. Include ways to provide confidence in your identification results, beyond a single identification score.
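The hints for this exercise give the neutral precursor mass as m/z * c - proton_mass * c. A minimal sketch of that calculation is shown here. The proton mass is hard-coded as an assumption so that the snippet runs without PSMatch, the helper name `precursor_mass()` is hypothetical, and the m/z and charge values are invented for illustration.

```r
# Proton mass in Da, hard-coded here as an assumption so the sketch runs
# without PSMatch; the hints suggest PSMatch::getAtomicMass()[["p"]].
proton_mass <- 1.007276466

# Neutral precursor mass from the precursor m/z and charge.
precursor_mass <- function(mz, charge, proton = proton_mass) {
    mz * charge - proton * charge
}

# Hypothetical precursor values, for illustration only.
mz <- c(500.25, 750.40)
charge <- c(2L, 3L)
precursor_mass(mz, charge)
```

The function is vectorised, so it could be applied directly to the precursor m/z and precursor charge values of all scans at once rather than scan by scan.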
Hints:
The neutral mass of a precursor can be computed as m/z * c - proton_mass * c, where m/z and c are the mass-over-charge and the charge of the precursor and proton_mass is the mass of a proton (available with PSMatch::getAtomicMass()[["p"]]).
The PSMatch::getAminoAcids() function returns a data.frame of amino acid properties.
Use spectrapply to iterate over the individual scans of a Spectra object.

Following up on the quantitative data analysis seen in chapter 5, the following file includes a third condition, C, and two additional labs, for a total of 27 samples.
f <- MsDataHub::cptac_a_b_c_peptides.txt()

| | LTQ-Orbitrap_86 | LTQ-OrbitrapO_65 | LTQ-OrbitrapW_56 |
|---|---|---|---|
| 6A | 3 | 3 | 3 |
| 6B | 3 | 3 | 3 |
| 6C | 3 | 3 | 3 |
The full design is shown below.
| | id | condition | lab | previous |
|---|---|---|---|---|
| 6A_1 | 1 | 6A | LTQ-Orbitrap_86 | new |
| 6A_2 | 2 | 6A | LTQ-Orbitrap_86 | new |
| 6A_3 | 3 | 6A | LTQ-Orbitrap_86 | new |
| 6A_4 | 4 | 6A | LTQ-OrbitrapO_65 | new |
| 6A_5 | 5 | 6A | LTQ-OrbitrapO_65 | new |
| 6A_6 | 6 | 6A | LTQ-OrbitrapO_65 | new |
| 6A_7 | 7 | 6A | LTQ-OrbitrapW_56 | |
| 6A_8 | 8 | 6A | LTQ-OrbitrapW_56 | |
| 6A_9 | 9 | 6A | LTQ-OrbitrapW_56 | |
| 6B_1 | 1 | 6B | LTQ-Orbitrap_86 | new |
| 6B_2 | 2 | 6B | LTQ-Orbitrap_86 | new |
| 6B_3 | 3 | 6B | LTQ-Orbitrap_86 | new |
| 6B_4 | 4 | 6B | LTQ-OrbitrapO_65 | new |
| 6B_5 | 5 | 6B | LTQ-OrbitrapO_65 | new |
| 6B_6 | 6 | 6B | LTQ-OrbitrapO_65 | new |
| 6B_7 | 7 | 6B | LTQ-OrbitrapW_56 | |
| 6B_8 | 8 | 6B | LTQ-OrbitrapW_56 | |
| 6B_9 | 9 | 6B | LTQ-OrbitrapW_56 | |
| 6C_1 | 1 | 6C | LTQ-Orbitrap_86 | new |
| 6C_2 | 2 | 6C | LTQ-Orbitrap_86 | new |
| 6C_3 | 3 | 6C | LTQ-Orbitrap_86 | new |
| 6C_4 | 4 | 6C | LTQ-OrbitrapO_65 | new |
| 6C_5 | 5 | 6C | LTQ-OrbitrapO_65 | new |
| 6C_6 | 6 | 6C | LTQ-OrbitrapO_65 | new |
| 6C_7 | 7 | 6C | LTQ-OrbitrapW_56 | new |
| 6C_8 | 8 | 6C | LTQ-OrbitrapW_56 | new |
| 6C_9 | 9 | 6C | LTQ-OrbitrapW_56 | new |
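The design above can be rebuilt programmatically, which is a convenient way to check that it is balanced before any modelling. The sketch below reconstructs the `id`, `condition` and `lab` columns in base R and cross-tabulates condition by lab. The `design` and `labs` object names are assumptions, and the `previous` column is left out because its meaning is not described here.

```r
# Rebuild the 27-sample design: 3 conditions x 9 replicates,
# with three consecutive replicates per lab.
labs <- c("LTQ-Orbitrap_86", "LTQ-OrbitrapO_65", "LTQ-OrbitrapW_56")
design <- expand.grid(id = 1:9, condition = c("6A", "6B", "6C"),
                      stringsAsFactors = FALSE)
design$lab <- labs[(design$id - 1) %/% 3 + 1]
rownames(design) <- paste(design$condition, design$id, sep = "_")

# Cross-tabulate condition by lab: every cell should hold 3 samples.
table(design$condition, design$lab)

# Total number of samples.
nrow(design)
```

A data frame of this shape could then serve as sample annotation for the quantitative data, provided its row names match the sample names used in the file.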
Page built: 2025-06-20 using R version 4.5.0 (2025-04-11)