Chapter 6 Supplementary exercises

6.1 Raw data and identification results

  • Download the first 3 mzML and mzID files from the PXD022816 project (Morgenstern, Barzilay, and Levin 2021). Reference: Morgenstern, David, Rotem Barzilay, and Yishai Levin. 2021. “RawBeans: A Simple, Vendor-Independent, Raw-Data Quality-Control Tool.” Journal of Proteome Research. https://doi.org/10.1021/acs.jproteome.0c00956.

  • Generate a Spectra object and a table of filtered PSMs. Visualise the total ion chromatograms and check the quality of the identification data by comparing, for each file, the densities of the identification scores of the decoy and target PSMs.

  • Join the raw and identification data. Beware though that the joining must now be performed by spectrum ids and by files.

  • Extract the PSMs that have been matched to peptides from protein O43175 and compare and cluster the scans. Hint: once you have created the smaller Spectra object with the scans of interest, switch to an in-memory backend to speed up the calculations.

  • Generate total ion chromatograms for each acquisition and annotate the MS1 scans with the number of PSMs using the countIdentifications() function, as shown above. The function will automatically perform the counts in parallel for each acquisition.
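The joining by spectrum ids and by files mentioned above can be illustrated with a minimal base R sketch. It does not use the real Spectra or PSM objects: the two data frames below are toy stand-ins, and the file names, peptide sequences and the helper make_key() are hypothetical. The idea is simply to build a single key that combines the file name and the spectrum id on both sides, and to join on that key.

```r
## Toy stand-ins for the spectra variables and the PSM table.
## File names, sequences and column contents are made up for illustration.
spectra_vars <- data.frame(
    dataOrigin = c("/data/run1.mzML", "/data/run1.mzML", "/data/run2.mzML"),
    spectrumId = c("scan=1", "scan=2", "scan=1"),
    rtime = c(10.1, 10.9, 10.2))

psm_table <- data.frame(
    spectrumFile = c("run1.mzML", "run2.mzML"),
    spectrumID = c("scan=1", "scan=1"),
    sequence = c("PEPTIDEK", "SAMPLER"))

## Hypothetical helper: combine the file name (without its path) and the
## spectrum id into one key that is unique across acquisitions.
make_key <- function(file, id)
    paste(basename(file), id, sep = "::")

spectra_vars$pkey <- make_key(spectra_vars$dataOrigin, spectra_vars$spectrumId)
psm_table$pkey <- make_key(psm_table$spectrumFile, psm_table$spectrumID)

## Left join: keep all scans, add the sequence where a PSM exists.
joined <- merge(spectra_vars, psm_table[, c("pkey", "sequence")],
                by = "pkey", all.x = TRUE)
joined
```

With the real data, the same kind of key could be added as a spectra variable and as a PSM column, and then be used as the by.x and by.y arguments of joinSpectraData(). The scan with id "scan=1" exists in both toy files, which is exactly why the spectrum id alone is no longer a sufficient key.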

6.2 Search engine

Download the spectra and protein database needed for the exercise (here is a direct link). The protein database is in FASTA format and can be processed as described in section 4.8, Reading and processing protein sequences. The MS2 spectra are provided in the Mascot Generic Format (MGF) and can be loaded as Spectra objects using the dedicated MsBackendMgf backend.

You are asked to write code to identify the spectra, following the principles defined in the Identification data chapter. Include ways to provide confidence in your identification results, beyond a single identification score.
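One possible way to provide confidence beyond a single score is a target-decoy approach. This is an assumption on our side, in line with the decoy and target PSMs used in exercise 6.1, and not the only valid answer. The minimal base R sketch below uses made-up scores and a hypothetical psms table: it creates a decoy sequence by reversing a target sequence, then estimates a false discovery rate (FDR) and a q-value at each score threshold.

```r
## A decoy sequence can be obtained by reversing a target sequence.
reverse_sequence <- function(s)
    paste(rev(strsplit(s, "")[[1]]), collapse = "")
reverse_sequence("PEPTIDEK")

## Toy PSM table (made-up values): a score where higher is better, and a
## flag telling whether the best match was a decoy sequence.
psms <- data.frame(
    score = c(95, 90, 88, 80, 72, 60, 55, 40, 35, 20),
    isDecoy = c(FALSE, FALSE, FALSE, FALSE, TRUE,
                FALSE, FALSE, TRUE, FALSE, TRUE))
psms <- psms[order(psms$score, decreasing = TRUE), ]

## FDR estimate at each threshold: number of decoys divided by the number
## of targets among all PSMs scoring at least as well.
n_decoy <- cumsum(psms$isDecoy)
n_target <- cumsum(!psms$isDecoy)
psms$fdr <- n_decoy / pmax(n_target, 1)

## q-value: the smallest FDR at which a PSM would still be accepted.
psms$qvalue <- rev(cummin(rev(psms$fdr)))
psms
```

Reporting the q-value alongside the raw score makes it possible to choose a threshold that controls the proportion of expected false identifications, rather than relying on an arbitrary score cut-off.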

Hints:

  • Focus on expected peptide sequences that are longer than 6 and shorter than 28 amino acids to reduce the search space.
  • Do not search each MS2 scan against the whole database, but focus on peptides that have a mass that is close to the scan’s precursor mass.
  • To calculate the mass of a peptide from its precursor, use m/z * c - proton_mass * c, where m/z and c are the mass-over-charge ratio and the charge of the precursor, and proton_mass is the mass of a proton (available with PSMatch::getAtomicMass()[["p"]]).
  • The PSMatch::getAminoAcids() function returns a data.frame of amino acid properties.
  • Consider using spectrapply() to iterate over the individual scans of a Spectra object.
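The first three hints can be sketched in base R as follows. This is an illustration under stated assumptions, not a reference solution: the residue masses are hard-coded for a few amino acids only (the full table would come from PSMatch::getAminoAcids()), the proton mass is typed in rather than taken from PSMatch::getAtomicMass(), the 10 ppm tolerance is an arbitrary choice, and the function names and peptide sequences are hypothetical.

```r
## Monoisotopic residue masses (Da) for a handful of amino acids,
## hard-coded here for illustration only.
residue_mass <- c(G = 57.02146, A = 71.03711, S = 87.03203, P = 97.05276,
                  V = 99.06841, L = 113.08406, K = 128.09496, E = 129.04259,
                  R = 156.10111)
water <- 18.01056    # H2O, added to the sum of the residue masses
proton <- 1.007276   # PSMatch::getAtomicMass()[["p"]] in the exercise

## Neutral mass of a peptide sequence (NA if a residue is not in the table).
peptide_mass <- function(pep)
    sum(residue_mass[strsplit(pep, "")[[1]]]) + water

## Neutral mass of a precursor from its m/z and charge (see the hint above).
precursor_mass <- function(mz, z)
    mz * z - proton * z

## Keep peptides of suitable length whose mass is within `ppm` of the
## precursor mass.
candidate_peptides <- function(mz, z, peptides, ppm = 10) {
    peptides <- peptides[nchar(peptides) > 6 & nchar(peptides) < 28]
    pep_mass <- vapply(peptides, peptide_mass, numeric(1))
    obs_mass <- precursor_mass(mz, z)
    peptides[abs(pep_mass - obs_mass) / obs_mass * 1e6 <= ppm]
}

## Usage: simulate a doubly charged precursor for one of the peptides.
peps <- c("GASPVLK", "GASPVLR", "AAK", "GGGGGGGK")
mz <- (peptide_mass("GASPVLK") + 2 * proton) / 2
candidate_peptides(mz, 2, peps)
```

Only the candidates returned for a scan then need to be scored against its MS2 peaks, which is considerably cheaper than scoring the whole database.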

6.3 Quantitative data processing

Following up from the quantitative data analysis seen in chapter 5, the following file includes a third condition, C, and two additional labs, for a total of 27 samples.

f <- MsDataHub::cptac_a_b_c_peptides.txt()
    LTQ-Orbitrap_86 LTQ-OrbitrapO_65 LTQ-OrbitrapW_56
6A                3                3                3
6B                3                3                3
6C                3                3                3

The full design is shown below.

TRUE id condition lab previous
6A_1 1 6A LTQ-Orbitrap_86 new
6A_2 2 6A LTQ-Orbitrap_86 new
6A_3 3 6A LTQ-Orbitrap_86 new
6A_4 4 6A LTQ-OrbitrapO_65 new
6A_5 5 6A LTQ-OrbitrapO_65 new
6A_6 6 6A LTQ-OrbitrapO_65 new
6A_7 7 6A LTQ-OrbitrapW_56
6A_8 8 6A LTQ-OrbitrapW_56
6A_9 9 6A LTQ-OrbitrapW_56
6B_1 1 6B LTQ-Orbitrap_86 new
6B_2 2 6B LTQ-Orbitrap_86 new
6B_3 3 6B LTQ-Orbitrap_86 new
6B_4 4 6B LTQ-OrbitrapO_65 new
6B_5 5 6B LTQ-OrbitrapO_65 new
6B_6 6 6B LTQ-OrbitrapO_65 new
6B_7 7 6B LTQ-OrbitrapW_56
6B_8 8 6B LTQ-OrbitrapW_56
6B_9 9 6B LTQ-OrbitrapW_56
6C_1 1 6C LTQ-Orbitrap_86 new
6C_2 2 6C LTQ-Orbitrap_86 new
6C_3 3 6C LTQ-Orbitrap_86 new
6C_4 4 6C LTQ-OrbitrapO_65 new
6C_5 5 6C LTQ-OrbitrapO_65 new
6C_6 6 6C LTQ-OrbitrapO_65 new
6C_7 7 6C LTQ-OrbitrapW_56 new
6C_8 8 6C LTQ-OrbitrapW_56 new
6C_9 9 6C LTQ-OrbitrapW_56 new

  • Repeat the analysis described in chapter 5 using the extended dataset, trying to maximise the number of true positives while avoiding false positives. Think about the best experimental design approach and how to best process the data, visualising important steps along the way, and conclude with a volcano plot and a table tallying the number of true/false positive/negative results.
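The final tally can be sketched in base R as shown below. The results table is a toy example with hypothetical protein names and made-up adjusted p-values. We assume here that the spiked-in proteins, which are the expected true positives, can be recognised by a "ups" tag in their name, and that significance is called at an adjusted p-value below 0.05. Both are assumptions to adapt to your own analysis.

```r
## Toy results table (hypothetical names, made-up adjusted p-values).
res <- data.frame(
    protein = c("P1ups", "P2ups", "P3ups", "Y1", "Y2", "Y3"),
    adj.P.Val = c(0.001, 0.20, 0.03, 0.50, 0.04, 0.90))

## Assumption: spiked-in proteins carry the "ups" tag and are the only
## proteins expected to change between conditions.
res$spiked <- grepl("ups", res$protein)
res$significant <- res$adj.P.Val < 0.05

## Cross-tabulate the calls against the expected truth.
tab <- table(significant = res$significant, spiked = res$spiked)
tab

c(TP = tab["TRUE", "TRUE"], FP = tab["TRUE", "FALSE"],
  FN = tab["FALSE", "TRUE"], TN = tab["FALSE", "FALSE"])
```

Computing the same table for each processing or modelling choice gives a simple way to compare them on the extended design.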

Page built: 2025-06-20 using R version 4.5.0 (2025-04-11)