R/readQFeaturesFromDIANN.R
readQFeaturesFromDIANN.Rd
This function takes the Report.tsv
output files from DIA-NN and
converts them into a multi-set QFeatures
object. It is a wrapper
around readQFeatures()
with default parameters set to match
DIA-NN label-free and plexDIA report files: default runCol
is
"File.Name"
and default quantColsis
"Ms1.Area"`.
readQFeaturesFromDIANN(
assayData,
colData = NULL,
quantCols = "Ms1.Area",
runCol = "File.Name",
multiplexing = c("none", "mTRAQ"),
extractedData = NULL,
ecol = NULL,
verbose = TRUE,
...
)
A data.frame
, or any object that can be coerced
into a data.frame
, holding the quantitative assay. For
readSummarizedExperiment()
, this can also be a
character(1)
pointing to a filename. This data.frame
is
typically generated by an identification and quantification
software, such as Sage, Proteome Discoverer, MaxQuant, ...
A data.frame
(or any object that can be coerced
to a data.frame
) containing sample/column annotations,
including quantCols
and runCol
(see details).
A numeric()
, logical()
or character()
defining the columns of the assayData
that contain the
quantitative data. This information can also be defined in
colData
(see details).
For the multi-set case, a numeric(1)
or
character(1)
pointing to the column of assayData
(and
colData
, is set) that contains the runs/batches. Make sure
that the column name in both tables are identical and
syntactically valid (if you supply a character
) or have the
same index (if you supply a numeric
). Note that characters
are converted to syntactically valid names using make.names
A character(1)
indicating the type of
multiplexing used in the experiment. One of "none"
(default,
for label-free experiments) or "mTRAQ"
(for plexDIA
experiments).
A data.frame
or any object that can be
coerced to a data.frame
that contains the data from the
*_ms1_extracted.tsv
file generated by DIA-NN. This argument
is optional and is currently only applicable for mTRAQ
multiplexed experiments where DIA-NN was run using the
plexdia
module (see references).
Same as quantCols
. Available for backwards
compatibility. Default is NULL
. If both ecol
and colData
are set, an error is thrown.
A logical(1)
indicating whether the progress of
the data reading and formatting should be printed to the
console. Default is TRUE
.
Further arguments passed to readQFeatures()
.
An instance of class QFeatures
. The quantiative data of
each acquisition run is stored in a separate set as a
SummarizedExperiment
object.
Derks, Jason, Andrew Leduc, Georg Wallmann, R. Gray Huffman, Matthew Willetts, Saad Khan, Harrison Specht, Markus Ralser, Vadim Demichev, and Nikolai Slavov. 2022. "Increasing the Throughput of Sensitive Proteomics by plexDIA." Nature Biotechnology, July. Link to article
The QFeatures
(see QFeatures()
) class to read about how to
manipulate the resulting QFeatures
object.
The readQFeatures()
function which this one depends on.
x <- read.delim(MsDataHub::benchmarkingDIA.tsv())
#> see ?MsDataHub and browseVignettes('MsDataHub') for documentation
#> loading from cache
x[["File.Name"]] <- x[["Run"]]
#################################
## Label-free multi-set case
## using default arguments
readQFeaturesFromDIANN(x)
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Splitting data in runs.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
#> An instance of class QFeatures containing 24 set(s):
#> [1] RD139_Overlap_UPS1_0_1fmol_inj1: SummarizedExperiment with 28980 rows and 1 columns
#> [2] RD139_Overlap_UPS1_0_1fmol_inj2: SummarizedExperiment with 29495 rows and 1 columns
#> [3] RD139_Overlap_UPS1_0_1fmol_inj3: SummarizedExperiment with 29210 rows and 1 columns
#> ...
#> [22] RD139_Overlap_UPS1_5fmol_inj1: SummarizedExperiment with 30941 rows and 1 columns
#> [23] RD139_Overlap_UPS1_5fmol_inj2: SummarizedExperiment with 30321 rows and 1 columns
#> [24] RD139_Overlap_UPS1_5fmol_inj3: SummarizedExperiment with 24168 rows and 1 columns
## with a colData (and default arguments)
cd <- data.frame(sampleInfo = LETTERS[1:24],
quantCols = "Ms1.Area",
runCol = unique(x[["File.Name"]]))
readQFeaturesFromDIANN(x, colData = cd)
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Splitting data in runs.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
#> An instance of class QFeatures containing 24 set(s):
#> [1] RD139_Overlap_UPS1_0_1fmol_inj1: SummarizedExperiment with 28980 rows and 1 columns
#> [2] RD139_Overlap_UPS1_0_1fmol_inj2: SummarizedExperiment with 29495 rows and 1 columns
#> [3] RD139_Overlap_UPS1_0_1fmol_inj3: SummarizedExperiment with 29210 rows and 1 columns
#> ...
#> [22] RD139_Overlap_UPS1_5fmol_inj1: SummarizedExperiment with 30941 rows and 1 columns
#> [23] RD139_Overlap_UPS1_5fmol_inj2: SummarizedExperiment with 30321 rows and 1 columns
#> [24] RD139_Overlap_UPS1_5fmol_inj3: SummarizedExperiment with 24168 rows and 1 columns
#################################
## mTRAQ multi-set case
x2 <- read.delim(MsDataHub::Report.Derks2022.plexDIA.tsv())
#> see ?MsDataHub and browseVignettes('MsDataHub') for documentation
#> loading from cache
x2[["File.Name"]] <- x2[["Run"]]
readQFeaturesFromDIANN(x2, multiplexing = "mTRAQ")
#> Pivoting quantiative data.
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Splitting data in runs.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
#> An instance of class QFeatures containing 54 set(s):
#> [1] wJD1146: SummarizedExperiment with 2635 rows and 3 columns
#> [2] wJD1147: SummarizedExperiment with 3000 rows and 3 columns
#> [3] wJD1148: SummarizedExperiment with 2676 rows and 3 columns
#> ...
#> [52] wJD1203: SummarizedExperiment with 4441 rows and 3 columns
#> [53] wJD1204: SummarizedExperiment with 4416 rows and 3 columns
#> [54] wJD1205: SummarizedExperiment with 4492 rows and 3 columns