These functions convert tabular data into dedicated data
objets. The readSummarizedExperiment()
function takes a file
name or data.frame
and converts it into a
SummarizedExperiment()
object. The readQFeatures()
function
takes a data.frame
and converts it into a QFeatures
object
(see QFeatures()
for details). For the latter, two use-cases
exist:
The single-set case will generate a QFeatures
object with a
single SummarizedExperiment
containing all features of the
input table.
The multi-set case will generate a QFeatures
object containing
multiple SummarizedExperiment
s, resulting from splitting the
input table. This multi-set case is generally used when the
input table contains data from multiple runs/batches.
readSummarizedExperiment(
assayData,
quantCols = NULL,
fnames = NULL,
ecol = NULL,
...
)
readQFeatures(
assayData,
colData = NULL,
quantCols = NULL,
runCol = NULL,
name = "quants",
removeEmptyCols = FALSE,
verbose = TRUE,
ecol = NULL,
...
)
A data.frame
, or any object that can be coerced
into a data.frame
, holding the quantitative assay. For
readSummarizedExperiment()
, this can also be a
character(1)
pointing to a filename. This data.frame
is
typically generated by an identification and quantification
software, such as Sage, Proteome Discoverer, MaxQuant, ...
A numeric()
, logical()
or character()
defining the columns of the assayData
that contain the
quantitative data. This information can also be defined in
colData
(see details).
For the single- and multi-set cases, an optional
character(1)
or numeric(1)
indicating the column to be
used as feature names. Note that rownames must be unique
within QFeatures
sets.
Same as quantCols
. Available for backwards
compatibility. Default is NULL
. If both ecol
and colData
are set, an error is thrown.
Further arguments that can be passed on to read.csv()
except stringsAsFactors
, which is always FALSE
. Only
applicable to readSummarizedExperiment()
.
A data.frame
(or any object that can be coerced
to a data.frame
) containing sample/column annotations,
including quantCols
and runCol
(see details).
For the multi-set case, a numeric(1)
or
character(1)
pointing to the column of assayData
(and
colData
, is set) that contains the runs/batches. Make sure
that the column name in both tables are identical and
syntactically valid (if you supply a character
) or have the
same index (if you supply a numeric
). Note that characters
are converted to syntactically valid names using make.names
For the single-set case, an optional character(1)
to
name the set in the QFeatures
object. Default is quants
.
A logical(1)
. If TRUE
, quantitative
columns that contain only missing values are removed.
A logical(1)
indicating whether the progress of
the data reading and formatting should be printed to the
console. Default is TRUE
.
An instance of class QFeatures
or
SummarizedExperiment::SummarizedExperiment()
. For the
former, the quantitative sets of each run are stored in
SummarizedExperiment::SummarizedExperiment()
object.
The single- and multi-set cases are defined by the quantCols
and
runCol
parameters, whether passed by the quantCols
and
runCol
vectors and/or the colData
data.frame
(see below).
The quantitative data variables are defined by the quantCols
.
The single-set case can be represented schematically as shown
below.
|------+----------------+-----------|
| cols | quantCols 1..N | more cols |
| . | ... | ... |
| . | ... | ... |
| . | ... | ... |
|------+----------------+-----------|
Note that every quantCols
column contains data for a single
sample. The single-set case is defined by the absence of any
runCol
input (see next section). We here provide a
(non-exhaustive) list of typical data sets that fall under the
single-set case:
Peptide- or protein-level label-free data (bulk or single-cell).
Peptide- or protein-level multiplexed (e.g. TMT) data (bulk or single-cell).
PSM-level multiplexed data acquired in a single MS run (bulk or single-cell).
PSM-level data from fractionation experiments, where each fraction of the same sample was acquired with the same multiplexing label.
A run/batch variable, runCol
, is required to import multi-set
data. The multi-set case can be represented schematically as shown
below.
|--------+------+----------------+-----------|
| runCol | cols | quantCols 1..N | more cols |
| 1 | . | ... | ... |
| 1 | . | ... | ... |
|--------+------+----------------+-----------|
| 2 | . | ... | ... |
|--------+------+----------------+-----------|
| . | . | ... | ... |
|--------+------+----------------+-----------|
Every quantCols
column contains data for multiple samples
acquired in different runs. The multi-set case applies when
runCol
is provided, which will determine how the table is split
into multiple sets.
We here provide a (non-exhaustive) list of typical data sets that fall under the multi-set case:
PSM- or precursor-level multiplexed data acquired in multiple runs (bulk or single-cell)
PSM- or precursor-level label-free data acquired in multiple runs (bulk or single-cell)
DIA-NN data (see also readQFeaturesFromDIANN()
).
colData
We recommend providing sample annotations when creating a
QFeatures
object. The colData
is a table in which each row
corresponds to a sample and each column provides information about
the samples. There is no restriction on the number of columns and
on the type of data they should contain. However, we impose one or
two columns (depending on the use case) that allow to link the
annotations of each sample to its quantitative data:
Single-set case: the colData
must contain a column named
quantCols
that provides the names of the columns in
assayData
containing quantitative values for each sample (see
single-set cases in the examples).
Multi-set case: the colData
must contain a column named
quantCols
that provides the names of the columns in
assayData
with the quantitative values for each sample, and a
column named runCol
that provides the MS runs/batches in which
each sample has been acquired. The entries in
colData[["runCol"]]
are matched against the entries provided
by assayData[[runCol]]
.
When the quantCols
argument is not provided to
readQFeatures()
, the function will automatically determine the
quantCols
from colData[["quantCols"]]
. Therefore, quantCols
and colData
cannot be both missing.
Samples that are present in assayData
but absent
colData
will lead to a warning, and the missing entries will be
automatically added to the colData
and filled with NA
s.
When using the quantCols
and runCol
arguments only
(without colData
), the colData
contains zero
columns/variables.
The QFeatures
(see QFeatures()
) class to read about how to
manipulate the resulting QFeatures
object.
The readQFeaturesFromDIANN()
function to import DIA-NN
quantitative data.
######################################
## Single-set case.
## Load a data.frame with PSM-level data
data(hlpsms)
hlpsms[1:10, c(1, 2, 10:11, 14, 17)]
#> X126 X127C X131 Sequence ProteinGroupAccessions PEP
#> 383 0.12283431 0.08045915 0.11961594 SQGEIDk Q8BYY4 0.11800
#> 475 0.35268185 0.14162381 0.02957384 YEAQGDk P46467 0.01070
#> 478 0.01546089 0.16142297 0.04370403 TTScDTk Q64449 0.11800
#> 552 0.04702854 0.09288723 0.10014038 aEELESR P60469 0.04450
#> 596 0.01044693 0.15866147 0.02307803 aQEEAIk P13597-2 0.00850
#> 610 0.04955362 0.01215244 0.29732174 dGAVDGcR Q6P5D8 0.00322
#> 731 0.04007112 0.06632932 0.10188731 AcDSAEVk Q01237 0.04090
#> 786 0.16122744 0.10251588 0.04884985 VSSDEDLk Q9D8U8 0.00130
#> 795 0.60288497 0.11022069 0.02182222 TDQNYEk Q8BMJ2 0.01880
#> 816 0.10298287 0.05818306 0.07723716 QEEIQQk Q3URD3 0.02900
## Create a QFeatures object with a single psms set
qf1 <- readQFeatures(hlpsms, quantCols = 1:10, name = "psms")
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
qf1
#> An instance of class QFeatures containing 1 set(s):
#> [1] psms: SummarizedExperiment with 3010 rows and 10 columns
colData(qf1)
#> DataFrame with 10 rows and 0 columns
######################################
## Single-set case with colData.
(coldat <- data.frame(var = rnorm(10),
quantCols = names(hlpsms)[1:10]))
#> var quantCols
#> 1 0.7543883 X126
#> 2 -1.4626257 X127C
#> 3 -0.1084145 X127N
#> 4 0.7761296 X128C
#> 5 0.6055939 X128N
#> 6 0.4827772 X129C
#> 7 1.6805106 X129N
#> 8 -0.1508190 X130C
#> 9 0.6795999 X130N
#> 10 0.4499002 X131
qf2 <- readQFeatures(hlpsms, colData = coldat)
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
qf2
#> An instance of class QFeatures containing 1 set(s):
#> [1] quants: SummarizedExperiment with 3010 rows and 10 columns
colData(qf2)
#> DataFrame with 10 rows and 2 columns
#> var quantCols
#> <numeric> <character>
#> X126 0.754388 X126
#> X127C -1.462626 X127C
#> X127N -0.108415 X127N
#> X128C 0.776130 X128C
#> X128N 0.605594 X128N
#> X129C 0.482777 X129C
#> X129N 1.680511 X129N
#> X130C -0.150819 X130C
#> X130N 0.679600 X130N
#> X131 0.449900 X131
######################################
## Multi-set case.
## Let's simulate 3 different files/batches for that same input
## data.frame, and define a colData data.frame.
hlpsms$file <- paste0("File", sample(1:3, nrow(hlpsms), replace = TRUE))
hlpsms[1:10, c(1, 2, 10:11, 14, 17, 29)]
#> X126 X127C X131 Sequence ProteinGroupAccessions PEP
#> 383 0.12283431 0.08045915 0.11961594 SQGEIDk Q8BYY4 0.11800
#> 475 0.35268185 0.14162381 0.02957384 YEAQGDk P46467 0.01070
#> 478 0.01546089 0.16142297 0.04370403 TTScDTk Q64449 0.11800
#> 552 0.04702854 0.09288723 0.10014038 aEELESR P60469 0.04450
#> 596 0.01044693 0.15866147 0.02307803 aQEEAIk P13597-2 0.00850
#> 610 0.04955362 0.01215244 0.29732174 dGAVDGcR Q6P5D8 0.00322
#> 731 0.04007112 0.06632932 0.10188731 AcDSAEVk Q01237 0.04090
#> 786 0.16122744 0.10251588 0.04884985 VSSDEDLk Q9D8U8 0.00130
#> 795 0.60288497 0.11022069 0.02182222 TDQNYEk Q8BMJ2 0.01880
#> 816 0.10298287 0.05818306 0.07723716 QEEIQQk Q3URD3 0.02900
#> file
#> 383 File3
#> 475 File3
#> 478 File2
#> 552 File3
#> 596 File1
#> 610 File2
#> 731 File3
#> 786 File3
#> 795 File2
#> 816 File2
qf3 <- readQFeatures(hlpsms, quantCols = 1:10, runCol = "file")
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Splitting data in runs.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
qf3
#> An instance of class QFeatures containing 3 set(s):
#> [1] File1: SummarizedExperiment with 986 rows and 10 columns
#> [2] File2: SummarizedExperiment with 998 rows and 10 columns
#> [3] File3: SummarizedExperiment with 1026 rows and 10 columns
colData(qf3)
#> DataFrame with 30 rows and 0 columns
######################################
## Multi-set case with colData.
(coldat <- data.frame(runCol = rep(paste0("File", 1:3), each = 10),
var = rnorm(10),
quantCols = names(hlpsms)[1:10]))
#> runCol var quantCols
#> 1 File1 0.51626857 X126
#> 2 File1 1.02704216 X127C
#> 3 File1 0.73166309 X127N
#> 4 File1 -0.11245031 X128C
#> 5 File1 1.42917302 X128N
#> 6 File1 0.89530083 X129C
#> 7 File1 -1.50043104 X129N
#> 8 File1 -0.70734051 X130C
#> 9 File1 -0.11555948 X130N
#> 10 File1 0.08755882 X131
#> 11 File2 0.51626857 X126
#> 12 File2 1.02704216 X127C
#> 13 File2 0.73166309 X127N
#> 14 File2 -0.11245031 X128C
#> 15 File2 1.42917302 X128N
#> 16 File2 0.89530083 X129C
#> 17 File2 -1.50043104 X129N
#> 18 File2 -0.70734051 X130C
#> 19 File2 -0.11555948 X130N
#> 20 File2 0.08755882 X131
#> 21 File3 0.51626857 X126
#> 22 File3 1.02704216 X127C
#> 23 File3 0.73166309 X127N
#> 24 File3 -0.11245031 X128C
#> 25 File3 1.42917302 X128N
#> 26 File3 0.89530083 X129C
#> 27 File3 -1.50043104 X129N
#> 28 File3 -0.70734051 X130C
#> 29 File3 -0.11555948 X130N
#> 30 File3 0.08755882 X131
qf4 <- readQFeatures(hlpsms, colData = coldat, runCol = "file")
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Splitting data in runs.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
qf4
#> An instance of class QFeatures containing 3 set(s):
#> [1] File1: SummarizedExperiment with 986 rows and 10 columns
#> [2] File2: SummarizedExperiment with 998 rows and 10 columns
#> [3] File3: SummarizedExperiment with 1026 rows and 10 columns
colData(qf4)
#> DataFrame with 30 rows and 3 columns
#> runCol var quantCols
#> <character> <numeric> <character>
#> File1_X126 File1 0.516269 X126
#> File1_X127C File1 1.027042 X127C
#> File1_X127N File1 0.731663 X127N
#> File1_X128C File1 -0.112450 X128C
#> File1_X128N File1 1.429173 X128N
#> ... ... ... ...
#> File3_X129C File3 0.8953008 X129C
#> File3_X129N File3 -1.5004310 X129N
#> File3_X130C File3 -0.7073405 X130C
#> File3_X130N File3 -0.1155595 X130N
#> File3_X131 File3 0.0875588 X131