Skip to contents

These functions convert tabular data into dedicated data objets. The readSummarizedExperiment() function takes a file name or data.frame and converts it into a SummarizedExperiment() object. The readQFeatures() function takes a data.frame and converts it into a QFeatures object (see QFeatures() for details). For the latter, two use-cases exist:

  • The single-set case will generate a QFeatures object with a single SummarizedExperiment containing all features of the input table.

  • The multi-set case will generate a QFeatures object containing multiple SummarizedExperiments, resulting from splitting the input table. This multi-set case is generally used when the input table contains data from multiple runs/batches.

Usage

readSummarizedExperiment(
  assayData,
  quantCols = NULL,
  fnames = NULL,
  ecol = NULL,
  ...
)

readQFeatures(
  assayData,
  colData = NULL,
  quantCols = NULL,
  runCol = NULL,
  name = "quants",
  removeEmptyCols = FALSE,
  verbose = TRUE,
  ecol = NULL,
  fnames = NULL,
  ...
)

Arguments

assayData

A data.frame, or any object that can be coerced into a data.frame, holding the quantitative assay. For readSummarizedExperiment(), this can also be a character(1) pointing to a filename. This data.frame is typically generated by an identification and quantification software, such as Sage, Proteome Discoverer, MaxQuant, ...

quantCols

A numeric(), logical() or character() defining the columns of the assayData that contain the quantitative data. This information can also be defined in colData (see details).

fnames

For the single- and multi-set cases, an optional character(1) or numeric(1) indicating the column to be used as feature names. Note that rownames must be unique within QFeatures sets. Default is NULL. See also section 'Feature names'.

ecol

Same as quantCols. Available for backwards compatibility. Default is NULL. If both ecol and colData are set, an error is thrown.

...

Further arguments that can be passed on to read.csv() except stringsAsFactors, which is always FALSE. Only applicable to readSummarizedExperiment().

colData

A data.frame (or any object that can be coerced to a data.frame) containing sample/column annotations, including quantCols and runCol (see details).

runCol

For the multi-set case, a numeric(1) or character(1) pointing to the column of assayData (and colData, is set) that contains the runs/batches. Make sure that the column name in both tables are identical and syntactically valid (if you supply a character) or have the same index (if you supply a numeric). Note that characters are converted to syntactically valid names using make.names

name

For the single-set case, an optional character(1) to name the set in the QFeatures object. Default is quants.

removeEmptyCols

A logical(1). If TRUE, quantitative columns that contain only missing values are removed.

verbose

A logical(1) indicating whether the progress of the data reading and formatting should be printed to the console. Default is TRUE.

Value

An instance of class QFeatures or SummarizedExperiment::SummarizedExperiment(). For the former, the quantitative sets of each run are stored in SummarizedExperiment::SummarizedExperiment() object.

Details

The single- and multi-set cases are defined by the quantCols and runCol parameters, whether passed by the quantCols and runCol vectors and/or the colData data.frame (see below).

Single-set case

The quantitative data variables are defined by the quantCols. The single-set case can be represented schematically as shown below.

|------+----------------+-----------|
| cols | quantCols 1..N | more cols |
| .    | ...            | ...       |
| .    | ...            | ...       |
| .    | ...            | ...       |
|------+----------------+-----------|

Note that every quantCols column contains data for a single sample. The single-set case is defined by the absence of any runCol input (see next section). We here provide a (non-exhaustive) list of typical data sets that fall under the single-set case:

  • Peptide- or protein-level label-free data (bulk or single-cell).

  • Peptide- or protein-level multiplexed (e.g. TMT) data (bulk or single-cell).

  • PSM-level multiplexed data acquired in a single MS run (bulk or single-cell).

  • PSM-level data from fractionation experiments, where each fraction of the same sample was acquired with the same multiplexing label.

Multi-set case

A run/batch variable, runCol, is required to import multi-set data. The multi-set case can be represented schematically as shown below.

|--------+------+----------------+-----------|
| runCol | cols | quantCols 1..N | more cols |
|   1    | .    | ...            | ...       |
|   1    | .    | ...            | ...       |
|--------+------+----------------+-----------|
|   2    | .    | ...            | ...       |
|--------+------+----------------+-----------|
|   .    | .    | ...            | ...       |
|--------+------+----------------+-----------|

Every quantCols column contains data for multiple samples acquired in different runs. The multi-set case applies when runCol is provided, which will determine how the table is split into multiple sets.

We here provide a (non-exhaustive) list of typical data sets that fall under the multi-set case:

  • PSM- or precursor-level multiplexed data acquired in multiple runs (bulk or single-cell)

  • PSM- or precursor-level label-free data acquired in multiple runs (bulk or single-cell)

  • DIA-NN data (see also readQFeaturesFromDIANN()).

Adding sample annotations with colData

We recommend providing sample annotations when creating a QFeatures object. The colData is a table in which each row corresponds to a sample and each column provides information about the samples. There is no restriction on the number of columns and on the type of data they should contain. However, we impose one or two columns (depending on the use case) that allow to link the annotations of each sample to its quantitative data:

  • Single-set case: the colData must contain a column named quantCols that provides the names of the columns in assayData containing quantitative values for each sample (see single-set cases in the examples).

  • Multi-set case: the colData must contain a column named quantCols that provides the names of the columns in assayData with the quantitative values for each sample, and a column named runCol that provides the MS runs/batches in which each sample has been acquired. The entries in colData[["runCol"]] are matched against the entries provided by assayData[[runCol]].

When the quantCols argument is not provided to readQFeatures(), the function will automatically determine the quantCols from colData[["quantCols"]]. Therefore, quantCols and colData cannot be both missing.

Samples that are present in assayData but absent colData will lead to a warning, and the missing entries will be automatically added to the colData and filled with NAs.

When using the quantCols and runCol arguments only (without colData), the colData contains zero columns/variables.

Feature names

Assay feature (i.e. rownames) are important as they are used when assays are joined with joinAssays(). They can be set upon creation of the QFeatures() object by setting the fnames argument. See also createPrecursorId() in case a precursor identifier is note readily available and should be created from other, existing rowData variables.

See also

  • The QFeatures (see QFeatures()) class to read about how to manipulate the resulting QFeatures object.

  • The readQFeaturesFromDIANN() function to import DIA-NN quantitative data.

Author

Laurent Gatto, Christophe Vanderaa

Examples


######################################
## Single-set case.

## Load a data.frame with PSM-level data
data(hlpsms)
hlpsms[1:10, c(1, 2, 10:11, 14, 17)]
#>           X126      X127C       X131 Sequence ProteinGroupAccessions     PEP
#> 383 0.12283431 0.08045915 0.11961594  SQGEIDk                 Q8BYY4 0.11800
#> 475 0.35268185 0.14162381 0.02957384  YEAQGDk                 P46467 0.01070
#> 478 0.01546089 0.16142297 0.04370403  TTScDTk                 Q64449 0.11800
#> 552 0.04702854 0.09288723 0.10014038  aEELESR                 P60469 0.04450
#> 596 0.01044693 0.15866147 0.02307803  aQEEAIk               P13597-2 0.00850
#> 610 0.04955362 0.01215244 0.29732174 dGAVDGcR                 Q6P5D8 0.00322
#> 731 0.04007112 0.06632932 0.10188731 AcDSAEVk                 Q01237 0.04090
#> 786 0.16122744 0.10251588 0.04884985 VSSDEDLk                 Q9D8U8 0.00130
#> 795 0.60288497 0.11022069 0.02182222  TDQNYEk                 Q8BMJ2 0.01880
#> 816 0.10298287 0.05818306 0.07723716  QEEIQQk                 Q3URD3 0.02900

## Create a QFeatures object with a single psms set
qf1 <- readQFeatures(hlpsms, quantCols = 1:10, name = "psms")
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
qf1
#> An instance of class QFeatures (type: bulk) with 1 set:
#> 
#>  [1] psms: SummarizedExperiment with 3010 rows and 10 columns 
colData(qf1)
#> DataFrame with 10 rows and 0 columns

######################################
## Single-set case with colData.

(coldat <- data.frame(var = rnorm(10),
                      quantCols = names(hlpsms)[1:10]))
#>           var quantCols
#> 1   0.8497468      X126
#> 2   0.6412677     X127C
#> 3  -0.8533205     X127N
#> 4   1.4502467     X128C
#> 5  -0.5834776     X128N
#> 6  -1.5154718     X129C
#> 7   1.4507215     X129N
#> 8  -0.8785367     X130C
#> 9   0.4751676     X130N
#> 10 -1.7268766      X131
qf2 <- readQFeatures(hlpsms, colData = coldat)
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
qf2
#> An instance of class QFeatures (type: bulk) with 1 set:
#> 
#>  [1] quants: SummarizedExperiment with 3010 rows and 10 columns 
colData(qf2)
#> DataFrame with 10 rows and 2 columns
#>             var   quantCols
#>       <numeric> <character>
#> X126   0.849747        X126
#> X127C  0.641268       X127C
#> X127N -0.853321       X127N
#> X128C  1.450247       X128C
#> X128N -0.583478       X128N
#> X129C -1.515472       X129C
#> X129N  1.450721       X129N
#> X130C -0.878537       X130C
#> X130N  0.475168       X130N
#> X131  -1.726877        X131

######################################
## Multi-set case.

## Let's simulate 3 different files/batches for that same input
## data.frame, and define a colData data.frame.

hlpsms$file <- paste0("File", sample(1:3, nrow(hlpsms), replace = TRUE))
hlpsms[1:10, c(1, 2, 10:11, 14, 17, 29)]
#>           X126      X127C       X131 Sequence ProteinGroupAccessions     PEP
#> 383 0.12283431 0.08045915 0.11961594  SQGEIDk                 Q8BYY4 0.11800
#> 475 0.35268185 0.14162381 0.02957384  YEAQGDk                 P46467 0.01070
#> 478 0.01546089 0.16142297 0.04370403  TTScDTk                 Q64449 0.11800
#> 552 0.04702854 0.09288723 0.10014038  aEELESR                 P60469 0.04450
#> 596 0.01044693 0.15866147 0.02307803  aQEEAIk               P13597-2 0.00850
#> 610 0.04955362 0.01215244 0.29732174 dGAVDGcR                 Q6P5D8 0.00322
#> 731 0.04007112 0.06632932 0.10188731 AcDSAEVk                 Q01237 0.04090
#> 786 0.16122744 0.10251588 0.04884985 VSSDEDLk                 Q9D8U8 0.00130
#> 795 0.60288497 0.11022069 0.02182222  TDQNYEk                 Q8BMJ2 0.01880
#> 816 0.10298287 0.05818306 0.07723716  QEEIQQk                 Q3URD3 0.02900
#>      file
#> 383 File2
#> 475 File2
#> 478 File1
#> 552 File2
#> 596 File1
#> 610 File2
#> 731 File2
#> 786 File2
#> 795 File3
#> 816 File1

qf3 <- readQFeatures(hlpsms, quantCols = 1:10, runCol = "file")
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Splitting data in runs.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
qf3
#> An instance of class QFeatures (type: bulk) with 3 sets:
#> 
#>  [1] File1: SummarizedExperiment with 987 rows and 10 columns 
#>  [2] File2: SummarizedExperiment with 1000 rows and 10 columns 
#>  [3] File3: SummarizedExperiment with 1023 rows and 10 columns 
colData(qf3)
#> DataFrame with 30 rows and 0 columns


######################################
## Multi-set case with colData.

(coldat <- data.frame(runCol = rep(paste0("File", 1:3), each = 10),
                      var = rnorm(10),
                      quantCols = names(hlpsms)[1:10]))
#>    runCol        var quantCols
#> 1   File1 -0.5998708      X126
#> 2   File1  1.4683437     X127C
#> 3   File1 -0.6198368     X127N
#> 4   File1  0.5668461     X128C
#> 5   File1  1.3854334     X128N
#> 6   File1  1.3984792     X129C
#> 7   File1 -1.7610803     X129N
#> 8   File1 -0.4927893     X130C
#> 9   File1  0.8543945     X130N
#> 10  File1  0.3859256      X131
#> 11  File2 -0.5998708      X126
#> 12  File2  1.4683437     X127C
#> 13  File2 -0.6198368     X127N
#> 14  File2  0.5668461     X128C
#> 15  File2  1.3854334     X128N
#> 16  File2  1.3984792     X129C
#> 17  File2 -1.7610803     X129N
#> 18  File2 -0.4927893     X130C
#> 19  File2  0.8543945     X130N
#> 20  File2  0.3859256      X131
#> 21  File3 -0.5998708      X126
#> 22  File3  1.4683437     X127C
#> 23  File3 -0.6198368     X127N
#> 24  File3  0.5668461     X128C
#> 25  File3  1.3854334     X128N
#> 26  File3  1.3984792     X129C
#> 27  File3 -1.7610803     X129N
#> 28  File3 -0.4927893     X130C
#> 29  File3  0.8543945     X130N
#> 30  File3  0.3859256      X131
qf4 <- readQFeatures(hlpsms, colData = coldat, runCol = "file")
#> Checking arguments.
#> Loading data as a 'SummarizedExperiment' object.
#> Splitting data in runs.
#> Formatting sample annotations (colData).
#> Formatting data as a 'QFeatures' object.
qf4
#> An instance of class QFeatures (type: bulk) with 3 sets:
#> 
#>  [1] File1: SummarizedExperiment with 987 rows and 10 columns 
#>  [2] File2: SummarizedExperiment with 1000 rows and 10 columns 
#>  [3] File3: SummarizedExperiment with 1023 rows and 10 columns 
colData(qf4)
#> DataFrame with 30 rows and 3 columns
#>                  runCol       var   quantCols
#>             <character> <numeric> <character>
#> File1_X126        File1 -0.599871        X126
#> File1_X127C       File1  1.468344       X127C
#> File1_X127N       File1 -0.619837       X127N
#> File1_X128C       File1  0.566846       X128C
#> File1_X128N       File1  1.385433       X128N
#> ...                 ...       ...         ...
#> File3_X129C       File3  1.398479       X129C
#> File3_X129N       File3 -1.761080       X129N
#> File3_X130C       File3 -0.492789       X130C
#> File3_X130N       File3  0.854394       X130N
#> File3_X131        File3  0.385926        X131