Create precursor identifiers — createPrecursorId • QFeatures

The createPrecursorId() function creates a new precursor identifier column in the assays of a QFeatures object (more precisely, in their rowData). The new variable is called "Precursor.Id" by default and is generated by concatenating other rowData variables that, together, should form unique identifiers.

These precursor identifiers, assuming they are unique, can then be used to join assays with joinAssays(), rather than using the rownames, as illustrated below.

Usage

createPrecursorId(
  object,
  name = "Precursor.Id",
  fcols = c("Modified.Sequence", "Precursor.Charge"),
  i = seq_along(object)
)

Arguments

object: An instance of class QFeatures.
name: character(1) with the name of the new rowData variable. Default is "Precursor.Id".
fcols: character() with the names of the rowData variables that are pasted together with paste0() to create the new name variable. Default is c("Modified.Sequence", "Precursor.Charge"). Note that these must be present in all assays.
i: The assays of object whose rowData need to be updated. By default, all assays are considered.
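
The concatenation described for fcols can be pictured with a small base R sketch. It is an illustration only, not the package's implementation: the toy table rd and its values are made up for the example, and it assumes the variables are pasted row by row, in the order given in fcols, without a separator.

```r
## Toy stand-in for the rowData of one assay (hypothetical values).
rd <- data.frame(
    Modified.Sequence = c("SYGFNAAR", "SYGFNAAR", "ELGNDAYK"),
    Precursor.Charge = c(1, 2, 1),
    stringsAsFactors = FALSE
)
fcols <- c("Modified.Sequence", "Precursor.Charge")

## Paste the selected columns together, row by row, with no separator.
## The column order in fcols determines the order within the identifier.
rd[["Precursor.Id"]] <- do.call(paste0, unname(as.list(rd[fcols])))
rd[["Precursor.Id"]]
```

Because the order of fcols is preserved, c("charge", "Sequence") and c("Sequence", "charge") produce different identifiers, so the same order must be used for every assay that will later be joined.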

Value

An updated QFeatures instance.

Author

Laurent Gatto

Examples


## Let's use the PSM assays of feat4, which don't have any precursor identifiers
data(feat4)
feat4
#> An instance of class QFeatures (type: bulk) with 2 sets:
#> 
#>  [1] PSM1: SummarizedExperiment with 7 rows and 2 columns 
#>  [2] PSM2: SummarizedExperiment with 8 rows and 2 columns 
rowDataNames(feat4)
#> CharacterList of length 2
#> [["PSM1"]] Sequence Protein charge
#> [["PSM2"]] Sequence Protein charge

## Create precursor identifiers by concatenating the charge and the
## sequence
feat4 <- createPrecursorId(feat4,
                           name = "Precursor.Id",
                           fcols = c("charge", "Sequence"))
rowDataNames(feat4)
#> CharacterList of length 2
#> [["PSM1"]] Sequence Protein charge Precursor.Id
#> [["PSM2"]] Sequence Protein charge Precursor.Id
rowData(feat4[[1]])[, c("Sequence", "charge", "Precursor.Id")]
#> DataFrame with 7 rows and 3 columns
#>        Sequence    charge  Precursor.Id
#>     <character> <numeric>   <character>
#> 1      SYGFNAAR         1     1SYGFNAAR
#> 2      SYGFNAAR         1     1SYGFNAAR
#> 3      SYGFNAAR         2     2SYGFNAAR
#> 4      ELGNDAYK         1     1ELGNDAYK
#> 5      ELGNDAYK         2     2ELGNDAYK
#> 6      ELGNDAYK         3     3ELGNDAYK
#> 7 IAEESNFPFI...         1 1IAEESNFPF...

## As can be seen below, some precursors are duplicated, which will be
## problematic when joining the assays. Should we join `1SYGFNAAR` in the
## second assay with the first or the second `1SYGFNAAR` in the first assay?
rowData(feat4[[1]])[, "Precursor.Id", drop = FALSE]
#> DataFrame with 7 rows and 1 column
#>    Precursor.Id
#>     <character>
#> 1     1SYGFNAAR
#> 2     1SYGFNAAR
#> 3     2SYGFNAAR
#> 4     1ELGNDAYK
#> 5     2ELGNDAYK
#> 6     3ELGNDAYK
#> 7 1IAEESNFPF...
rowData(feat4[[2]])[, "Precursor.Id", drop = FALSE]
#> DataFrame with 8 rows and 1 column
#>    Precursor.Id
#>     <character>
#> 1     1SYGFNAAR
#> 2     1ELGNDAYK
#> 3     2ELGNDAYK
#> 4     3ELGNDAYK
#> 5 1IAEESNFPF...
#> 6 1IAEESNFPF...
#> 7 2IAEESNFPF...
#> 8 3IAEESNFPF...

## Here, one can either aggregate PSMs into PSMs with unique identifiers (see
## ?aggregateFeatures) or remove duplicated entries.
nrows(feat4) ## before filtering
#> PSM1 PSM2 
#>    7    8 
feat4 <- filterFeatures(feat4, ~ !isDuplicated(Precursor.Id))
#> 'Precursor.Id' found in 2 out of 2 assay(s).
nrows(feat4) ## after filtering
#> PSM1 PSM2 
#>    5    6 

## The assays can now be joined, using the newly created identifier rather
## than the (default) rownames.
feat4 <- joinAssays(feat4, i = 1:2,
                    name = "Precursors",
                    fcol = "Precursor.Id")
#> Using 'Precursor.Id' to join assays.
feat4
#> An instance of class QFeatures (type: bulk) with 3 sets:
#> 
#>  [1] PSM1: SummarizedExperiment with 5 rows and 2 columns 
#>  [2] PSM2: SummarizedExperiment with 6 rows and 2 columns 
#>  [3] Precursors: SummarizedExperiment with 8 rows and 4 columns
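
In the filtering step above, PSM1 goes from 7 to 5 rows, which suggests that every copy of a duplicated identifier is dropped, not only the second and later ones. The base R sketch below mimics that behaviour on a made-up vector of identifiers. It is an assumption about the semantics inferred from the row counts, not the code of isDuplicated().

```r
## Hypothetical identifiers; "1AAA" occurs twice.
ids <- c("1AAA", "1AAA", "2AAA", "1BBB")

## duplicated() flags only the second and later copies, so look up
## every element that matches one of those flagged values.
in_dup <- ids %in% ids[duplicated(ids)]

## Keep only identifiers that occur exactly once.
unique_ids <- ids[!in_dup]
unique_ids

## A quick check that the remaining identifiers are safe to join on.
anyDuplicated(unique_ids) == 0
```

Dropping all copies avoids having to choose which of the ambiguous rows to keep, at the cost of losing those features entirely. Aggregating them is the alternative when that loss matters.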