The createPrecursorId() function creates new precursor identifier columns in the assays of a QFeatures object (more precisely, in their rowData). The new variable is called "Precursor.Id" by default, and is generated by concatenating other rowData variables that, together, should form unique identifiers.
These precursor identifiers, assuming they are unique, can then be used to join assays using joinAssays(), rather than using the rownames, as illustrated below.
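The concatenation described above can be sketched in base R. This is a minimal illustration, not the function's actual implementation: it assumes the identifier is built with paste0() and no separator, and the data frame `rd` is a hypothetical stand-in for the rowData of one assay.

```r
## Hypothetical rowData-like table (values borrowed from the PSM1 example)
rd <- data.frame(Sequence = c("SYGFNAAR", "SYGFNAAR", "ELGNDAYK"),
                 charge = c(1, 2, 1))
fcols <- c("charge", "Sequence")

## Paste the selected columns together row by row, without a separator
rd$Precursor.Id <- do.call(paste0, unname(as.list(rd[fcols])))
rd$Precursor.Id

## anyDuplicated() returns 0 when every identifier is unique
all_unique <- anyDuplicated(rd$Precursor.Id) == 0
all_unique
```

The order of `fcols` determines the order of the pieces in the identifier, so the same order has to be used for every assay that will later be joined.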
Arguments
- object
An instance of class QFeatures.
- name
character(1) with the name of the new rowData variable. Default is "Precursor.Id".
- fcols
character() with the names of the rowData variables that need to be paste0()ed together to create the new name variable. Default is c("Modified.Sequence", "Precursor.Charge"). Note that these must be present in all assays.
- i
The assays of object whose rowData need to be updated. By default, all assays are considered.
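Because the fcols variables must be present in all selected assays, it can help to check for them before calling the function. The following base R sketch shows one way to do so. The list `rd_names` is a hypothetical stand-in for the rowData variable names of each assay (for a real object, rowDataNames() returns them), and the check itself is an illustration, not part of createPrecursorId().

```r
## Hypothetical rowData variable names, one entry per assay
rd_names <- list(PSM1 = c("Sequence", "Protein", "charge"),
                 PSM2 = c("Sequence", "Protein", "charge"))

## The documented default for fcols
fcols <- c("Modified.Sequence", "Precursor.Charge")

## For each assay, list the requested variables that are missing
missing_vars <- lapply(rd_names, function(nms) setdiff(fcols, nms))
missing_vars

## TRUE only if every assay contains every requested variable
ok <- all(lengths(missing_vars) == 0)
ok
```

With these variable names the default fcols are absent from both assays, so fcols has to be set explicitly, as in the example below.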
Examples
## Let's use the PSM assays of feat4, which don't have any precursor identifiers
data(feat4)
feat4
#> An instance of class QFeatures (type: bulk) with 2 sets:
#>
#> [1] PSM1: SummarizedExperiment with 7 rows and 2 columns
#> [2] PSM2: SummarizedExperiment with 8 rows and 2 columns
rowDataNames(feat4)
#> CharacterList of length 2
#> [["PSM1"]] Sequence Protein charge
#> [["PSM2"]] Sequence Protein charge
## Create precursor identifiers by concatenating the charge and the
## sequence
feat4 <- createPrecursorId(feat4,
                           name = "Precursor.Id",
                           fcols = c("charge", "Sequence"))
rowDataNames(feat4)
#> CharacterList of length 2
#> [["PSM1"]] Sequence Protein charge Precursor.Id
#> [["PSM2"]] Sequence Protein charge Precursor.Id
rowData(feat4[[1]])[, c("Sequence", "charge", "Precursor.Id")]
#> DataFrame with 7 rows and 3 columns
#>        Sequence    charge  Precursor.Id
#>     <character> <numeric>   <character>
#> 1      SYGFNAAR         1     1SYGFNAAR
#> 2      SYGFNAAR         1     1SYGFNAAR
#> 3      SYGFNAAR         2     2SYGFNAAR
#> 4      ELGNDAYK         1     1ELGNDAYK
#> 5      ELGNDAYK         2     2ELGNDAYK
#> 6      ELGNDAYK         3     3ELGNDAYK
#> 7 IAEESNFPFI...         1 1IAEESNFPF...
## As can be seen below, some precursors are duplicated, which will be
## problematic when joining the assays. Should we join `1SYGFNAAR` in the
## second assay with the first or the second `1SYGFNAAR` in the first assay?
rowData(feat4[[1]])[, "Precursor.Id", drop = FALSE]
#> DataFrame with 7 rows and 1 column
#> Precursor.Id
#> <character>
#> 1 1SYGFNAAR
#> 2 1SYGFNAAR
#> 3 2SYGFNAAR
#> 4 1ELGNDAYK
#> 5 2ELGNDAYK
#> 6 3ELGNDAYK
#> 7 1IAEESNFPF...
rowData(feat4[[2]])[, "Precursor.Id", drop = FALSE]
#> DataFrame with 8 rows and 1 column
#> Precursor.Id
#> <character>
#> 1 1SYGFNAAR
#> 2 1ELGNDAYK
#> 3 2ELGNDAYK
#> 4 3ELGNDAYK
#> 5 1IAEESNFPF...
#> 6 1IAEESNFPF...
#> 7 2IAEESNFPF...
#> 8 3IAEESNFPF...
## Here, one can either aggregate PSMs into PSMs with unique identifiers (see
## ?aggregateFeatures) or remove duplicated entries.
nrows(feat4) ## before filtering
#> PSM1 PSM2
#> 7 8
feat4 <- filterFeatures(feat4, ~ !isDuplicated(Precursor.Id))
#> 'Precursor.Id' found in 2 out of 2 assay(s).
nrows(feat4) ## after filtering
#> PSM1 PSM2
#> 5 6
## The assays can now be joined, using the newly created identifier rather
## than the (default) rownames.
feat4 <- joinAssays(feat4, i = 1:2,
                    name = "Precursors",
                    fcol = "Precursor.Id")
#> Using 'Precursor.Id' to join assays.
feat4
#> An instance of class QFeatures (type: bulk) with 3 sets:
#>
#> [1] PSM1: SummarizedExperiment with 5 rows and 2 columns
#> [2] PSM2: SummarizedExperiment with 6 rows and 2 columns
#> [3] Precursors: SummarizedExperiment with 8 rows and 4 columns
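In the filtering step above, PSM1 goes from 7 to 5 rows although only one identifier is repeated, which indicates that every copy of a duplicated identifier is dropped, not just the later ones. A base R sketch of that behaviour is given here. It is an assumption about what the filter does, inferred from the row counts, and not the package's own code. The vector `ids` holds the first six identifiers of PSM1, leaving out the seventh, which is truncated in the output.

```r
## Identifiers from the first six rows of PSM1
ids <- c("1SYGFNAAR", "1SYGFNAAR", "2SYGFNAAR",
         "1ELGNDAYK", "2ELGNDAYK", "3ELGNDAYK")

## Flag every copy of a value that occurs more than once;
## duplicated() alone would leave the first copy unflagged
is_dup <- ids %in% ids[duplicated(ids)]

## Keep only identifiers that occur exactly once
kept <- ids[!is_dup]
kept
```

Dropping all copies avoids having to decide arbitrarily which of the duplicated rows should take part in the join.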