Group features based on similar abundances (i.e. feature values) across samples. Parameter subset defines a subset of samples on which the similarity calculation should be performed. It might, for example, be better to exclude QC samples from the analysis because feature values are supposed to be constant in these samples.
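As a minimal sketch of how such a subset could be defined: the sample names below (including the "QC" prefix) are hypothetical and only serve to illustrate excluding QC samples by name. The call to groupFeatures is skipped if the MsFeatures package is not installed.

```r
## Six features (rows) measured in seven samples (columns). The column
## names are made up for this illustration; the last two are treated as QC.
x <- rbind(
    c(12, 34, 231, 234, 9, 5, 7),
    c(900, 900, 800, 10, 12, 9, 4),
    c(25, 70, 400, 409, 15, 8, 4),
    c(12, 13, 14, 15, 16, 17, 18),
    c(14, 36, 240, 239, 12, 7, 8),
    c(100, 103, 80, 2, 3, 1, 1)
)
colnames(x) <- c("S1", "S2", "S3", "S4", "S5", "QC1", "QC2")

## Integer index of all samples that are not QC samples
keep <- which(!grepl("^QC", colnames(x)))
keep

## Restrict the similarity calculation to the non-QC samples
if (requireNamespace("MsFeatures", quietly = TRUE)) {
    library(MsFeatures)
    res <- groupFeatures(x, AbundanceSimilarityParam(subset = keep))
    print(res)
}
```

The result still contains one group identifier per row of x; only the columns used for the similarity calculation change.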

The function first calculates an n x n similarity matrix, with n being the number of features, and subsequently groups features for which the similarity is higher than the user-provided threshold. Parameter simFun specifies the function used to calculate the pairwise similarities on the feature values (optionally transformed by the function specified with parameter transform). simFun defaults to corRows, which uses cor to calculate similarities between rows in object, but any function that calculates similarities between rows and returns a (symmetric) numeric similarity matrix can be used.
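The first step can be illustrated with base R alone. This is a sketch, not the package's implementation: cor() on the transposed matrix stands in for the default similarity function, and the threshold is applied by hand.

```r
## Three features (rows) measured in seven samples (columns)
x <- rbind(
    c(12, 34, 231, 234, 9, 5, 7),
    c(900, 900, 800, 10, 12, 9, 4),
    c(25, 70, 400, 409, 15, 8, 4)
)

## cor() correlates columns, so transpose to compare rows (features)
sim <- cor(t(x), use = "pairwise.complete.obs")
dim(sim)            # n x n, here 3 x 3
isSymmetric(sim)

## Feature pairs whose similarity exceeds the threshold
threshold <- 0.9
which(sim > threshold & upper.tri(sim), arr.ind = TRUE)
```

Only the pair of rows 1 and 3 exceeds the threshold here. How such pairs are turned into groups is the task of the grouping function.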

If object is a SummarizedExperiment() and a column "feature_group" is found in SummarizedExperiment::rowData(), the feature groups defined in that column are further sub-grouped with this method. See groupFeatures() for the general concept of this feature grouping.

Parameter groupFun specifies the function used to group the features based on the similarity matrix. It defaults to groupSimilarityMatrix. See groupSimilarityMatrix() for details.
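To illustrate what a grouping function has to provide, the sketch below defines a hypothetical alternative, groupSingleLinkage, using base R only. It is not the algorithm of groupSimilarityMatrix. It assumes that a grouping function receives the similarity matrix as its first argument along with threshold, and returns one integer group identifier per feature.

```r
## Hypothetical grouping function (NOT groupSimilarityMatrix): single-linkage
## clustering on 1 - similarity, cut at a height of 1 - threshold.
groupSingleLinkage <- function(x, threshold = 0.9, ...) {
    if (nrow(x) < 2)                    # hclust() needs at least 2 objects
        return(rep(1L, nrow(x)))
    hc <- hclust(as.dist(1 - x), method = "single")
    unname(cutree(hc, h = 1 - threshold))
}

x <- rbind(
    c(12, 34, 231, 234, 9, 5, 7),
    c(900, 900, 800, 10, 12, 9, 4),
    c(25, 70, 400, 409, 15, 8, 4),
    c(12, 13, 14, 15, 16, 17, 18),
    c(14, 36, 240, 239, 12, 7, 8),
    c(100, 103, 80, 2, 3, 1, 1)
)

## Pairwise row similarities, then one group identifier per row
sim <- cor(t(x))
grp <- groupSingleLinkage(sim, threshold = 0.9)
grp
```

Under the assumption above, such a function could be supplied with groupFun = groupSingleLinkage. Single linkage joins features through chains of pairwise similarities, so on other data its groups can differ from those of the default function.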

Additional settings for the groupFun and simFun functions can be passed to the parameter object via ... in the AbundanceSimilarityParam constructor function. Additional parameters specific to the type of object can be passed via ... in the groupFeatures call.

Usage

AbundanceSimilarityParam(
  threshold = 0.9,
  simFun = corRows,
  groupFun = groupSimilarityMatrix,
  subset = integer(),
  transform = identity,
  ...
)

# S4 method for matrix,AbundanceSimilarityParam
groupFeatures(object, param, ...)

# S4 method for SummarizedExperiment,AbundanceSimilarityParam
groupFeatures(object, param, i = 1L, ...)

Arguments

threshold

numeric(1) defining the (similarity) threshold to be used for the feature grouping. This parameter is passed to the groupFun function.

simFun

function to be used to calculate (pairwise) similarities (between rows). Defaults to simFun = corRows. See description or corRows() for more details.

groupFun

function to group features based on the calculated similarity matrix. Defaults to groupFun = groupSimilarityMatrix. See groupSimilarityMatrix() for details.

subset

integer or logical defining a subset of samples (at least 2) on which the similarity calculation should be performed. By default the calculation is performed on all samples.

transform

function to be used to transform feature abundances prior to the similarity calculation. Defaults to transform = identity. Alternatively, values could e.g. be transformed to log2 scale with transform = log2.

...

for AbundanceSimilarityParam: optional parameters to be passed along to simFun and groupFun. For groupFeatures: optional parameters for the extraction/definition of the feature values from object.

object

object containing the feature abundances on which features should be grouped.

param

AbundanceSimilarityParam defining the settings for the grouping based on feature values.

i

for object being a SummarizedExperiment(): integer(1) or character(1) specifying either the index or name of the assay in object that contains the feature values that should be used. Use assayNames() on object to list all available assays.

Value

For object being a SummarizedExperiment: a SummarizedExperiment with the grouping results added to a column "feature_group" in the object's rowData. For object being a matrix: an integer of length equal to the number of rows with the group identifiers.

See also

groupFeatures() for the general concept of feature grouping.

featureGroups() for the function to extract defined feature groups from a SummarizedExperiment.

Other feature grouping methods: groupFeatures-similar-rtime

Author

Johannes Rainer

Examples


## Define a simple numeric matrix on which we want to group the rows
x <- rbind(
    c(12, 34, 231, 234, 9, 5, 7),
    c(900, 900, 800, 10, 12, 9, 4),
    c(25, 70, 400, 409, 15, 8, 4),
    c(12, 13, 14, 15, 16, 17, 18),
    c(14, 36, 240, 239, 12, 7, 8),
    c(100, 103, 80, 2, 3, 1, 1)
    )

## Group rows based on similarity calculated with Pearson's correlation
## on the actual data values (without transforming them).
res <- groupFeatures(x, AbundanceSimilarityParam())
res
#> [1] 1 2 1 3 1 2

## Use Spearman's rho to correlate rows of the log2 transformed x matrix
res <- groupFeatures(x, AbundanceSimilarityParam(method = "spearman",
    transform = log2))
res
#> [1] 2 1 2 3 2 1

## Perform the grouping on a SummarizedExperiment
library(SummarizedExperiment)
data(se)

## Group features based on log2 transformed feature values in the first
## assay of the SummarizedExperiment
res <- groupFeatures(se, param = AbundanceSimilarityParam(threshold = 0.7,
    transform = log2))

featureGroups(res)
#>   [1] "FG.001" "FG.017" "FG.001" "FG.002" "FG.041" "FG.003" "FG.037" "FG.021"
#>   [9] "FG.021" "FG.021" "FG.028" "FG.045" "FG.017" "FG.002" "FG.014" "FG.026"
#>  [17] "FG.048" "FG.027" "FG.008" "FG.007" "FG.008" "FG.017" "FG.002" "FG.033"
#>  [25] "FG.011" "FG.016" "FG.015" "FG.002" "FG.027" "FG.049" "FG.010" "FG.019"
#>  [33] "FG.006" "FG.010" "FG.006" "FG.018" "FG.004" "FG.004" "FG.026" "FG.013"
#>  [41] "FG.020" "FG.008" "FG.018" "FG.013" "FG.008" "FG.011" "FG.011" "FG.022"
#>  [49] "FG.047" "FG.004" "FG.006" "FG.003" "FG.004" "FG.004" "FG.043" "FG.016"
#>  [57] "FG.032" "FG.001" "FG.038" "FG.025" "FG.003" "FG.045" "FG.026" "FG.003"
#>  [65] "FG.050" "FG.041" "FG.015" "FG.022" "FG.008" "FG.002" "FG.011" "FG.013"
#>  [73] "FG.015" "FG.051" "FG.014" "FG.001" "FG.052" "FG.031" "FG.044" "FG.017"
#>  [81] "FG.047" "FG.019" "FG.017" "FG.025" "FG.046" "FG.035" "FG.053" "FG.012"
#>  [89] "FG.012" "FG.008" "FG.054" "FG.034" "FG.009" "FG.028" "FG.030" "FG.030"
#>  [97] "FG.002" "FG.010" "FG.031" "FG.028" "FG.039" "FG.024" "FG.055" "FG.044"
#> [105] "FG.004" "FG.006" "FG.016" "FG.029" "FG.044" "FG.035" "FG.006" "FG.020"
#> [113] "FG.024" "FG.008" "FG.027" "FG.004" "FG.005" "FG.004" "FG.014" "FG.010"
#> [121] "FG.004" "FG.004" "FG.010" "FG.014" "FG.042" "FG.039" "FG.056" "FG.029"
#> [129] "FG.036" "FG.008" "FG.004" "FG.005" "FG.022" "FG.010" "FG.003" "FG.020"
#> [137] "FG.020" "FG.004" "FG.003" "FG.042" "FG.010" "FG.023" "FG.025" "FG.006"
#> [145] "FG.057" "FG.010" "FG.022" "FG.014" "FG.058" "FG.023" "FG.020" "FG.046"
#> [153] "FG.037" "FG.021" "FG.008" "FG.010" "FG.023" "FG.013" "FG.013" "FG.024"
#> [161] "FG.032" "FG.010" "FG.005" "FG.020" "FG.010" "FG.020" "FG.022" "FG.013"
#> [169] "FG.010" "FG.010" "FG.020" "FG.040" "FG.023" "FG.043" "FG.034" "FG.038"
#> [177] "FG.007" "FG.007" "FG.025" "FG.022" "FG.022" "FG.022" "FG.017" "FG.010"
#> [185] "FG.010" "FG.010" "FG.010" "FG.029" "FG.010" "FG.059" "FG.024" "FG.030"
#> [193] "FG.010" "FG.019" "FG.020" "FG.020" "FG.026" "FG.022" "FG.010" "FG.020"
#> [201] "FG.020" "FG.040" "FG.013" "FG.027" "FG.060" "FG.027" "FG.036" "FG.020"
#> [209] "FG.013" "FG.022" "FG.022" "FG.005" "FG.010" "FG.019" "FG.029" "FG.017"
#> [217] "FG.002" "FG.033" "FG.009" "FG.020" "FG.020" "FG.018" "FG.009" "FG.020"
#> [225] "FG.020"

## Perform feature grouping only on a subset of rows/features:
featureGroups(res) <- NA_character_
featureGroups(res)[40:80] <- "FG"
res <- groupFeatures(res, AbundanceSimilarityParam(transform = log2))
featureGroups(res)
#>   [1] NA       NA       NA       NA       NA       NA       NA       NA      
#>   [9] NA       NA       NA       NA       NA       NA       NA       NA      
#>  [17] NA       NA       NA       NA       NA       NA       NA       NA      
#>  [25] NA       NA       NA       NA       NA       NA       NA       NA      
#>  [33] NA       NA       NA       NA       NA       NA       NA       "FG.001"
#>  [41] "FG.002" "FG.003" "FG.008" "FG.009" "FG.006" "FG.007" "FG.006" "FG.001"
#>  [49] "FG.010" "FG.004" "FG.005" "FG.005" "FG.004" "FG.004" "FG.011" "FG.012"
#>  [57] "FG.013" "FG.001" "FG.014" "FG.015" "FG.003" "FG.016" "FG.017" "FG.003"
#>  [65] "FG.018" "FG.019" "FG.020" "FG.021" "FG.007" "FG.002" "FG.022" "FG.023"
#>  [73] "FG.024" "FG.025" "FG.026" "FG.027" "FG.028" "FG.029" "FG.030" "FG.031"
#>  [81] NA       NA       NA       NA       NA       NA       NA       NA      
#>  [89] NA       NA       NA       NA       NA       NA       NA       NA      
#>  [97] NA       NA       NA       NA       NA       NA       NA       NA      
#> [105] NA       NA       NA       NA       NA       NA       NA       NA      
#> [113] NA       NA       NA       NA       NA       NA       NA       NA      
#> [121] NA       NA       NA       NA       NA       NA       NA       NA      
#> [129] NA       NA       NA       NA       NA       NA       NA       NA      
#> [137] NA       NA       NA       NA       NA       NA       NA       NA      
#> [145] NA       NA       NA       NA       NA       NA       NA       NA      
#> [153] NA       NA       NA       NA       NA       NA       NA       NA      
#> [161] NA       NA       NA       NA       NA       NA       NA       NA      
#> [169] NA       NA       NA       NA       NA       NA       NA       NA      
#> [177] NA       NA       NA       NA       NA       NA       NA       NA      
#> [185] NA       NA       NA       NA       NA       NA       NA       NA      
#> [193] NA       NA       NA       NA       NA       NA       NA       NA      
#> [201] NA       NA       NA       NA       NA       NA       NA       NA      
#> [209] NA       NA       NA       NA       NA       NA       NA       NA      
#> [217] NA       NA       NA       NA       NA       NA       NA       NA      
#> [225] NA