Conforming a GeneSetDb to a target expression object is a required step before performing any type of GSEA. This function maps the featureIds used in the GeneSetDb to the elements of a target expression object (i.e., the rows of an expression matrix, or the elements of a vector of gene-level statistics).

After conformation, each geneset in the GeneSetDb is flagged as active (or inactive) based on two things: the number of its features that were successfully mapped to target, and the minimum and maximum number of genes per geneset allowed by the min.gs.size and max.gs.size parameters, respectively.

Only genesets that are marked with active = TRUE will be used in any downstream gene set operations.
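
The flagging logic described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the toy gene sets, the target IDs, and the choice to treat both size bounds as inclusive are assumptions made for the example.

```r
# Toy gene sets (hypothetical data): names are gene set names,
# values are the feature IDs that belong to each set.
genesets <- list(
  setA = c("g1", "g2", "g3", "g4"),
  setB = c("g2", "g9"),
  setC = c("g7", "g8", "g9")
)

# Toy target (hypothetical data): e.g. the row names of an expression matrix.
target_ids <- c("g1", "g2", "g3", "g5", "g6")

min.gs.size <- 2L
max.gs.size <- Inf

# N: total number of features per gene set.
N <- lengths(genesets)

# n: number of features per gene set that are found in the target.
n <- vapply(genesets, function(ids) sum(ids %in% target_ids), integer(1))

# Assumption: both bounds are treated as inclusive in this sketch.
active <- n >= min.gs.size & n <= max.gs.size

data.frame(name = names(genesets), active = active, N = N, n = n,
           row.names = NULL)
```

Here only setA keeps enough mapped features to stay within the size bounds, so it is the only set that would be flagged as active.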

conform(x, ...)

unconform(x, ...)

# S4 method for GeneSetDb
conform(
  x,
  target,
  unique.by = c("none", "mean", "var"),
  min.gs.size = 2L,
  max.gs.size = Inf,
  match.tolerance = 0.25,
  ...
)

# S4 method for GeneSetDb
unconform(x, ...)

is.conformed(x, to)

Arguments

x

The GeneSetDb

...

Additional arguments.

target

The expression object/matrix to conform to. This could also just be a character vector of IDs.

unique.by

If multiple rows of target map to the same identifier used in the genesets, this specifies how to pick the single row for that ID.

min.gs.size

Ensures that the genesets that make their way into GeneSetDb@table are of at least this size.

max.gs.size

Ensures that the genesets that make their way into GeneSetDb@table are smaller than this size.

match.tolerance

Numeric value in [0, 1]. If the fraction of feature_ids used in x that match rownames(target) is below this number, a warning is issued.

to

The object to test conformation against.
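
The match.tolerance check described above can be illustrated with a small base R sketch. The helper name check_match_fraction, the warning text, and the use of unique feature IDs are hypothetical choices made for this example; the package's own check may differ in detail.

```r
# Hypothetical helper: warn when too few feature IDs are found in the target.
check_match_fraction <- function(feature_ids, target_ids,
                                 match.tolerance = 0.25) {
  stopifnot(match.tolerance >= 0, match.tolerance <= 1)
  # Assumption: the fraction is computed over unique feature IDs.
  frac <- mean(unique(feature_ids) %in% target_ids)
  if (frac < match.tolerance) {
    warning(sprintf(
      "Only %.0f%% of feature_ids match the target (match.tolerance = %s)",
      100 * frac, format(match.tolerance)
    ))
  }
  invisible(frac)
}

# One of five feature IDs is present in the target, which is below 0.25,
# so the warning is raised; tryCatch captures its message.
msg <- tryCatch(
  {
    check_match_fraction(c("g1", "g2", "g3", "g4", "g5"),
                         c("g1", "x1", "x2"))
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
msg
```

A low match fraction usually points to an identifier mismatch between x and target (for example, gene symbols in one and Entrez IDs in the other), which is why it is worth surfacing as a warning rather than failing silently.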

Value

A GeneSetDb() that has been matched/conformed to the expression object target.

Functions

  • is.conformed(): Checks to see if GeneSetDb x is conformed to a target object to

Examples

es <- exampleExpressionSet()
gdb <- exampleGeneSetDb()
head(geneSets(gdb))
#>   collection                                          name active  N  n
#> 1         c2                        BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA
#> 2         c2         BOYAULT_LIVER_CANCER_SUBCLASS_G123_DN  FALSE 51 NA
#> 3         c2               BURTON_ADIPOGENESIS_PEAK_AT_2HR  FALSE 51 NA
#> 4         c2        BYSTRYKH_HEMATOPOIESIS_STEM_CELL_IL3RA  FALSE  9 NA
#> 5         c2             CAIRO_PML_TARGETS_BOUND_BY_MYC_UP  FALSE 23 NA
#> 6         c2 CHARAFE_BREAST_CANCER_BASAL_VS_MESENCHYMAL_DN  FALSE 50 NA
gdb <- conform(gdb, es)
## Note the updated values of the `active` flag and of n (the number of
## features mapped per gene set)
head(geneSets(gdb))
#>   collection                                          name active  N  n
#> 1         c2                        BIOCARTA_AGPCR_PATHWAY   TRUE 13 11
#> 2         c2         BOYAULT_LIVER_CANCER_SUBCLASS_G123_DN   TRUE 51 41
#> 3         c2               BURTON_ADIPOGENESIS_PEAK_AT_2HR   TRUE 51 50
#> 4         c2        BYSTRYKH_HEMATOPOIESIS_STEM_CELL_IL3RA   TRUE  9  6
#> 5         c2             CAIRO_PML_TARGETS_BOUND_BY_MYC_UP   TRUE 23 23
#> 6         c2 CHARAFE_BREAST_CANCER_BASAL_VS_MESENCHYMAL_DN   TRUE 50 45