Conforming a GeneSetDb to a target expression object is a required step before performing any type of GSEA. This function maps the featureIds used in the GeneSetDb to the elements of a target expression object (i.e., the rows of an expression matrix, or the elements of a vector of gene-level statistics).

After conformation, each geneset in the GeneSetDb is flagged as active (or inactive) based on the number of its features that were successfully mapped to target, and on the minimum and maximum geneset sizes specified by the min.gs.size and max.gs.size parameters, respectively.

Only genesets that are marked with active = TRUE will be used in any downstream gene set operations.
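The mapping-and-flagging step described above can be illustrated in base R. This is a minimal sketch, not sparrow's implementation: genesets are modeled as a plain named list of feature IDs, the target is a character vector of IDs (for example, the rownames of an expression matrix), the helper name `flag_active()` is hypothetical, and both size bounds are assumed to be inclusive.

```r
# Hypothetical helper illustrating conformation; not part of sparrow.
# gene_sets:  named list of character vectors of feature IDs
# target_ids: character vector of IDs present in the target
flag_active <- function(gene_sets, target_ids,
                        min.gs.size = 2L, max.gs.size = Inf) {
  # N: number of features defined in each geneset
  N <- lengths(gene_sets)
  # n: number of those features that are found in the target
  n <- vapply(gene_sets, function(ids) sum(unique(ids) %in% target_ids),
              integer(1))
  # Assumption: both size bounds are treated as inclusive
  active <- n >= min.gs.size & n <= max.gs.size
  data.frame(name = names(gene_sets), active = active, N = N, n = n,
             row.names = NULL, stringsAsFactors = FALSE)
}

gene_sets <- list(
  setA = c("g1", "g2", "g3"),
  setB = c("g4", "g9"),
  setC = c("g1", "g5", "g6", "g7")
)
target_ids <- c("g1", "g2", "g4", "g5", "g6")

res <- flag_active(gene_sets, target_ids, min.gs.size = 2L)
print(res)
```

Here setB has only one feature present in the target, so it falls below min.gs.size and is flagged inactive, while setA and setC remain active.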

conform(x, ...)

unconform(x, ...)

# S4 method for GeneSetDb
conform(
  x,
  target,
  unique.by = c("none", "mean", "var"),
  min.gs.size = 2L,
  max.gs.size = Inf,
  match.tolerance = 0.25,
  ...
)

# S4 method for GeneSetDb
unconform(x, ...)

is.conformed(x, to)

Arguments

x

The GeneSetDb

...

Additional arguments passed on to methods.

target

The expression object or matrix to conform to. This can also be a character vector of feature IDs.

unique.by

If multiple rows of target map to the same identifier used in the genesets, this specifies how to pick a single row for that ID.

min.gs.size

Lower bound on the number of mapped features a geneset must have to be flagged active in GeneSetDb@table.

max.gs.size

Upper bound on the number of mapped features a geneset may have and still be flagged active in GeneSetDb@table.

match.tolerance

Numeric value in [0, 1]. If the fraction of feature_ids used in x that match rownames(target) is below this value, a warning is raised.
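The two target-dependent safeguards above, reducing duplicated IDs to a single row and warning when too few IDs match, can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the helpers `pick_rows_by_var()` and `check_match()` are hypothetical, the sketch assumes `unique.by = "var"` means "keep the duplicated row with the largest variance across samples", and the match fraction is computed over the unique feature IDs of the genesets.

```r
# Hypothetical sketch; not sparrow's implementation.
# Assumption: "var" keeps, for each duplicated ID, the row with the
# largest variance across samples.
pick_rows_by_var <- function(mat, ids) {
  row_var <- apply(mat, 1, var)
  # Visit rows from highest to lowest variance, keeping the first per ID
  ord <- order(row_var, decreasing = TRUE)
  keep <- sort(ord[!duplicated(ids[ord])])
  out <- mat[keep, , drop = FALSE]
  rownames(out) <- ids[keep]
  out
}

# Warn when the fraction of geneset features found in the target
# falls below match.tolerance; returns the fraction invisibly
check_match <- function(feature_ids, target_ids, match.tolerance = 0.25) {
  frac <- mean(unique(feature_ids) %in% target_ids)
  if (frac < match.tolerance) {
    warning(sprintf("Only %.0f%% of feature ids matched the target",
                    100 * frac))
  }
  invisible(frac)
}

mat <- rbind(c(1, 2, 3), c(1, 5, 9), c(4, 4, 4))
ids <- c("g1", "g1", "g2")
uniq <- pick_rows_by_var(mat, ids)
print(uniq)

# 2 of 5 feature ids match (40%), which passes the default tolerance
frac <- check_match(c("g1", "g2", "g7", "g8", "g9"), rownames(uniq))
```

Ordering rows by decreasing variance and then applying duplicated() selects the highest-variance row per ID in a single vectorized pass, without an explicit loop over IDs.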

Value

A GeneSetDb() that has been conformed to the expression object target.

  • unconform(): Resets the conformation mapping.

  • is.conformed(): If to is missing, looks for evidence that conform has been called (at all) on x. If to is provided, specifically checks that x has been conformed to the target object to.

Examples

head(geneSets(gdb))
#>   collection                                          name active  N  n
#> 1         c2                        BIOCARTA_AGPCR_PATHWAY  FALSE 13 NA
#> 2         c2         BOYAULT_LIVER_CANCER_SUBCLASS_G123_DN  FALSE 51 NA
#> 3         c2               BURTON_ADIPOGENESIS_PEAK_AT_2HR  FALSE 51 NA
#> 4         c2        BYSTRYKH_HEMATOPOIESIS_STEM_CELL_IL3RA  FALSE  9 NA
#> 5         c2             CAIRO_PML_TARGETS_BOUND_BY_MYC_UP  FALSE 23 NA
#> 6         c2 CHARAFE_BREAST_CANCER_BASAL_VS_MESENCHYMAL_DN  FALSE 50 NA
gdb <- conform(gdb, es)
## Note the updated values of the `active` flag, and of n (the number of
## features mapped per gene set)
head(geneSets(gdb))
#>   collection                                          name active  N  n
#> 1         c2                        BIOCARTA_AGPCR_PATHWAY   TRUE 13 11
#> 2         c2         BOYAULT_LIVER_CANCER_SUBCLASS_G123_DN   TRUE 51 41
#> 3         c2               BURTON_ADIPOGENESIS_PEAK_AT_2HR   TRUE 51 50
#> 4         c2        BYSTRYKH_HEMATOPOIESIS_STEM_CELL_IL3RA   TRUE  9  6
#> 5         c2             CAIRO_PML_TARGETS_BOUND_BY_MYC_UP   TRUE 23 23
#> 6         c2 CHARAFE_BREAST_CANCER_BASAL_VS_MESENCHYMAL_DN   TRUE 50 45